🗄️ Scraping and Storing Data in Databases with Python
Web scraping is awesome for collecting data — but what happens when you need to store, filter, or analyze that data at scale?
Instead of dumping it into a CSV file every time, why not store it in a database?
In this post, you’ll learn:
✅ Why and when to use a database
✅ How to scrape data with Python
✅ How to store data in SQLite or PostgreSQL
✅ Bonus: Query your scraped data like a pro
💡 Why Use a Database?
Storing scraped data in a database lets you:
- Query and filter records easily
- Avoid duplicates
- Link related data across multiple tables
- Scale to millions of rows
Perfect for:
- News aggregation
- Job boards
- Product price trackers
- Custom dashboards
🧰 Tools You’ll Need
- `requests` + `BeautifulSoup` (for scraping)
- `sqlite3` or `SQLAlchemy` (for database handling)
- `pandas` (optional, for data inspection)
Install with:
```bash
pip install beautifulsoup4 requests sqlalchemy pandas
```
🧪 Example: Quotes Scraper with SQLite
We'll use http://quotes.toscrape.com — a great playground for scraping.
📥 Step 1: Scrape the Data
```python
import requests
from bs4 import BeautifulSoup

url = "http://quotes.toscrape.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

quotes = []
# Each quote on the page lives in a <div class="quote"> element
for quote in soup.find_all("div", class_="quote"):
    text = quote.find("span", class_="text").text
    author = quote.find("small", class_="author").text
    quotes.append((text, author))
```
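This grabs only the first page. quotes.toscrape.com paginates its quotes, so if you want all of them, one approach is to follow the site's "Next" link until it runs out. The `li.next` selector below matches the site's markup at the time of writing, so treat it as an assumption to verify:

```python
import requests
from bs4 import BeautifulSoup

base_url = "http://quotes.toscrape.com"
url = base_url
quotes = []

# Follow the "Next" link from page to page until there isn't one
while url:
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    for quote in soup.find_all("div", class_="quote"):
        text = quote.find("span", class_="text").text
        author = quote.find("small", class_="author").text
        quotes.append((text, author))
    next_link = soup.select_one("li.next > a")  # pagination control, if present
    url = base_url + next_link["href"] if next_link else None
```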
🗄️ Step 2: Store in SQLite Database
SQLite is built into Python — no setup needed!
```python
import sqlite3

conn = sqlite3.connect("quotes.db")
cursor = conn.cursor()

# Create table
cursor.execute('''
    CREATE TABLE IF NOT EXISTS quotes (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        text TEXT,
        author TEXT
    )
''')

# Insert data
cursor.executemany("INSERT INTO quotes (text, author) VALUES (?, ?)", quotes)
conn.commit()
conn.close()
```
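One caveat: run the script twice and every quote gets inserted twice. If you're happy to treat the quote text itself as the unique key (a reasonable assumption for this site), a `UNIQUE` constraint plus `INSERT OR IGNORE` keeps reruns idempotent. Delete `quotes.db` first if you already created the table without the constraint:

```python
import sqlite3

conn = sqlite3.connect("quotes.db")
cursor = conn.cursor()

# UNIQUE(text) makes SQLite reject rows whose text already exists;
# INSERT OR IGNORE skips those rows instead of raising an error.
cursor.execute('''
    CREATE TABLE IF NOT EXISTS quotes (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        text TEXT UNIQUE,
        author TEXT
    )
''')
cursor.executemany("INSERT OR IGNORE INTO quotes (text, author) VALUES (?, ?)", quotes)
conn.commit()
conn.close()
```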
🔍 Step 3: Query the Data
```python
conn = sqlite3.connect("quotes.db")
cursor = conn.cursor()

# Fetch every quote attributed to Einstein
cursor.execute("SELECT * FROM quotes WHERE author = 'Albert Einstein'")
rows = cursor.fetchall()

for row in rows:
    print(row)

conn.close()
```
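If the author name comes from user input rather than a hard-coded string, pass it as a parameter instead of building the SQL yourself; `sqlite3`'s `?` placeholder quotes the value for you and protects against SQL injection:

```python
import sqlite3

conn = sqlite3.connect("quotes.db")
cursor = conn.cursor()

# Parameterized query: never interpolate user input into SQL strings
author = "Albert Einstein"
cursor.execute("SELECT text FROM quotes WHERE author = ?", (author,))
for (text,) in cursor.fetchall():
    print(text)

conn.close()
```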
⚡ Bonus: Using SQLAlchemy for Bigger Projects
If you're working with PostgreSQL or MySQL, or you just want more flexibility, use SQLAlchemy:
```python
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine("sqlite:///quotes.db")

# Wrap the scraped tuples in a DataFrame, then append them to the table
df = pd.DataFrame(quotes, columns=["text", "author"])
df.to_sql("quotes", engine, if_exists="append", index=False)
```
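Reading the data back is just as short. Continuing from the block above, `pd.read_sql` runs a query through the same engine and hands you a DataFrame, which is handy for quick inspection:

```python
# Who are the most-quoted authors? (the alias `n` is just for display)
df = pd.read_sql(
    "SELECT author, COUNT(*) AS n FROM quotes GROUP BY author ORDER BY n DESC",
    engine,
)
print(df.head())
```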
📦 Switching to PostgreSQL?
Install psycopg2:
```bash
pip install psycopg2-binary
```
Then change the engine:
```python
engine = create_engine("postgresql://username:password@localhost/dbname")
```
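Everything else stays the same: the `to_sql` call from the bonus section writes to Postgres unchanged. The connection string above is a placeholder, so swap in your real credentials and database name:

```python
# Assuming `quotes` from Step 1 and the Postgres `engine` defined above
df = pd.DataFrame(quotes, columns=["text", "author"])
df.to_sql("quotes", engine, if_exists="append", index=False)
```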
✅ Recap
| Task | Tool |
|---|---|
| Scraping HTML | `requests`, `BeautifulSoup` |
| Storing data | `sqlite3`, `SQLAlchemy` |
| Querying data | SQL |
| Scaling up | Switch to PostgreSQL or MySQL |
🧠 Final Thoughts
Scraping + databases = 🔥
It unlocks:
- Long-term storage
- Rich analysis
- Scheduled scraping pipelines
Perfect for real-world projects and portfolio work.
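As a closing sketch, here is what a minimal scheduled pipeline can look like, combining the scrape from Step 1 with the duplicate-safe insert from above. The daily `while`/`sleep` loop is deliberately naive; cron or a task queue is the sturdier choice in production:

```python
import time
import sqlite3
import requests
from bs4 import BeautifulSoup

def scrape_and_store():
    # Step 1: scrape the first page of quotes
    soup = BeautifulSoup(
        requests.get("http://quotes.toscrape.com").text, "html.parser"
    )
    rows = [
        (q.find("span", class_="text").text, q.find("small", class_="author").text)
        for q in soup.find_all("div", class_="quote")
    ]
    # Step 2: store them, skipping quotes we already have
    conn = sqlite3.connect("quotes.db")
    with conn:  # commits automatically on success
        conn.execute("""
            CREATE TABLE IF NOT EXISTS quotes (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                text TEXT UNIQUE,
                author TEXT
            )
        """)
        conn.executemany(
            "INSERT OR IGNORE INTO quotes (text, author) VALUES (?, ?)", rows
        )
    conn.close()

# Naive scheduler: re-scrape once a day
while True:
    scrape_and_store()
    time.sleep(24 * 60 * 60)
```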