🗃️ Scraping and Storing Data in Databases with Python

Web scraping is awesome for collecting data — but what happens when you need to store, filter, or analyze that data at scale?

Instead of dumping it into a CSV file every time, why not store it in a database?

In this post, you’ll learn:

✅ Why and when to use a database
✅ How to scrape data with Python
✅ How to store data in SQLite or PostgreSQL
✅ Bonus: Query your scraped data like a pro


💡 Why Use a Database?

Storing scraped data in a database lets you:

  • Query and filter records easily

  • Avoid duplicates

  • Link related data across multiple tables (see the schema sketch at the end of this section)

  • Scale to millions of rows

Perfect for:

  • News aggregation

  • Job boards

  • Product price trackers

  • Custom dashboards
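
As a quick illustration of the "link related data" point above, here is a minimal sketch of a two-table schema in SQLite: authors get their own table, and each quote references its author by id. The layout and the quotes_normalized name are just one possibility for illustration, not something the quotes site dictates:

import sqlite3

conn = sqlite3.connect("quotes.db")
cursor = conn.cursor()

# One row per author, deduplicated by name
cursor.execute('''
    CREATE TABLE IF NOT EXISTS authors (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT UNIQUE
    )
''')

# Each quote points back to its author by id
cursor.execute('''
    CREATE TABLE IF NOT EXISTS quotes_normalized (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        text TEXT,
        author_id INTEGER REFERENCES authors(id)
    )
''')

conn.commit()
conn.close()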


🧰 Tools You’ll Need

  • requests + BeautifulSoup (for scraping)

  • sqlite3 or SQLAlchemy (for database handling)

  • pandas (optional, for data inspection)

Install with:

pip install beautifulsoup4 requests sqlalchemy pandas

🧪 Example: Quotes Scraper with SQLite

We'll use http://quotes.toscrape.com — a great playground for scraping.


📥 Step 1: Scrape the Data

import requests
from bs4 import BeautifulSoup

url = "http://quotes.toscrape.com"
response = requests.get(url)
response.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(response.text, "html.parser")

# Collect (text, author) pairs from each quote block
quotes = []
for quote in soup.find_all("div", class_="quote"):
    text = quote.find("span", class_="text").text
    author = quote.find("small", class_="author").text
    quotes.append((text, author))
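
The snippet above only scrapes the first page. quotes.toscrape.com paginates with a "Next" button (an li.next element, at the time of writing), so a minimal sketch for walking every page looks like this:

# Follow the "Next" link until it disappears
quotes = []
url = "http://quotes.toscrape.com"
while url:
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    for quote in soup.find_all("div", class_="quote"):
        text = quote.find("span", class_="text").text
        author = quote.find("small", class_="author").text
        quotes.append((text, author))
    next_link = soup.select_one("li.next > a")
    url = "http://quotes.toscrape.com" + next_link["href"] if next_link else None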

🗄️ Step 2: Store in SQLite Database

SQLite is built into Python — no setup needed!

import sqlite3

conn = sqlite3.connect("quotes.db")
cursor = conn.cursor()

# Create table
cursor.execute('''
    CREATE TABLE IF NOT EXISTS quotes (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        text TEXT,
        author TEXT
    )
''')

# Insert data
cursor.executemany("INSERT INTO quotes (text, author) VALUES (?, ?)", quotes)
conn.commit()
conn.close()
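
One caveat: re-running the script inserts the same quotes again. A simple way to avoid duplicates (a sketch of one option) is a UNIQUE constraint on the pair of columns plus SQLite's INSERT OR IGNORE. Note that CREATE TABLE IF NOT EXISTS won't modify an existing table, so delete quotes.db first to pick up the new schema:

import sqlite3

conn = sqlite3.connect("quotes.db")
cursor = conn.cursor()

cursor.execute('''
    CREATE TABLE IF NOT EXISTS quotes (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        text TEXT,
        author TEXT,
        UNIQUE(text, author)
    )
''')

# Rows that would violate the UNIQUE constraint are skipped silently
cursor.executemany("INSERT OR IGNORE INTO quotes (text, author) VALUES (?, ?)", quotes)
conn.commit()
conn.close()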

🔎 Step 3: Query the Data

conn = sqlite3.connect("quotes.db")
cursor = conn.cursor()

# Parameter placeholders keep values safely out of the SQL string
cursor.execute("SELECT * FROM quotes WHERE author = ?", ("Albert Einstein",))
rows = cursor.fetchall()

for row in rows:
    print(row)

conn.close()
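
SQL shines for quick aggregations too. For example, counting quotes per author in the same quotes.db:

import sqlite3

conn = sqlite3.connect("quotes.db")
cursor = conn.cursor()

# Most-quoted authors first
cursor.execute('''
    SELECT author, COUNT(*) AS n
    FROM quotes
    GROUP BY author
    ORDER BY n DESC
''')
for author, n in cursor.fetchall():
    print(f"{author}: {n}")

conn.close()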

⚡ Bonus: Using SQLAlchemy for Bigger Projects

If you're working with PostgreSQL, MySQL, or want more flexibility, use SQLAlchemy:

from sqlalchemy import create_engine
import pandas as pd

engine = create_engine("sqlite:///quotes.db")

# Note: if_exists="append" adds rows on every run, so running this after
# Step 2 will insert the same quotes a second time
df = pd.DataFrame(quotes, columns=["text", "author"])
df.to_sql("quotes", engine, if_exists="append", index=False)
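
Reading the data back is just as easy: pandas can query through the same engine. A quick inspection example:

# Load the table into a DataFrame for exploration
df = pd.read_sql("SELECT * FROM quotes", engine)
print(df.head())
print(df["author"].value_counts())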

📦 Switching to PostgreSQL?

Install psycopg2:

pip install psycopg2-binary

Then change the engine:

engine = create_engine("postgresql://username:password@localhost/dbname")
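
Nothing else needs to change: SQLAlchemy hides the database behind the engine, so the same df.to_sql call works as-is. One common pattern (DATABASE_URL here is just an example variable name): keep credentials out of source code and read the connection string from the environment:

import os
from sqlalchemy import create_engine

# Example: export DATABASE_URL="postgresql://username:password@localhost/dbname"
engine = create_engine(os.environ["DATABASE_URL"])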

✅ Recap

  • Scraping HTML: requests, BeautifulSoup

  • Storing data: sqlite3, SQLAlchemy

  • Querying data: SQL

  • Scaling up: switch to PostgreSQL or MySQL

🧠 Final Thoughts

Scraping + databases = 🔥

It unlocks:

  • Long-term storage

  • Rich analysis

  • Scheduled scraping pipelines (sketched below)

Perfect for real-world projects and portfolio work.
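
To sketch the scheduled-pipeline idea from the list above: the third-party schedule package (pip install schedule) gives you a simple in-process scheduler; cron or a task queue would work just as well. scrape_and_store is a hypothetical wrapper around Steps 1 and 2:

import time
import schedule

def scrape_and_store():
    ...  # placeholder: run the scrape from Step 1, then the inserts from Step 2

# Re-scrape every 6 hours
schedule.every(6).hours.do(scrape_and_store)

while True:
    schedule.run_pending()
    time.sleep(60)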
