🗃️ Scraping and Storing Data in Databases with Python

Web scraping is awesome for collecting data — but what happens when you need to store, filter, or analyze that data at scale?

Instead of dumping it into a CSV file every time, why not store it in a database?

In this post, you’ll learn:

✅ Why and when to use a database
✅ How to scrape data with Python
✅ How to store data in SQLite or PostgreSQL
✅ Bonus: Query your scraped data like a pro


💡 Why Use a Database?

Storing scraped data in a database lets you:

  • Query and filter records easily

  • Avoid duplicates

  • Link related data across multiple tables (see the schema sketch at the end of this section)

  • Scale to millions of rows

Perfect for:

  • News aggregation

  • Job boards

  • Product price trackers

  • Custom dashboards
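
As a quick illustration of the "link related data" point above, here is a minimal sketch of a two-table schema in SQLite: authors get their own table, and each quote references its author by id. The layout and the quotes_normalized name are just one possibility for illustration, not something the quotes site dictates:

import sqlite3

conn = sqlite3.connect("quotes.db")
cursor = conn.cursor()

# One row per author, deduplicated by name
cursor.execute('''
    CREATE TABLE IF NOT EXISTS authors (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT UNIQUE
    )
''')

# Each quote points back to its author by id
cursor.execute('''
    CREATE TABLE IF NOT EXISTS quotes_normalized (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        text TEXT,
        author_id INTEGER REFERENCES authors(id)
    )
''')

conn.commit()
conn.close()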


🧰 Tools You’ll Need

  • requests + BeautifulSoup (for scraping)

  • sqlite3 or SQLAlchemy (for database handling)

  • pandas (optional, for data inspection)

Install with:

pip install beautifulsoup4 requests sqlalchemy pandas

🧪 Example: Quotes Scraper with SQLite

We'll use http://quotes.toscrape.com — a great playground for scraping.


📥 Step 1: Scrape the Data

import requests
from bs4 import BeautifulSoup

url = "http://quotes.toscrape.com"
response = requests.get(url)
response.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(response.text, "html.parser")

# Collect (text, author) pairs from each quote block
quotes = []
for quote in soup.find_all("div", class_="quote"):
    text = quote.find("span", class_="text").text
    author = quote.find("small", class_="author").text
    quotes.append((text, author))
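
The snippet above only scrapes the first page. quotes.toscrape.com paginates with a "Next" button (an li.next element, at the time of writing), so a minimal sketch for walking every page looks like this:

# Follow the "Next" link until it disappears
quotes = []
url = "http://quotes.toscrape.com"
while url:
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    for quote in soup.find_all("div", class_="quote"):
        text = quote.find("span", class_="text").text
        author = quote.find("small", class_="author").text
        quotes.append((text, author))
    next_link = soup.select_one("li.next > a")
    url = "http://quotes.toscrape.com" + next_link["href"] if next_link else None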

🗄️ Step 2: Store in SQLite Database

SQLite is built into Python — no setup needed!

import sqlite3

conn = sqlite3.connect("quotes.db")
cursor = conn.cursor()

# Create table
cursor.execute('''
    CREATE TABLE IF NOT EXISTS quotes (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        text TEXT,
        author TEXT
    )
''')

# Insert data
cursor.executemany("INSERT INTO quotes (text, author) VALUES (?, ?)", quotes)
conn.commit()
conn.close()
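
One caveat: re-running the script inserts the same quotes again. A simple way to avoid duplicates (a sketch of one option) is a UNIQUE constraint on the pair of columns plus SQLite's INSERT OR IGNORE. Note that CREATE TABLE IF NOT EXISTS won't modify an existing table, so delete quotes.db first to pick up the new schema:

import sqlite3

conn = sqlite3.connect("quotes.db")
cursor = conn.cursor()

cursor.execute('''
    CREATE TABLE IF NOT EXISTS quotes (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        text TEXT,
        author TEXT,
        UNIQUE(text, author)
    )
''')

# Rows that would violate the UNIQUE constraint are skipped silently
cursor.executemany("INSERT OR IGNORE INTO quotes (text, author) VALUES (?, ?)", quotes)
conn.commit()
conn.close()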

🔎 Step 3: Query the Data

conn = sqlite3.connect("quotes.db")
cursor = conn.cursor()

# Parameter placeholders keep values safely out of the SQL string
cursor.execute("SELECT * FROM quotes WHERE author = ?", ("Albert Einstein",))
rows = cursor.fetchall()

for row in rows:
    print(row)

conn.close()
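
SQL shines for quick aggregations too. For example, counting quotes per author in the same quotes.db:

import sqlite3

conn = sqlite3.connect("quotes.db")
cursor = conn.cursor()

# Most-quoted authors first
cursor.execute('''
    SELECT author, COUNT(*) AS n
    FROM quotes
    GROUP BY author
    ORDER BY n DESC
''')
for author, n in cursor.fetchall():
    print(f"{author}: {n}")

conn.close()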

⚡ Bonus: Using SQLAlchemy for Bigger Projects

If you're working with PostgreSQL, MySQL, or want more flexibility, use SQLAlchemy:

from sqlalchemy import create_engine
import pandas as pd

engine = create_engine("sqlite:///quotes.db")

# Note: if_exists="append" adds rows on every run, so running this after
# Step 2 will insert the same quotes a second time
df = pd.DataFrame(quotes, columns=["text", "author"])
df.to_sql("quotes", engine, if_exists="append", index=False)
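
Reading the data back is just as easy: pandas can query through the same engine. A quick inspection example:

# Load the table into a DataFrame for exploration
df = pd.read_sql("SELECT * FROM quotes", engine)
print(df.head())
print(df["author"].value_counts())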

📦 Switching to PostgreSQL?

Install psycopg2:

pip install psycopg2-binary

Then change the engine:

engine = create_engine("postgresql://username:password@localhost/dbname")
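
Nothing else needs to change: SQLAlchemy hides the database behind the engine, so the same df.to_sql call works as-is. One common pattern (DATABASE_URL here is just an example variable name): keep credentials out of source code and read the connection string from the environment:

import os
from sqlalchemy import create_engine

# Example: export DATABASE_URL="postgresql://username:password@localhost/dbname"
engine = create_engine(os.environ["DATABASE_URL"])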

✅ Recap

  • Scraping HTML: requests, BeautifulSoup

  • Storing data: sqlite3, SQLAlchemy

  • Querying data: SQL

  • Scaling up: switch to PostgreSQL or MySQL

🧠 Final Thoughts

Scraping + databases = 🔥

It unlocks:

  • Long-term storage

  • Rich analysis

  • Scheduled scraping pipelines (sketched below)

Perfect for real-world projects and portfolio work.
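
To sketch the scheduled-pipeline idea from the list above: the third-party schedule package (pip install schedule) gives you a simple in-process scheduler; cron or a task queue would work just as well. scrape_and_store is a hypothetical wrapper around Steps 1 and 2:

import time
import schedule

def scrape_and_store():
    ...  # placeholder: run the scrape from Step 1, then the inserts from Step 2

# Re-scrape every 6 hours
schedule.every(6).hours.do(scrape_and_store)

while True:
    schedule.run_pending()
    time.sleep(60)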
