
🤖 Automating Web Scraping with Selenium in Python

Ever tried scraping a website and got… nothing? That’s probably because the data is loaded with JavaScript, and standard tools like requests + BeautifulSoup can’t see it. Enter Selenium — a browser automation tool that lets you scrape dynamic websites just like a human user would.

In this blog, you’ll learn:

✅ What Selenium is and when to use it
✅ How to install and set up Selenium
✅ How to scrape dynamic content
✅ Real-world examples with wait times and button clicks


🧠 What is Selenium?

Selenium is a browser automation framework with official Python bindings. It lets you:

  • Open and interact with web pages

  • Click buttons, scroll, fill out forms

  • Wait for dynamic content to load

  • Extract visible content after JavaScript renders it


🧰 What You’ll Need

  • Python 3.x

  • selenium library

  • A browser driver (e.g. ChromeDriver for Chrome, GeckoDriver for Firefox)

Install Selenium

pip install selenium

Download ChromeDriver

Go to: https://chromedriver.chromium.org/downloads
Make sure the driver version matches your installed Chrome version — a mismatch is the most common setup error.
Add the driver to your system PATH or place it in your script folder.
(On Selenium 4.6 or newer, Selenium Manager can download a matching driver for you automatically, so this manual step is often unnecessary.)


🚀 Getting Started with Selenium

Step 1: Launch a Browser

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

service = Service("chromedriver")  # or give full path
driver = webdriver.Chrome(service=service)

driver.get("http://quotes.toscrape.com/js")  # JavaScript version of the site

๐Ÿ” Step 2: Extract Elements

Unlike BeautifulSoup, which parses static HTML, Selenium queries the rendered DOM with .find_element() and .find_elements():

quotes = driver.find_elements(By.CLASS_NAME, "text")
authors = driver.find_elements(By.CLASS_NAME, "author")

for quote, author in zip(quotes, authors):
    print(f"{quote.text} — {author.text}")

⏳ Step 3: Wait for Content to Load

Dynamic pages need time to load content. Use WebDriverWait:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "quote"))
)

🧭 Clicking Buttons, Navigating Pages

Selenium is great for paginated sites.

from selenium.common.exceptions import NoSuchElementException

while True:
    quotes = driver.find_elements(By.CLASS_NAME, "text")
    for q in quotes:
        print(q.text)

    try:
        next_btn = driver.find_element(By.LINK_TEXT, "Next")
        next_btn.click()
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "quote"))
        )
    except NoSuchElementException:
        break  # no "Next" link on the last page — we're done
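Right after a click, the page can briefly be mid-transition and a lookup may raise a transient error (e.g. StaleElementReferenceException). A small generic retry helper — a hypothetical utility sketch, not part of Selenium — can smooth that over:

```python
import time

def retry(fn, attempts=3, delay=0.5, exceptions=(Exception,)):
    """Call fn(), retrying up to `attempts` times on the given exceptions."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error
            time.sleep(delay)

# With Selenium you might wrap a flaky lookup like:
# quotes = retry(lambda: driver.find_elements(By.CLASS_NAME, "text"),
#                exceptions=(StaleElementReferenceException,))
```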

🧹 Clean Up

driver.quit()

Always close the browser when you’re done; otherwise stray browser and driver processes pile up and keep consuming memory.
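To guarantee quit() runs even when scraping raises an exception midway, wrap the work in try/finally. The pattern is sketched here with a stand-in object so it runs without a browser; with Selenium, driver would be your webdriver.Chrome instance:

```python
class FakeDriver:
    """Stand-in for a Selenium WebDriver, just to illustrate the pattern."""
    def __init__(self):
        self.quit_called = False
    def quit(self):
        self.quit_called = True

driver = FakeDriver()  # with Selenium: driver = webdriver.Chrome(service=service)
try:
    pass  # ... all scraping goes here: driver.get(...), find_elements(...), etc.
finally:
    driver.quit()  # runs even if the try-block raised
```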


📦 Saving Data to CSV

import pandas as pd

data = []

quotes = driver.find_elements(By.CLASS_NAME, "quote")
for q in quotes:
    text = q.find_element(By.CLASS_NAME, "text").text
    author = q.find_element(By.CLASS_NAME, "author").text
    data.append({"Quote": text, "Author": author})

df = pd.DataFrame(data)
df.to_csv("quotes_selenium.csv", index=False)
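If pandas isn’t installed, the same list-of-dicts structure writes cleanly with the stdlib csv module — a sketch using sample rows in place of the scraped elements:

```python
import csv

# Sample rows standing in for the scraped quote/author pairs
data = [
    {"Quote": "Simplicity is the ultimate sophistication.", "Author": "Leonardo da Vinci"},
    {"Quote": "Stay hungry, stay foolish.", "Author": "Steve Jobs"},
]

with open("quotes_selenium.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["Quote", "Author"])
    writer.writeheader()
    writer.writerows(data)
```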

✅ Summary

Task          Code
Load page     driver.get(url)
Find element  driver.find_element(By.CLASS_NAME, ...)
Wait for JS   WebDriverWait(...).until(...)
Click         element.click()

Selenium lets you interact with websites as if you were a user, unlocking pages that traditional scrapers can’t reach.


⚠️ Pro Tips

  • Use headless mode for background automation:

    from selenium.webdriver.chrome.options import Options
    options = Options()
    options.add_argument("--headless=new")  # use plain "--headless" on older Chrome
    driver = webdriver.Chrome(service=service, options=options)
    
  • Add random delays with time.sleep() to avoid hammering the server and tripping bot detection.

  • Don’t scrape sensitive or copyrighted data.

  • Always check the site’s robots.txt.
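The random-delay tip can be a small helper: pick the delay uniformly from a range instead of sleeping a fixed interval, so request timing looks less mechanical. The function name here is a hypothetical choice:

```python
import random
import time

def polite_sleep(min_s=1.0, max_s=3.0):
    """Sleep for a random duration in [min_s, max_s] seconds and return it."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay

# Between page loads:
# driver.get(next_url)
# polite_sleep(2, 5)
```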


🔄 Bonus: Scraping Infinite Scroll?

You can use:

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

Then wait for content to load and repeat.
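The “scroll, wait, repeat” loop boils down to: scroll to the bottom, pause, and stop once the page height stops growing. Here it is sketched as a generic helper (a hypothetical function, not part of Selenium) that takes the measure/scroll actions as callables, so the loop logic works without a browser:

```python
import time

def scroll_until_stable(get_height, scroll_down, pause=2.0, max_rounds=50):
    """Scroll until the page height stops increasing; return the final height."""
    last = get_height()
    for _ in range(max_rounds):
        scroll_down()
        time.sleep(pause)  # give lazy-loaded content time to arrive
        new = get_height()
        if new == last:
            break  # nothing new loaded: we've hit the real bottom
        last = new
    return last

# With Selenium:
# scroll_until_stable(
#     lambda: driver.execute_script("return document.body.scrollHeight"),
#     lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);"),
# )
```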


💡 Use Cases

  • Scraping product prices from JS-heavy e-commerce sites

  • Scraping content behind login forms

  • Automating form submissions or report downloads

