🤖 Automating Web Scraping with Selenium in Python
Ever tried scraping a website and got… nothing? That’s probably because the data is loaded with JavaScript, and standard tools like requests + BeautifulSoup can’t see it. Enter Selenium: a browser automation tool that lets you scrape dynamic websites just like a human user would.
In this blog, you’ll learn:
✅ What Selenium is and when to use it
✅ How to install and set up Selenium
✅ How to scrape dynamic content
✅ Real-world examples with wait times and button clicks
🧠 What is Selenium?
Selenium is a browser automation framework with Python bindings. It lets you:
- Open and interact with web pages
- Click buttons, scroll, and fill out forms
- Wait for dynamic content to load
- Extract visible content after JavaScript renders it
🧰 What You’ll Need
- Python 3.x
- The selenium library
- A WebDriver (e.g. ChromeDriver for Chrome, GeckoDriver for Firefox)
Install Selenium
pip install selenium
Download ChromeDriver
Go to: https://chromedriver.chromium.org/downloads (for Chrome 115 and newer, that page points you to the Chrome for Testing downloads).
Make sure the driver version matches your Chrome browser version, then add the driver to your system PATH or place it in your script folder.
Note: with Selenium 4.6+, the bundled Selenium Manager can fetch a matching driver automatically, so this manual step is often optional.
🚀 Getting Started with Selenium
Step 1: Launch a Browser
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
service = Service("chromedriver") # or give full path
driver = webdriver.Chrome(service=service)
driver.get("http://quotes.toscrape.com/js") # JavaScript version of the site
🔍 Step 2: Extract Elements
Unlike BeautifulSoup, Selenium uses .find_element() and .find_elements():
quotes = driver.find_elements(By.CLASS_NAME, "text")
authors = driver.find_elements(By.CLASS_NAME, "author")

for quote, author in zip(quotes, authors):
    print(f"{quote.text} — {author.text}")
⏳ Step 3: Wait for Content to Load
Dynamic pages need time to load content. Use WebDriverWait:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "quote"))
)
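The call blocks until the condition holds or the 10-second timeout expires, at which point it raises a TimeoutException. A typical wait-then-extract pattern, with the timeout handled explicitly:
from selenium.common.exceptions import TimeoutException

try:
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "quote"))
    )
    quotes = driver.find_elements(By.CLASS_NAME, "text")
except TimeoutException:
    print("Content did not load within 10 seconds")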
🧭 Clicking Buttons, Navigating Pages
Selenium is great for paginated sites.
from selenium.common.exceptions import NoSuchElementException

while True:
    quotes = driver.find_elements(By.CLASS_NAME, "text")
    for q in quotes:
        print(q.text)
    try:
        next_btn = driver.find_element(By.LINK_TEXT, "Next")
        next_btn.click()
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "quote"))
        )
    except NoSuchElementException:
        break  # no Next link on the last page
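One subtlety: right after the click, the old page’s quotes can still satisfy presence_of_element_located, so the wait may pass before the next page actually renders. Waiting for the clicked element to go stale is a more reliable signal; a sketch:
next_btn = driver.find_element(By.LINK_TEXT, "Next")
next_btn.click()
# The old button detaches from the DOM once the new page loads
WebDriverWait(driver, 10).until(EC.staleness_of(next_btn))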
🧹 Clean Up
driver.quit()
Always close the browser when you’re done; otherwise the driver and browser processes keep running in the background.
📦 Saving Data to CSV
Run this while the browser is still open, before you call driver.quit():
import pandas as pd
data = []
quotes = driver.find_elements(By.CLASS_NAME, "quote")

for q in quotes:
    text = q.find_element(By.CLASS_NAME, "text").text
    author = q.find_element(By.CLASS_NAME, "author").text
    data.append({"Quote": text, "Author": author})

df = pd.DataFrame(data)
df.to_csv("quotes_selenium.csv", index=False)
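If you’d rather skip the pandas dependency, the standard-library csv module writes the same file; a minimal sketch reusing the data list built above:
import csv

with open("quotes_selenium.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["Quote", "Author"])
    writer.writeheader()
    writer.writerows(data)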
✅ Summary
| Task | Code |
|---|---|
| Load page | driver.get(url) |
| Find element | driver.find_element(By.CLASS_NAME, ...) |
| Wait for JS | WebDriverWait(...).until(...) |
| Click | element.click() |
Selenium lets you interact with websites as if you were a user, unlocking pages that traditional scrapers can’t reach.
⚠️ Pro Tips
- Use headless mode for background automation:
  from selenium.webdriver.chrome.options import Options
  options = Options()
  options.add_argument("--headless")  # on recent Chrome, "--headless=new" is preferred
  driver = webdriver.Chrome(service=service, options=options)
- Add random delays with time.sleep() between actions to avoid detection.
- Don’t scrape sensitive or copyrighted data.
- Always check the site’s robots.txt (see the sketch after this list).
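A quick programmatic check of robots.txt, using Python’s built-in urllib.robotparser (the URL is this post’s example site):
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://quotes.toscrape.com/robots.txt")
rp.read()
print(rp.can_fetch("*", "http://quotes.toscrape.com/js"))  # True if scraping is allowed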
🔁 Bonus: Scraping Infinite Scroll?
You can use:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
Then wait for content to load and repeat.
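Putting that together: scroll, wait, and stop once the page height stops growing. A minimal sketch of that loop:
import time

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # crude pause; an explicit wait on new elements is more robust
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # no new content loaded, we've reached the bottom
    last_height = new_height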
💡 Use Cases
- Scraping product prices from JS-heavy e-commerce sites
- Scraping content behind login forms
- Automating form submissions or report downloads