Automating PDF Generation and Manipulation with Python
PDF files are widely used for reports, invoices, and documents. Python provides libraries like reportlab, PyPDF2, and pdfplumber to generate, edit, and extract text from PDFs.
Installing Required Libraries
pip install reportlab PyPDF2 pdfplumber
reportlab – Creates PDFs from scratch with text, images, tables, and charts.
PyPDF2 – Merges, splits, and extracts text from PDFs.
pdfplumber – Extracts structured data from PDFs.
Creating a PDF Using reportlab
from reportlab.pdfgen import canvas
# Create a new PDF
pdf = canvas.Canvas("output.pdf")
# Add text
pdf.setFont("Helvetica", 14)
pdf.drawString(100, 750, "Automated PDF Report")
pdf.drawString(100, 730, "Generated using Python")
# Save the PDF
pdf.save()
print("PDF created successfully.")
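The coordinates passed to drawString are measured in points (1/72 inch) from the bottom-left corner of the page, so each line further down the page needs a smaller y value. The following is a minimal sketch of laying out several lines without computing each y by hand. The helper names line_positions and write_lines and the 20-point line spacing are illustrative assumptions, not part of reportlab.

```python
def line_positions(top_y, leading, count):
    """Return y coordinates for `count` lines, starting at top_y and moving down."""
    return [top_y - i * leading for i in range(count)]


def write_lines(path, lines, x=100, top_y=750, leading=20):
    """Draw each string in `lines` on its own row of a single-page PDF."""
    # Imported here so line_positions can be used without reportlab installed.
    from reportlab.pdfgen import canvas

    pdf = canvas.Canvas(path)
    pdf.setFont("Helvetica", 14)
    for text, y in zip(lines, line_positions(top_y, leading, len(lines))):
        pdf.drawString(x, y, text)
    pdf.save()
```

Calling write_lines("multi_line.pdf", ["Automated PDF Report", "Generated using Python"]) should place the two lines at y = 750 and y = 730, matching the example above.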
Adding Images to a PDF
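from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
# Create a letter-sized PDF
pdf = canvas.Canvas("pdf_with_image.pdf", pagesize=letter)
pdf.drawString(100, 750, "PDF with Image")
# Place image.png with its lower-left corner at (100, 500), scaled to 200 x 100 points
pdf.drawImage("image.png", 100, 500, width=200, height=100)
pdf.save()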
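Passing a fixed width and height to drawImage stretches the image if its proportions differ from the target box. One way to avoid that, sketched below, is to compute a size that fits the box while keeping the original aspect ratio. The helper names fit_within and draw_image_fitted are hypothetical; ImageReader.getSize() is used here to read the pixel dimensions.

```python
def fit_within(img_width, img_height, max_width, max_height):
    """Scale (img_width, img_height) to fit inside the box, keeping proportions."""
    scale = min(max_width / img_width, max_height / img_height)
    return img_width * scale, img_height * scale


def draw_image_fitted(pdf, image_path, x, y, max_width, max_height):
    """Draw the image on a reportlab canvas without distorting it."""
    # Imported here so fit_within can be used without reportlab installed.
    from reportlab.lib.utils import ImageReader

    image = ImageReader(image_path)
    width, height = fit_within(*image.getSize(), max_width, max_height)
    pdf.drawImage(image, x, y, width=width, height=height)
```

drawImage also accepts preserveAspectRatio=True, which may be simpler when the exact drawn size does not need to be known in advance.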
Creating Tables in a PDF
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle
from reportlab.lib import colors
# Create PDF document
pdf = SimpleDocTemplate("table_report.pdf")
# Table data
data = [
    ["Product", "Sales", "Revenue"],
    ["Laptop", "100", "$50,000"],
    ["Phone", "200", "$30,000"],
]
# Create table
table = Table(data)
# Add styles
style = TableStyle([
    ("BACKGROUND", (0, 0), (-1, 0), colors.grey),
    ("TEXTCOLOR", (0, 0), (-1, 0), colors.whitesmoke),
    ("ALIGN", (0, 0), (-1, -1), "CENTER"),
    ("FONTNAME", (0, 0), (-1, 0), "Helvetica-Bold"),
    ("BOTTOMPADDING", (0, 0), (-1, 0), 10),
    ("GRID", (0, 0), (-1, -1), 1, colors.black),
])
table.setStyle(style)
# Build PDF
pdf.build([table])
print("PDF with table created.")
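In an automated report the table data usually comes from records rather than a hand-written list. The sketch below converts a list of dictionaries into the header-plus-rows structure that Table expects. The function names rows_from_records and build_table_pdf are illustrative assumptions, and only two of the style commands from the example above are repeated to keep it short.

```python
def rows_from_records(records, columns):
    """Build Table-ready rows: a header row followed by one row per record."""
    header = list(columns)
    # Missing keys become empty cells; every value is rendered as text.
    body = [[str(record.get(col, "")) for col in columns] for record in records]
    return [header] + body


def build_table_pdf(path, records, columns):
    """Write a one-table PDF from a list of dictionaries."""
    # Imported here so rows_from_records can be used without reportlab installed.
    from reportlab.lib import colors
    from reportlab.platypus import SimpleDocTemplate, Table, TableStyle

    table = Table(rows_from_records(records, columns))
    table.setStyle(TableStyle([
        ("BACKGROUND", (0, 0), (-1, 0), colors.grey),
        ("GRID", (0, 0), (-1, -1), 1, colors.black),
    ]))
    SimpleDocTemplate(path).build([table])
```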
Merging Multiple PDFs Using PyPDF2
from PyPDF2 import PdfMerger
pdfs = ["file1.pdf", "file2.pdf"]
merger = PdfMerger()
for pdf in pdfs:
    merger.append(pdf)
merger.write("merged.pdf")
merger.close()
print("PDFs merged successfully.")
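When the input files are discovered at run time, the merge order matters: plain alphabetical sorting puts file10.pdf before file2.pdf. The sketch below sorts names so that embedded numbers compare numerically before merging. The names natural_key and merge_in_order are hypothetical helpers, not part of PyPDF2.

```python
import re


def natural_key(name):
    """Sort key that compares embedded numbers numerically (file2 before file10)."""
    # re.split with a capture group alternates text and digit runs,
    # so odd positions are always the digit runs.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def merge_in_order(paths, output_path):
    """Merge the given PDFs in natural filename order."""
    # Imported here so natural_key can be used without PyPDF2 installed.
    from PyPDF2 import PdfMerger

    merger = PdfMerger()
    for path in sorted(paths, key=natural_key):
        merger.append(path)
    merger.write(output_path)
    merger.close()
```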
Splitting a PDF File
from PyPDF2 import PdfReader, PdfWriter
reader = PdfReader("large.pdf")
writer = PdfWriter()
# Extract the first 3 pages (or fewer if the document is shorter)
for i in range(min(3, len(reader.pages))):
    writer.add_page(reader.pages[i])
# Save the extracted pages
with open("split.pdf", "wb") as output_pdf:
    writer.write(output_pdf)
print("PDF split successfully.")
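The same approach extends to splitting a whole document into fixed-size parts. The sketch below computes the page ranges first and then writes one file per range. The names chunk_ranges and split_into_chunks and the part_N.pdf naming scheme are assumptions made for illustration.

```python
def chunk_ranges(total_pages, chunk_size):
    """Return (start, stop) page-index pairs covering total_pages in chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [(start, min(start + chunk_size, total_pages))
            for start in range(0, total_pages, chunk_size)]


def split_into_chunks(path, chunk_size, prefix="part"):
    """Write prefix_1.pdf, prefix_2.pdf, ... each holding up to chunk_size pages."""
    # Imported here so chunk_ranges can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(path)
    ranges = chunk_ranges(len(reader.pages), chunk_size)
    for number, (start, stop) in enumerate(ranges, 1):
        writer = PdfWriter()
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        with open(f"{prefix}_{number}.pdf", "wb") as output_pdf:
            writer.write(output_pdf)
```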
Extracting Text from a PDF Using PyPDF2
from PyPDF2 import PdfReader
reader = PdfReader("document.pdf")
# Extract text from the first page
page = reader.pages[0]
text = page.extract_text()
print("Extracted text:", text)
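To read a whole document, the same call can be applied to every page. Pages that contain only images (scanned documents, for example) commonly yield no text, so the sketch below skips empty results when joining. The helper names join_page_texts and extract_all_text are hypothetical.

```python
def join_page_texts(texts):
    """Join per-page text with blank lines, skipping pages that yielded nothing."""
    return "\n\n".join(t.strip() for t in texts if t and t.strip())


def extract_all_text(path):
    """Return the text of every page in the PDF as one string."""
    # Imported here so join_page_texts can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return join_page_texts(page.extract_text() for page in reader.pages)
```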
Extracting Tables from a PDF Using pdfplumber
import pdfplumber
with pdfplumber.open("table_document.pdf") as pdf:
    page = pdf.pages[0]
    # extract_table() returns None when no table is detected on the page
    table = page.extract_table()
    for row in table or []:
        print(row)
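extract_table() returns a list of rows, each a list of cell strings, with None for empty cells. For further processing it is often easier to work with one dictionary per row. The sketch below assumes the first row is the header; table_to_records is a hypothetical helper name, not a pdfplumber function.

```python
def table_to_records(table):
    """Convert [header, row, row, ...] into a list of dicts keyed by header."""
    if not table:
        return []
    # Empty cells arrive as None, so normalize them to empty strings.
    header = [(cell or "").strip() for cell in table[0]]
    return [
        {key: (cell or "").strip() for key, cell in zip(header, row)}
        for row in table[1:]
    ]
```

The result of page.extract_table() can be passed to it directly, including the None returned when no table is found.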
Automating PDF Report Generation and Emailing
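# yagmail is a separate package: pip install yagmail
import yagmail
# Avoid hard-coding real credentials; Gmail typically requires an app password here
yag = yagmail.SMTP("your_email@gmail.com", "your_password")
# Send the PDF report via email
yag.send(
    to="recipient@example.com",
    subject="Automated PDF Report",
    contents="Please find the attached PDF report.",
    attachments="output.pdf",
)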
print("PDF report emailed successfully.")
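If adding another dependency is not an option, the same email can be sent with the standard library instead of yagmail. The sketch below swaps in smtplib and email.message and reads the credentials from environment variables rather than the source code. The function names and the REPORT_EMAIL_USER and REPORT_EMAIL_PASSWORD variable names are assumptions made for illustration.

```python
import os
import smtplib
from email.message import EmailMessage


def build_report_email(sender, recipient, pdf_bytes, filename="output.pdf"):
    """Create an email with the PDF bytes attached."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Automated PDF Report"
    message.set_content("Please find the attached PDF report.")
    message.add_attachment(
        pdf_bytes, maintype="application", subtype="pdf", filename=filename
    )
    return message


def send_report(pdf_path, recipient):
    """Send the PDF at pdf_path through Gmail's SMTP server over SSL."""
    # Hypothetical environment variable names; set them outside the script.
    user = os.environ["REPORT_EMAIL_USER"]
    password = os.environ["REPORT_EMAIL_PASSWORD"]
    with open(pdf_path, "rb") as handle:
        message = build_report_email(
            user, recipient, handle.read(), os.path.basename(pdf_path)
        )
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(user, password)
        server.send_message(message)
```

Keeping message construction separate from sending makes it possible to inspect the message in a test without contacting a mail server.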
Conclusion
This section covered automating PDF generation and manipulation, including creating PDFs, adding images and tables, merging and splitting PDFs, extracting text and tables, and emailing PDF reports. These techniques are useful for automating document workflows.