Automating PDF Generation and Manipulation with Python

 

PDF files are widely used for reports, invoices, and other documents. Python libraries such as reportlab, PyPDF2, and pdfplumber can generate PDFs, merge and split them, and extract their text and tables.


Installing Required Libraries

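pip install reportlab PyPDF2 pdfplumber yagmail
  • reportlab – Creates PDFs from scratch with text, images, tables, and charts.
  • PyPDF2 – Merges, splits, and extracts text from PDFs. The project now continues under the name pypdf; the examples here use the PyPDF2 3.x names (PdfReader, PdfWriter, PdfMerger).
  • pdfplumber – Extracts structured data, such as tables, from PDFs.
  • yagmail – Sends email through Gmail; used in the emailing example.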

Creating a PDF Using reportlab

from reportlab.pdfgen import canvas

# Create a new PDF
pdf = canvas.Canvas("output.pdf")

# Add text
pdf.setFont("Helvetica", 14)
pdf.drawString(100, 750, "Automated PDF Report")
pdf.drawString(100, 730, "Generated using Python")

# Save the PDF
pdf.save()

print("PDF created successfully.")
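
The coordinates passed to drawString are measured in points (1/72 inch) from the bottom-left corner of the page, so a longer report has to step y downward and start a new page when it runs out of room. The sketch below shows one way to do that bookkeeping. The names layout_lines and write_report are hypothetical helpers, and the margin and line-spacing values are assumptions, not reportlab defaults.

```python
def layout_lines(lines, page_height=792.0, margin=72.0, leading=18.0):
    """Split lines into pages, returning a list of (y, text) pairs per page.

    The origin is the bottom-left corner, so y decreases down the page.
    """
    pages, current = [], []
    y = page_height - margin
    for text in lines:
        if y < margin:              # no room left: start a new page
            pages.append(current)
            current = []
            y = page_height - margin
        current.append((y, text))
        y -= leading
    if current:
        pages.append(current)
    return pages


def write_report(path, lines):
    # Imported here so layout_lines() can be used without reportlab installed.
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    pdf = canvas.Canvas(path, pagesize=letter)
    for page in layout_lines(lines, page_height=letter[1]):
        pdf.setFont("Helvetica", 14)    # font settings reset after showPage()
        for y, text in page:
            pdf.drawString(72, y, text)
        pdf.showPage()                  # finish the current page
    pdf.save()
```

Calling write_report("long_report.pdf", lines) with a few hundred strings should produce a multi-page file instead of drawing text off the bottom of the first page.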

Adding Images to a PDF

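from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

pdf = canvas.Canvas("pdf_with_image.pdf", pagesize=letter)

pdf.drawString(100, 750, "PDF with Image")
# (100, 500) is the image's bottom-left corner; image.png must exist on disk
pdf.drawImage("image.png", 100, 500, width=200, height=100)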

pdf.save()
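
Passing a fixed width and height stretches the image if its proportions differ from the box. A small sketch of scaling an image to fit a box without distortion follows. The names fit_within and draw_scaled_image are hypothetical helpers, and reading the pixel size through ImageReader assumes the image can be opened by reportlab (which relies on Pillow for most formats).

```python
def fit_within(img_w, img_h, max_w, max_h):
    """Return (width, height) scaled to fit the box without distortion."""
    scale = min(max_w / img_w, max_h / img_h)
    return img_w * scale, img_h * scale


def draw_scaled_image(pdf, path, x, y, max_w, max_h):
    # Imported lazily so fit_within() works without reportlab installed.
    from reportlab.lib.utils import ImageReader

    image = ImageReader(path)
    img_w, img_h = image.getSize()      # pixel dimensions of the file
    width, height = fit_within(img_w, img_h, max_w, max_h)
    pdf.drawImage(image, x, y, width=width, height=height)
```

drawImage also accepts preserveAspectRatio=True, which may be enough when the image only needs to be centered inside the given box.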

Creating Tables in a PDF

from reportlab.platypus import SimpleDocTemplate, Table, TableStyle
from reportlab.lib import colors

# Create PDF document
pdf = SimpleDocTemplate("table_report.pdf")

# Table data
data = [["Product", "Sales", "Revenue"],
        ["Laptop", "100", "$50,000"],
        ["Phone", "200", "$30,000"]]

# Create table
table = Table(data)

# Add styles
style = TableStyle([
    ("BACKGROUND", (0, 0), (-1, 0), colors.grey),
    ("TEXTCOLOR", (0, 0), (-1, 0), colors.whitesmoke),
    ("ALIGN", (0, 0), (-1, -1), "CENTER"),
    ("FONTNAME", (0, 0), (-1, 0), "Helvetica-Bold"),
    ("BOTTOMPADDING", (0, 0), (-1, 0), 10),
    ("GRID", (0, 0), (-1, -1), 1, colors.black),
])

table.setStyle(style)

# Build PDF
pdf.build([table])

print("PDF with table created.")
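
In practice the table rows usually come from records rather than hand-written lists. The sketch below builds the same data structure from a list of dictionaries. The helper name rows_from_records, the record keys, and the column specification format are all assumptions made for illustration.

```python
def rows_from_records(records, columns):
    """Build Table-ready rows: a header row, then one row per record.

    columns is a list of (key, label, formatter) tuples.
    """
    header = [label for _, label, _ in columns]
    body = [[fmt(rec[key]) for key, _, fmt in columns] for rec in records]
    return [header] + body


sales = [
    {"product": "Laptop", "units": 100, "revenue": 50000},
    {"product": "Phone", "units": 200, "revenue": 30000},
]
columns = [
    ("product", "Product", str),
    ("units", "Sales", str),
    ("revenue", "Revenue", lambda value: f"${value:,}"),
]
data = rows_from_records(sales, columns)
print(data)
```

The resulting data list can be passed to Table(data) exactly as in the example above.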

Merging Multiple PDFs Using PyPDF2

from PyPDF2 import PdfMerger

pdfs = ["file1.pdf", "file2.pdf"]
merger = PdfMerger()

for pdf in pdfs:
    merger.append(pdf)

merger.write("merged.pdf")
merger.close()

print("PDFs merged successfully.")
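
The merged file follows the order of the list, so when the file names come from a directory listing, plain alphabetical sorting puts file10.pdf before file2.pdf. A minimal sketch of a numeric-aware sort is shown below. The names natural_key and merge_in_order are hypothetical helpers, not part of PyPDF2.

```python
import re


def natural_key(name):
    """Sort key that compares digit runs as numbers, not as text."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def merge_in_order(paths, output_path):
    # Imported lazily so natural_key() works without PyPDF2 installed.
    from PyPDF2 import PdfMerger

    merger = PdfMerger()
    for path in sorted(paths, key=natural_key):
        merger.append(path)
    merger.write(output_path)
    merger.close()


print(sorted(["file10.pdf", "file2.pdf", "file1.pdf"], key=natural_key))
# ['file1.pdf', 'file2.pdf', 'file10.pdf']
```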

Splitting a PDF File

from PyPDF2 import PdfReader, PdfWriter

reader = PdfReader("large.pdf")
writer = PdfWriter()

# Extract the first 3 pages (or fewer if the document is shorter)
for i in range(min(3, len(reader.pages))):
    writer.add_page(reader.pages[i])

# Save the extracted pages
with open("split.pdf", "wb") as output_pdf:
    writer.write(output_pdf)

print("PDF split successfully.")
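
The same approach extends to cutting a whole document into fixed-size parts. The sketch below computes the page indexes for each part and writes one file per part. The names chunk_ranges and split_pdf, and the part_N.pdf naming scheme, are assumptions made for illustration.

```python
def chunk_ranges(total_pages, chunk_size):
    """Return a range of page indexes for each chunk of the document."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [range(start, min(start + chunk_size, total_pages))
            for start in range(0, total_pages, chunk_size)]


def split_pdf(path, chunk_size, prefix="part"):
    # Imported lazily so chunk_ranges() works without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(path)
    chunks = chunk_ranges(len(reader.pages), chunk_size)
    for number, pages in enumerate(chunks, start=1):
        writer = PdfWriter()
        for i in pages:
            writer.add_page(reader.pages[i])
        with open(f"{prefix}_{number}.pdf", "wb") as output_pdf:
            writer.write(output_pdf)
```

For a 7-page document and a chunk size of 3, this would write three files containing 3, 3, and 1 pages.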

Extracting Text from a PDF Using PyPDF2

from PyPDF2 import PdfReader

reader = PdfReader("document.pdf")

# Extract text from the first page (empty for scanned, image-only pages)
page = reader.pages[0]
text = page.extract_text()

print("Extracted text:", text)
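
To read a whole document rather than a single page, the same call can be repeated for every page. A minimal sketch follows. The name extract_all_text is a hypothetical helper, and it accepts any object whose pages expose extract_text(), which is how a PdfReader behaves.

```python
def extract_all_text(reader, separator="\n\n"):
    """Join the text of every page, skipping pages that yield nothing."""
    chunks = []
    for page in reader.pages:
        text = (page.extract_text() or "").strip()
        if text:
            chunks.append(text)
    return separator.join(chunks)
```

With PyPDF2 this would be called as extract_all_text(PdfReader("document.pdf")).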

Extracting Tables from a PDF Using pdfplumber

import pdfplumber

with pdfplumber.open("table_document.pdf") as pdf:
    page = pdf.pages[0]
    table = page.extract_table()  # returns None if no table is detected

    for row in table or []:
        print(row)
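
The extracted table is a list of rows, and empty cells come back as None. The sketch below converts such a table into dictionaries keyed by column name. It assumes the first row is a header row, which is not guaranteed for every PDF, and table_to_records is a hypothetical helper name.

```python
def table_to_records(table):
    """Turn [header, row, row, ...] into a list of dicts keyed by header."""
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    return [
        {name: (cell or "").strip() for name, cell in zip(header, row)}
        for row in table[1:]
    ]


sample = [["Product", "Sales"], ["Laptop", "100"], ["Phone", None]]
print(table_to_records(sample))
# [{'Product': 'Laptop', 'Sales': '100'}, {'Product': 'Phone', 'Sales': ''}]
```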

Automating PDF Report Generation and Emailing

import yagmail

# Gmail generally requires an app password here, not the account password
yag = yagmail.SMTP("your_email@gmail.com", "your_app_password")

# Send the PDF report via email
yag.send(
    to="recipient@example.com",
    subject="Automated PDF Report",
    contents="Please find the attached PDF report.",
    attachments="output.pdf"
)

print("PDF report emailed successfully.")
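
Hard-coding a password in a script makes it easy to leak through version control. One common alternative is to read the credentials from environment variables, sketched below. The variable names REPORT_EMAIL_USER and REPORT_EMAIL_PASSWORD and the helpers load_credentials and send_report are assumptions chosen for this example.

```python
import os


def load_credentials(env=os.environ):
    """Read the sender address and password from environment variables."""
    try:
        return env["REPORT_EMAIL_USER"], env["REPORT_EMAIL_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(
            f"Set {missing.args[0]} before sending email"
        ) from None


def send_report(recipient, attachment):
    # Imported lazily so load_credentials() works without yagmail installed.
    import yagmail

    user, password = load_credentials()
    yag = yagmail.SMTP(user, password)
    yag.send(
        to=recipient,
        subject="Automated PDF Report",
        contents="Please find the attached PDF report.",
        attachments=attachment,
    )
```

After exporting both variables in the shell, send_report("recipient@example.com", "output.pdf") would send the same message as the example above.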

Conclusion

This post covered automating PDF generation and manipulation with Python: creating PDFs with reportlab, adding images and tables, merging and splitting files with PyPDF2, extracting text and tables, and emailing the finished report. Together these techniques cover most of a typical automated document workflow.
