🔏 Watermarking and Merging PDFs with PyPDF2
Whether you're managing large documents, creating reports, or distributing confidential information, adding watermarks and merging PDFs can be incredibly useful. Fortunately, the PyPDF2 library in Python allows you to easily manipulate PDFs, including adding watermarks and merging multiple PDFs into a single document.
In this blog post, we’ll cover:
✅ How to watermark PDFs using PyPDF2
✅ How to merge multiple PDFs into one
✅ Practical examples for both watermarking and merging PDFs
🧰 What You'll Need
- Python 3.x
- PyPDF2 library installed (which you can do via pip install PyPDF2)
- PDFs to work with (one for watermarking and others for merging)
📦 Installing PyPDF2
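To get started, install the PyPDF2 library if you haven’t already. The scripts below use the PdfReader and PdfWriter classes found in recent PyPDF2 releases (2.x and 3.x); the project is now maintained under the name pypdf, which keeps the same class names: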
pip install PyPDF2
🔏 Watermarking PDFs with PyPDF2
A watermark is often used to protect a document from unauthorized use. It can be a logo, text, or any other image applied over the content. PyPDF2 allows you to apply a watermark to every page of a PDF.
1. Creating the Watermark
You first need a PDF file that contains your watermark (it could be a text watermark or an image). The watermark file will be applied on top of the original document.
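If you don't have a watermark file yet, one dependency-free way to get a simple text watermark is to write a minimal one-page PDF by hand. The sketch below is only an illustration, not part of PyPDF2: the function name build_text_watermark_pdf, the US Letter page size, the 60 pt Helvetica font, the gray level, and the text position are all assumptions chosen for the example, and it handles plain ASCII text only. For images, transparency, or custom fonts, a dedicated PDF-generation tool is likely a better fit.

```python
def build_text_watermark_pdf(text, width=612, height=792):
    """Return the bytes of a one-page PDF with large, light-gray diagonal text."""
    # Escape the characters that are special inside a PDF literal string.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Page content: light-gray fill, 60 pt Helvetica, rotated 45 degrees.
    content = (
        "0.75 g\n"
        "BT\n"
        "/F1 60 Tf\n"
        "0.7071 0.7071 -0.7071 0.7071 150 200 Tm\n"
        f"({safe}) Tj\n"
        "ET\n"
    ).encode("ascii")
    page = (
        f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
        "/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>"
    ).encode("ascii")
    # The five objects of the file, numbered 1..5 in this order.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        page,
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"endstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, needed by the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    # Cross-reference table: one 20-byte entry per object, plus the free entry 0.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


if __name__ == "__main__":
    # Write the watermark next to the script so it can be used as watermark.pdf.
    with open("watermark.pdf", "wb") as fh:
        fh.write(build_text_watermark_pdf("CONFIDENTIAL"))
```

The resulting watermark.pdf has a single page, which is the page that gets applied on top of each page of the original document.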
2. Watermarking a PDF with PyPDF2
Here’s a simple Python script that applies a watermark to every page of an existing PDF:
import PyPDF2

# Open the original PDF and the watermark PDF
with open("original.pdf", "rb") as original_file, open("watermark.pdf", "rb") as watermark_file:
    # Create PdfReader objects for both the original and watermark PDFs
    original_pdf = PyPDF2.PdfReader(original_file)
    watermark_pdf = PyPDF2.PdfReader(watermark_file)
    # The watermark is the first page of watermark.pdf
    watermark_page = watermark_pdf.pages[0]

    # Create a PdfWriter object to write the output
    pdf_writer = PyPDF2.PdfWriter()

    # Loop through the pages of the original PDF
    for page in original_pdf.pages:
        # Merge the watermark on top of the current page
        page.merge_page(watermark_page)
        # Add the merged page to the writer
        pdf_writer.add_page(page)

    # Save the watermarked PDF while the input files are still open,
    # since PyPDF2 may still need to read page data from them
    with open("watermarked_output.pdf", "wb") as output_file:
        pdf_writer.write(output_file)

print("Watermarking complete!")
Explanation:
- PdfReader: Reads the input PDFs.
- merge_page(): Merges the watermark (from watermark.pdf) onto each page of the original PDF.
- PdfWriter: Writes the merged pages into a new PDF.
Note: If the watermark is an image (like a logo), it must first be converted to a PDF, because merge_page() works with PDF pages rather than image files.
📑 Merging PDFs with PyPDF2
Merging multiple PDFs into a single document is one of the most common tasks when handling PDF files. PyPDF2 makes it simple to merge any number of PDFs into one.
1. Merging PDFs into One
Here’s how you can merge multiple PDFs into a single file:
import PyPDF2

# List of PDFs to merge, in output order
pdfs_to_merge = ["file1.pdf", "file2.pdf", "file3.pdf"]

# Create a PdfWriter object
pdf_writer = PyPDF2.PdfWriter()

# Loop through all PDF files
for pdf_file in pdfs_to_merge:
    # Passing the path lets PdfReader load the file itself, so no file
    # handle has to stay open until the final write
    pdf_reader = PyPDF2.PdfReader(pdf_file)
    # Add all pages of the current PDF to the writer
    for page in pdf_reader.pages:
        pdf_writer.add_page(page)

# Write the merged PDF to a new file
with open("merged_output.pdf", "wb") as output_file:
    pdf_writer.write(output_file)

print("Merging complete!")
Explanation:
- PdfReader: Reads each input PDF.
- add_page(): Adds pages from each input PDF to the PdfWriter.
- PdfWriter.write(): Writes the combined content into a new PDF.
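When there are many files, the hard-coded pdfs_to_merge list in the script above can be built from a folder instead. The following is a minimal sketch using only the standard library; the helper names natural_key and collect_pdfs and the folder name "reports" are hypothetical, chosen for illustration. It sorts names "naturally", so file2.pdf comes before file10.pdf, which plain alphabetical sorting would get wrong.

```python
import re
from pathlib import Path


def natural_key(name):
    """Split a name into text and integer chunks so 'file10' sorts after 'file2'."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so every odd position holds a run of digits.
    chunks = re.split(r"(\d+)", name)
    return [int(chunk) if i % 2 else chunk.lower() for i, chunk in enumerate(chunks)]


def collect_pdfs(folder):
    """Return the paths of all .pdf files directly inside `folder`, in natural order."""
    paths = [
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    ]
    return [str(p) for p in sorted(paths, key=lambda p: natural_key(p.name))]


if __name__ == "__main__":
    folder = Path("reports")  # hypothetical folder name
    if folder.is_dir():
        for path in collect_pdfs(folder):
            print(path)
```

The returned list can be assigned to pdfs_to_merge unchanged, since each entry is a path string like the ones in the original list.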
🧩 Advanced: Merging PDFs with Specific Page Ranges
You can also merge specific pages from each PDF rather than merging the entire documents. Here’s how to merge only the first 2 pages of the first PDF and the last 3 pages of the second PDF:
import PyPDF2

# Each entry is (file, [first_page, last_page]); indices are zero-based and inclusive
pdfs_to_merge = [("file1.pdf", [0, 1]), ("file2.pdf", [-3, -1])]

# Create a PdfWriter object
pdf_writer = PyPDF2.PdfWriter()

# Loop through each file and page range
for pdf_file, page_range in pdfs_to_merge:
    # Passing the path lets PdfReader load the file itself, so no file
    # handle has to stay open until the final write
    pdf_reader = PyPDF2.PdfReader(pdf_file)
    # Add specific pages based on the range
    for page_num in range(page_range[0], page_range[1] + 1):
        page = pdf_reader.pages[page_num]
        pdf_writer.add_page(page)

# Write the merged PDF to a new file
with open("custom_merged_output.pdf", "wb") as output_file:
    pdf_writer.write(output_file)

print("Custom Merging complete!")
Explanation:
- page_range: A [first, last] pair of zero-based page indices, both inclusive. Negative numbers count from the end (e.g., -1 is the last page). Keep both ends non-negative or both negative: a mixed pair such as [2, -1] produces an empty range in this loop.
- add_page(): Adds only the pages defined in the range.
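The loop above feeds the pair straight into range(page_range[0], page_range[1] + 1), which works for [0, 1] and [-3, -1] but not for a pair that mixes a positive start with a negative end. One way to make this more robust is to resolve both ends against the page count first. The helper below is a sketch under that assumption; the name resolve_page_range is hypothetical and not part of PyPDF2.

```python
def resolve_page_range(first, last, page_count):
    """Turn an inclusive (first, last) pair into a list of zero-based page indices.

    Negative values count from the end, as in Python list indexing.
    """
    def normalize(index):
        resolved = index + page_count if index < 0 else index
        if not 0 <= resolved < page_count:
            raise IndexError(
                f"page index {index} is out of range for {page_count} pages"
            )
        return resolved

    start, stop = normalize(first), normalize(last)
    if start > stop:
        raise ValueError(f"first page {first} comes after last page {last}")
    return list(range(start, stop + 1))


if __name__ == "__main__":
    # The last 3 pages of a 10-page document
    print(resolve_page_range(-3, -1, 10))  # [7, 8, 9]
    # From the third page to the end of a 5-page document
    print(resolve_page_range(2, -1, 5))    # [2, 3, 4]
```

In the merge loop, the range(...) call could then be replaced with resolve_page_range(page_range[0], page_range[1], len(pdf_reader.pages)), which also reports out-of-range pages with a clear error instead of failing partway through the merge.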
🧠 Final Thoughts
With PyPDF2, watermarking and merging PDFs are straightforward tasks that can be automated with just a few lines of code. Whether you need to protect your documents with a watermark or combine multiple PDFs into a single document, PyPDF2 provides the tools necessary to handle these tasks efficiently.
💡 Use Cases:
- Watermarking: Protect confidential documents, add copyright watermarks to reports, or add timestamps to scans.
- Merging: Combine various reports, invoices, or chapters into a single document for easier sharing or printing.