Automating Word Document Processing with Python
Microsoft Word documents (.docx) are commonly used for reports, contracts, and documentation. Python libraries such as python-docx can create, edit, format, and extract content from Word documents.
Installing Required Libraries
pip install python-docx
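python-docx – Handles Word (.docx) documents, including text formatting, tables, images, and styles. The Excel and email examples later in this section also need pandas, openpyxl, and yagmail (pip install pandas openpyxl yagmail).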
Creating a New Word Document
from docx import Document
# Create a new Word document
doc = Document()
# Add a title
doc.add_heading("Automated Word Report", level=1)
# Save the document
doc.save("document.docx")
print("Word document created successfully.")
Adding Paragraphs and Text Formatting
# Add a paragraph
para = doc.add_paragraph("This document is generated using Python.")
# Apply bold and italic formatting
run = para.add_run(" This is an automated report.")
run.bold = True
run.italic = True
doc.save("formatted_document.docx")
Inserting Tables in Word Documents
# Create a table with 3 rows and 3 columns
table = doc.add_table(rows=3, cols=3)
# Add data
table.cell(0, 0).text = "Product"
table.cell(0, 1).text = "Sales"
table.cell(0, 2).text = "Revenue"
table.cell(1, 0).text = "Laptop"
table.cell(1, 1).text = "150"
table.cell(1, 2).text = "$75,000"
doc.save("document_with_table.docx")
Inserting Images in Word Documents
# Insert an image 4 inches wide (the height scales proportionally)
from docx.shared import Inches
doc.add_picture("image.png", width=Inches(4))
doc.save("document_with_image.docx")
Reading and Extracting Text from a Word Document
# Open an existing document
doc = Document("document.docx")
# Extract text from all paragraphs
for para in doc.paragraphs:
    print(para.text)
Replacing Text in a Word Document
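# Replace specific text
# Note: assigning to para.text rewrites the paragraph as a single run,
# so run-level formatting (bold, italic) inside it is lost.
for para in doc.paragraphs:
    if "Python" in para.text:
        para.text = para.text.replace("Python", "Automated System")
doc.save("updated_document.docx")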
Merging Multiple Word Documents
from docx import Document
# Create a new document
final_doc = Document()
# List of Word documents to merge
files = ["doc1.docx", "doc2.docx"]
for file in files:
    temp_doc = Document(file)
    # Copies paragraph text only; tables, images, and formatting are not carried over
    for para in temp_doc.paragraphs:
        final_doc.add_paragraph(para.text)
final_doc.save("merged_document.docx")
print("Word documents merged successfully.")
Generating Automated Reports from Excel Data
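import pandas as pd
from docx import Document
# Load Excel data (reading .xlsx files requires the openpyxl package)
df = pd.read_excel("sales_data.xlsx")
# Create Word document
doc = Document()
doc.add_heading("Sales Report", level=1)
# Add table: one header row plus one row per DataFrame record
table = doc.add_table(rows=df.shape[0] + 1, cols=df.shape[1])
# Add headers (column labels may not be strings, so convert them)
for col_index, col_name in enumerate(df.columns):
    table.cell(0, col_index).text = str(col_name)
# Add data; enumerate gives positional row numbers even if the index is not 0..n-1
for row_index, row in enumerate(df.itertuples(index=False), start=1):
    for col_index, value in enumerate(row):
        table.cell(row_index, col_index).text = str(value)
doc.save("automated_report.docx")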
Automating Word Report Generation and Emailing
import yagmail
# For Gmail, the second argument is typically an app password rather than
# the account password. Avoid hard-coding real credentials in scripts.
yag = yagmail.SMTP("your_email@gmail.com", "your_password")
# Send the Word report via email
yag.send(
    to="recipient@example.com",
    subject="Automated Word Report",
    contents="Please find the attached Word report.",
    attachments="document.docx"
)
print("Word report emailed successfully.")
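If adding yagmail as a dependency is not an option, the same message can be assembled with Python's standard library. The sketch below swaps in email.message.EmailMessage in place of yagmail and only builds the message; it does not send anything. The function name build_report_email and the placeholder payload are assumptions for illustration.

```python
from email.message import EmailMessage

# MIME subtype registered for .docx files
DOCX_MIME_SUBTYPE = "vnd.openxmlformats-officedocument.wordprocessingml.document"


def build_report_email(sender, recipient, subject, body, filename, payload):
    """Build an email with a .docx attachment (hypothetical helper).

    `payload` is the attachment content as bytes, for example the result
    of open("document.docx", "rb").read().
    """
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    message.add_attachment(
        payload,
        maintype="application",
        subtype=DOCX_MIME_SUBTYPE,
        filename=filename,
    )
    return message


message = build_report_email(
    sender="your_email@gmail.com",
    recipient="recipient@example.com",
    subject="Automated Word Report",
    body="Please find the attached Word report.",
    filename="document.docx",
    payload=b"placeholder bytes, not a real .docx",  # illustration only
)
```

Sending the result would then go through smtplib (for example SMTP_SSL followed by login and send_message), with the server address and credentials supplied by the environment the script runs in.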
Conclusion
This section covered automating Word document processing, including creating Word files, formatting text, inserting tables and images, extracting and replacing text, merging documents, generating reports from Excel, and automating email distribution. These techniques are useful for document automation and reporting tasks.