docx

enoch-robinson's avatarfrom enoch-robinson

Word 文档处理工具包。用于创建新文档、编辑现有文档、处理修订追踪、添加批注、提取文本。当需要处理 .docx 文件进行文档创建、修改或分析时使用此技能。

0stars🔀0forks📁View on GitHub🕐Updated Jan 10, 2026

When & Why to Use This Skill

The DOCX Processing toolkit is a comprehensive suite for programmatically managing Microsoft Word documents. It enables seamless creation, editing, and analysis of .docx files, featuring advanced capabilities such as track changes (redlining), comment management, and multi-format conversion. By integrating tools like Pandoc, python-docx, and docx-js, it provides a robust bridge between AI-generated content and professional document standards.

Use Cases

  • Automated Report Generation: Programmatically generate structured business reports, invoices, or technical manuals with custom headings, tables, and formatting.
  • Legal and Editorial Workflows: Manage complex document revisions by tracking changes, adding annotations, and processing redlines during contract reviews or collaborative writing.
  • Content Migration and Publishing: Convert legacy Word documents into Markdown for web documentation or transform them into PDFs and images for standardized distribution.
  • Data Extraction and Analysis: Batch process large volumes of .docx files to extract text, metadata, or table data for integration into databases or analytical tools.
namedocx
descriptionWord 文档处理工具包。用于创建新文档、编辑现有文档、处理修订追踪、添加批注、提取文本。当需要处理 .docx 文件进行文档创建、修改或分析时使用此技能。

DOCX Processing Guide

工作流决策树

任务 推荐方法
读取/分析内容 pandoc 转Markdown
创建新文档 docx-js (JavaScript)
编辑现有文档 python-docx 或OOXML
修订追踪 Redlining 工作流

读取文档

转换为 Markdown

# 基础转换
pandoc document.docx -o output.md

# 保留修订追踪
pandoc --track-changes=all document.docx -o output.md

创建新文档 (JavaScript)

const { Document, Packer, Paragraph, TextRun } = require('docx');

const doc = new Document({
  sections: [{
    children: [
      new Paragraph({
        children: [
          new TextRun({ text: "标题", bold: true, size: 32 }),],
      }),
      new Paragraph({
        children: [
          new TextRun("正文内容"),
        ],
      }),
    ],
  }],
});

// 导出
Packer.toBuffer(doc).then(buffer => {
  fs.writeFileSync("output.docx", buffer);
});

编辑文档 (Python)

from docx import Document

# 打开文档
doc = Document('existing.docx')

# 添加段落
doc.add_paragraph('新段落内容')

# 添加标题
doc.add_heading('新标题', level=1)

# 添加表格
table = doc.add_table(rows=2, cols=2)
table.cell(0, 0).text = '单元格内容'

# 保存
doc.save('modified.docx')

提取文本

from docx import Document

doc = Document('document.docx')
full_text = []
for para in doc.paragraphs:
    full_text.append(para.text)
print('\n'.join(full_text))

修订追踪工作流

  1. 转换查看pandoc --track-changes=all file.docx -o current.md
  2. 识别变更:标记需要修改的位置
  3. 实施变更:使用 OOXML 添加 <w:ins><w:del> 标签
  4. 验证结果:再次转换确认修改正确

文档转图片

# DOCX → PDF
soffice --headless --convert-to pdf document.docx

# PDF → 图片
pdftoppm -jpeg -r 150 document.pdf page

依赖安装

# Python
pip install python-docx

# JavaScript
npm install docx

# 命令行工具
sudo apt-get install pandoc libreoffice poppler-utils