translation-sync
Synchronize English MDX files with the i18n/ur folder, ensuring technical terms are handled correctly during the Urdu translation process.
Agent: Linguist
When & Why to Use This Skill
This Claude skill automates synchronizing and translating English MDX documentation into Urdu for Docusaurus sites. Incremental translation and a translation memory reduce API costs by re-translating only what changed. A technical glossary keeps terminology consistent, and the skill applies RTL (right-to-left) layout while preserving code blocks and other non-translatable elements.
Use Cases
- Localizing Docusaurus-based technical documentation and textbooks for Urdu-speaking audiences while maintaining project structure.
- Reducing translation API costs and processing time by using incremental updates that only translate modified sections of a document.
- Ensuring consistent translation of specialized technical terms (e.g., ROS 2, API, SDK) across large documentation sets using a managed glossary.
- Automating the preservation of non-translatable elements such as code blocks, LaTeX math formulas, and frontmatter metadata during the localization process.
- Building a reusable Translation Memory (TM) in industry-standard TMX format to improve translation speed and consistency over time.
| name | translation-sync |
|---|---|
| description | Path to technical terms glossary |
| Agent | Linguist |
| version | 1.1.0 |
| required | false |
| example | "i18n/ur/docusaurus-plugin-content-docs/current" |
| default | ".claude/skills/translation-sync/assets/glossary.json" |
Translation Sync
Agent: Linguist
Synchronize English MDX files with the i18n/ur folder for Docusaurus, ensuring technical terms are handled correctly during the Urdu translation process. This skill maintains translation consistency, tracks sync status, and preserves code blocks and technical terminology.
Quick Setup
# Set environment variables
export OPENAI_API_KEY="sk-..."
# Check sync status
.claude/skills/translation-sync/scripts/setup.sh --status
# Sync all files (detect changes)
.claude/skills/translation-sync/scripts/setup.sh --sync
# Sync with incremental mode (only changed sections) - v1.1.0
.claude/skills/translation-sync/scripts/setup.sh --sync --incremental
# Show changes before re-translating - v1.1.0
.claude/skills/translation-sync/scripts/setup.sh --diff docs/chapter-1/index.md
# Translate a specific file
.claude/skills/translation-sync/scripts/setup.sh --translate docs/intro.md
# Translate incrementally - v1.1.0
.claude/skills/translation-sync/scripts/setup.sh --translate docs/chapter-1/index.md --incremental
# Add term to glossary
.claude/skills/translation-sync/scripts/setup.sh --add-term "ROS 2" --urdu "آر او ایس ٹو"
# Translation Memory commands - v1.1.0
.claude/skills/translation-sync/scripts/setup.sh --tm-stats
.claude/skills/translation-sync/scripts/setup.sh --tm-export translations.tmx --format tmx
# Validate translations
.claude/skills/translation-sync/scripts/setup.sh --validate
Command Options
| Option | Description | Default |
|---|---|---|
| `--status` | Show sync status for all files | - |
| `--sync` | Sync all changed files | - |
| `--translate FILE` | Translate a specific file | - |
| `--diff FILE` | Show changes since last translation (v1.1.0) | - |
| `--validate` | Validate all translations | - |
| `--add-term TERM` | Add term to glossary | - |
| `--urdu TEXT` | Urdu translation/transliteration for term | - |
| `--list-terms` | List all glossary terms | - |
| `--incremental` | Only translate changed sections (v1.1.0) | false |
| `--no-tm` | Disable translation memory (v1.1.0) | false |
| `--preserve-terms` | Keep technical terms in English | true |
| `--source DIR` | Source directory | docs |
| `--target DIR` | Target directory | i18n/ur/.../current |
| `--model MODEL` | OpenAI model | gpt-4o-mini |
| `--dry-run` | Show what would be translated | false |
| `-h, --help` | Show help message | - |
Translation Memory Commands (v1.1.0)
| Option | Description |
|---|---|
| `--tm-stats` | Show translation memory statistics |
| `--tm-export FILE` | Export TM to file (JSON or TMX format) |
| `--tm-import FILE` | Import TM from JSON file |
| `--tm-clear` | Clear all translation memory |
| `--format FORMAT` | Export format: json or tmx |
What It Does
1. File Synchronization
Tracks English source files and their Urdu translations:
docs/                     i18n/ur/.../current/
├── intro.md        ──►   ├── intro.md        (translated)
├── chapter-1/            ├── chapter-1/
│   └── index.md    ──►   │   └── index.md    (translated)
└── chapter-2/            └── chapter-2/
    └── index.md    ──►       └── index.md    (needs update)
2. Translation Status Tracking
Maintains .translation-status.json to track:
{
  "files": {
    "docs/intro.md": {
      "source_hash": "abc123",
      "translated_hash": "def456",
      "last_synced": "2026-01-02T12:00:00Z",
      "status": "synced",
      "sections": {
        "# Introduction": {"hash": "xyz789", "translated": "..."}
      }
    }
  }
}
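As a rough illustration of how a status file like this can drive the synced / pending / outdated labels, the sketch below compares a fresh hash of the source text against the stored `source_hash`. The helper names `content_hash` and `classify` are hypothetical, and SHA-256 is an assumption; the skill's actual hashing scheme is not documented here.

```python
import hashlib
from typing import Optional


def content_hash(text: str) -> str:
    """Hash the file content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def classify(source_text: str, entry: Optional[dict]) -> str:
    """Map a source file and its status entry to a sync label."""
    if entry is None:
        return "pending"      # no translation recorded yet
    if entry.get("source_hash") != content_hash(source_text):
        return "outdated"     # source changed since the last sync
    return "synced"


# Hypothetical status data for one tracked file.
status = {"docs/intro.md": {"source_hash": content_hash("# Intro\n")}}

print(classify("# Intro\n", status.get("docs/intro.md")))
print(classify("# Intro v2\n", status.get("docs/intro.md")))
print(classify("# New page\n", status.get("docs/new.md")))
```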
3. Diff Mode (v1.1.0)
Preview changes before re-translating:
.claude/skills/translation-sync/scripts/setup.sh --diff docs/chapter-1/index.md
Output:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Changes detected in docs/chapter-1/index.md
Changed sections:
• ## Getting Started (modified)
• ## Advanced Topics (new)
Diff:
--- a/docs/chapter-1/index.md (previous)
+++ b/docs/chapter-1/index.md (current)
@@ -15,6 +15,10 @@
This is the original content.
+
+## Advanced Topics
+
+New content added here.
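A diff in this shape can be produced with Python's standard `difflib` module. The sketch below is only an approximation of what `--diff` might do internally; it assumes the previous version of the source text is available from somewhere (for example a cached copy), which the documentation above does not specify.

```python
import difflib


def unified(previous: str, current: str, path: str) -> str:
    """Return a unified diff between two versions of one file."""
    lines = difflib.unified_diff(
        previous.splitlines(keepends=True),
        current.splitlines(keepends=True),
        fromfile=f"a/{path} (previous)",
        tofile=f"b/{path} (current)",
    )
    return "".join(lines)


# Hypothetical before/after content for illustration.
old = "This is the original content.\n"
new = old + "\n## Advanced Topics\n\nNew content added here.\n"

diff_text = unified(old, new, "docs/chapter-1/index.md")
print(diff_text)
```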
4. Section-Level Incremental Translation (v1.1.0)
Only translate sections that changed, preserving cached translations for unchanged sections:
.claude/skills/translation-sync/scripts/setup.sh --translate docs/chapter-1/index.md --incremental
Output:
Translated: docs/chapter-1/index.md -> i18n/ur/.../chapter-1/index.md
Sections: 2 translated, 5 cached, 1 TM hits
Benefits:
- 60-80% reduction in API costs for updates
- Faster translation times
- Preserves human edits to unchanged sections
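Section-level change detection can be sketched as: split the document on headings, hash each section, and re-translate only sections whose hash differs from the cached one. The function names below are hypothetical and the splitting is deliberately naive: it would mis-split on `#` lines inside code blocks, and repeated heading text would collide, so treat it as an illustration of the idea rather than the skill's implementation.

```python
import hashlib
import re

# ATX headings such as "# Title" or "## Section".
HEADING = re.compile(r"^#{1,6} .*$", re.MULTILINE)


def split_sections(markdown: str) -> dict:
    """Map each heading line to its section text; preamble is keyed ''."""
    sections = {}
    matches = list(HEADING.finditer(markdown))
    if not matches or matches[0].start() > 0:
        end = matches[0].start() if matches else len(markdown)
        sections[""] = markdown[:end]
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        sections[m.group(0)] = markdown[m.start():end]
    return sections


def section_hashes(markdown: str) -> dict:
    """Hash every section so later runs can detect changes."""
    return {
        heading: hashlib.sha256(body.encode("utf-8")).hexdigest()
        for heading, body in split_sections(markdown).items()
    }


def changed_sections(cached: dict, markdown: str) -> list:
    """Return headings that are new or whose content hash differs."""
    current = section_hashes(markdown)
    return [h for h, digest in current.items() if cached.get(h) != digest]


v1 = "# Intro\nHello.\n## Getting Started\nStep one.\n"
v2 = ("# Intro\nHello.\n## Getting Started\nStep one, revised.\n"
      "## Advanced Topics\nNew.\n")

cached = section_hashes(v1)
print(changed_sections(cached, v2))
```

Only the headings returned by `changed_sections` would need an API call; every other section can reuse its stored translation, which is where the cost reduction comes from.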
5. Translation Memory (v1.1.0)
Stores and reuses previously translated segments:
# View TM statistics
.claude/skills/translation-sync/scripts/setup.sh --tm-stats
# Output:
# Total segments: 145
# Total uses: 312
# Average reuse: 2.15x
#
# Most used segments:
# 15x: ROS 2 is a robotics middleware that provides...
# 12x: This chapter covers the basics of...
# 8x: For more information, see the official...
TM Storage (.translation-memory.json):
{
  "meta": {
    "version": "1.0.0",
    "total_segments": 145,
    "last_updated": "2026-01-02T21:00:00Z"
  },
  "segments": {
    "abc123hash": {
      "source": "ROS 2 is a robotics middleware.",
      "target": "ROS 2 ایک روبوٹکس مڈل ویئر ہے۔",
      "count": 15,
      "last_used": "2026-01-02T21:00:00Z"
    }
  }
}
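A minimal sketch of an exact-match translation memory over this structure follows. It assumes segments are keyed by a SHA-256 hash of the stripped source text and that `count` is incremented on every hit; both are guesses based on the JSON layout, and `segment_key`, `tm_store`, and `tm_lookup` are hypothetical names.

```python
import hashlib
from datetime import datetime, timezone


def segment_key(source: str) -> str:
    """Key a segment by the hash of its stripped source text (assumed)."""
    return hashlib.sha256(source.strip().encode("utf-8")).hexdigest()


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def tm_store(tm: dict, source: str, target: str) -> None:
    """Record a new translated segment."""
    tm["segments"][segment_key(source)] = {
        "source": source, "target": target, "count": 1, "last_used": _now(),
    }
    tm["meta"]["total_segments"] = len(tm["segments"])


def tm_lookup(tm: dict, source: str):
    """Return the stored translation for an exact match, else None."""
    seg = tm["segments"].get(segment_key(source))
    if seg is None:
        return None
    seg["count"] += 1            # track reuse
    seg["last_used"] = _now()
    return seg["target"]


tm = {"meta": {"version": "1.0.0", "total_segments": 0}, "segments": {}}
tm_store(tm, "ROS 2 is a robotics middleware.", "ROS 2 ایک روبوٹکس مڈل ویئر ہے۔")

hit = tm_lookup(tm, "ROS 2 is a robotics middleware.")
miss = tm_lookup(tm, "ROS 2 is robotics middleware.")
print(hit is not None, miss is None)
```

Because the lookup is keyed on a hash, even a one-word change in the source produces a miss, which matches the "segments must be exact matches" behavior noted under Troubleshooting.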
Export to TMX (industry standard):
.claude/skills/translation-sync/scripts/setup.sh --tm-export translations.tmx --format tmx
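For readers unfamiliar with TMX, the sketch below converts segments in the JSON layout shown earlier into a TMX 1.4 document using only the standard library. The header attribute values (`creationtool`, `o-tmf`, and so on) are illustrative assumptions, and `tm_to_tmx` is a hypothetical helper, not the skill's exporter.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tm_to_tmx(segments: dict, src: str = "en", tgt: str = "ur") -> str:
    """Build a TMX 1.4 string with one <tu> per TM segment."""
    root = ET.Element("tmx", version="1.4")
    ET.SubElement(root, "header", {
        "creationtool": "translation-sync",   # assumed values
        "creationtoolversion": "1.1.0",
        "segtype": "sentence",
        "o-tmf": "json",
        "adminlang": "en",
        "srclang": src,
        "datatype": "plaintext",
    })
    body = ET.SubElement(root, "body")
    for seg in segments.values():
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src, seg["source"]), (tgt, seg["target"])):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(root, encoding="unicode")


segments = {
    "abc123hash": {
        "source": "ROS 2 is a robotics middleware.",
        "target": "ROS 2 ایک روبوٹکس مڈل ویئر ہے۔",
        "count": 15,
    }
}
tmx_text = tm_to_tmx(segments)
print(tmx_text)
```

When writing to disk, `ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)` would add the XML declaration that most CAT tools expect.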
6. Technical Terms Glossary
Maintains consistent translation of technical terms:
{
  "ROS 2": {
    "urdu": "آر او ایس ٹو",
    "keep_english": true,
    "context": "Robot Operating System 2"
  },
  "Gazebo": {
    "urdu": "گیزیبو",
    "keep_english": true,
    "context": "Simulation software"
  },
  "node": {
    "urdu": "نوڈ",
    "keep_english": false,
    "context": "ROS node"
  }
}
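One plausible way to use such a glossary is to turn each entry into an explicit instruction that accompanies the translation request. The sketch below assumes that approach; `glossary_guidance` is a hypothetical helper and the wording of the generated lines is illustrative, since the skill's actual prompt is not shown in this document.

```python
def glossary_guidance(terms: dict) -> list:
    """Produce one instruction line per glossary term."""
    lines = []
    for term, entry in terms.items():
        if entry["keep_english"]:
            # Term stays in Latin script; the Urdu form is a transliteration.
            lines.append(
                f'Keep "{term}" in English (transliteration: {entry["urdu"]}).'
            )
        else:
            lines.append(f'Translate "{term}" as "{entry["urdu"]}".')
    return lines


terms = {
    "ROS 2": {"urdu": "آر او ایس ٹو", "keep_english": True},
    "node": {"urdu": "نوڈ", "keep_english": False},
}

guidance = glossary_guidance(terms)
for line in guidance:
    print(line)
```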
7. Smart Translation
Preserves code blocks, frontmatter, and technical elements:
Before (English):
# Introduction to ROS 2
ROS 2 is a robotics middleware.
\`\`\`python
import rclpy
\`\`\`
After (Urdu):
# آر او ایس ٹو کا تعارف
ROS 2 ایک روبوٹکس مڈل ویئر ہے۔
\`\`\`python
import rclpy
\`\`\`
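A common way to achieve this kind of preservation is to swap protected spans for placeholders before translation and restore them afterwards. The sketch below shows the idea with two of the patterns from the configuration (fenced and inline code). The `@@n@@` placeholder format and the `mask` / `unmask` names are assumptions, and `str.upper()` stands in for the real translation call.

```python
import re

FENCE = "`" * 3  # built programmatically to keep this example readable

# Fenced blocks first, then inline code spans.
PRESERVE = [FENCE + r"[\s\S]*?" + FENCE, r"`[^`]+`"]
COMBINED = re.compile("|".join(f"(?:{p})" for p in PRESERVE))


def mask(text: str):
    """Replace protected spans with numbered placeholders."""
    kept = []

    def stash(match):
        kept.append(match.group(0))
        return f"@@{len(kept) - 1}@@"

    return COMBINED.sub(stash, text), kept


def unmask(text: str, kept: list) -> str:
    """Put the original spans back in place of the placeholders."""
    return re.sub(r"@@(\d+)@@", lambda m: kept[int(m.group(1))], text)


doc = ("Run `ros2 run demo` first.\n"
       + FENCE + "python\nimport rclpy\n" + FENCE + "\n")

masked, kept = mask(doc)
translated = masked.upper()      # stand-in for the translation step
restored = unmask(translated, kept)
print(restored)
```

The prose is transformed while `ros2 run demo` and the `import rclpy` block come back byte-for-byte identical. A real implementation would also need to confirm that the translation step returned every placeholder intact.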
8. RTL Formatting
Automatically applies RTL markers for proper Urdu display:
<div dir="rtl">
# اردو عنوان
یہ اردو متن ہے۔
</div>
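The wrapping step can be sketched as below. Two details are assumptions on top of what is documented: the wrapper is placed after any YAML frontmatter (which has to remain at the very top of the file for Docusaurus to read it), and the function is idempotent so re-running a sync does not nest wrappers. `wrap_rtl` is a hypothetical name; the template string matches the `rtl_wrapper` value in the bundled configuration.

```python
import re

RTL_WRAPPER = '<div dir="rtl">\n\n{content}\n\n</div>'
FRONTMATTER = re.compile(r"\A---\n[\s\S]*?\n---\n")


def wrap_rtl(document: str) -> str:
    """Wrap the body in an RTL div, leaving frontmatter untouched."""
    match = FRONTMATTER.match(document)
    head = match.group(0) if match else ""
    body = document[len(head):]
    if body.lstrip().startswith('<div dir="rtl">'):
        return document  # already wrapped
    wrapped = RTL_WRAPPER.replace("{content}", body.strip())
    separator = "\n" if head else ""
    return head + separator + wrapped + "\n"


page = "---\ntitle: x\n---\n# اردو عنوان\n\nیہ اردو متن ہے۔\n"
wrapped_page = wrap_rtl(page)
print(wrapped_page)
```

`str.replace` is used instead of `str.format` so that braces in MDX content (JSX expressions, for instance) are not misread as format fields.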
Bundled Resources
1. Python Dependencies
File: requirements.txt
openai>=1.0.0
python-dotenv>=1.0.0
rich>=13.0.0
pyyaml>=6.0.0
2. Technical Terms Glossary
File: assets/glossary.json
{
  "meta": {
    "version": "1.0.0",
    "description": "Technical terms glossary for Physical AI textbook"
  },
  "terms": {
    "ROS 2": {"urdu": "آر او ایس ٹو", "keep_english": true},
    "Gazebo": {"urdu": "گیزیبو", "keep_english": true},
    "robot": {"urdu": "روبوٹ", "keep_english": false},
    "sensor": {"urdu": "سینسر", "keep_english": true},
    "actuator": {"urdu": "ایکچویٹر", "keep_english": true},
    "node": {"urdu": "نوڈ", "keep_english": false},
    "topic": {"urdu": "ٹاپک", "keep_english": false},
    "publisher": {"urdu": "پبلشر", "keep_english": false},
    "subscriber": {"urdu": "سبسکرائبر", "keep_english": false},
    "message": {"urdu": "پیغام", "keep_english": false},
    "service": {"urdu": "سروس", "keep_english": false},
    "action": {"urdu": "ایکشن", "keep_english": false},
    "package": {"urdu": "پیکج", "keep_english": false},
    "workspace": {"urdu": "ورک سپیس", "keep_english": false},
    "launch file": {"urdu": "لانچ فائل", "keep_english": false},
    "URDF": {"urdu": "یو آر ڈی ایف", "keep_english": true},
    "TF": {"urdu": "ٹی ایف", "keep_english": true},
    "SLAM": {"urdu": "سلیم", "keep_english": true},
    "navigation": {"urdu": "نیویگیشن", "keep_english": false},
    "localization": {"urdu": "لوکلائزیشن", "keep_english": false},
    "simulation": {"urdu": "سمولیشن", "keep_english": false},
    "physical AI": {"urdu": "فزیکل اے آئی", "keep_english": true},
    "machine learning": {"urdu": "مشین لرننگ", "keep_english": false},
    "deep learning": {"urdu": "ڈیپ لرننگ", "keep_english": false},
    "neural network": {"urdu": "نیورل نیٹ ورک", "keep_english": false},
    "API": {"urdu": "اے پی آئی", "keep_english": true},
    "SDK": {"urdu": "ایس ڈی کے", "keep_english": true},
    "CLI": {"urdu": "سی ایل آئی", "keep_english": true},
    "GUI": {"urdu": "جی یو آئی", "keep_english": true}
  }
}
3. Translation Configuration
File: assets/translation_config.json
{
  "source_language": "en",
  "target_language": "ur",
  "preserve_patterns": [
    "```[\\s\\S]*?```",
    "`[^`]+`",
    "\\{[^}]+\\}",
    "\\[[^\\]]*\\]\\([^)]*\\)",
    "^---[\\s\\S]*?---",
    "<[^>]+>",
    "\\$\\$[\\s\\S]*?\\$\\$",
    "\\$[^$]+\\$"
  ],
  "rtl_wrapper": "<div dir=\"rtl\">\n\n{content}\n\n</div>",
  "frontmatter_keys_to_translate": ["title", "description", "sidebar_label"],
  "skip_directories": ["node_modules", ".git", "build", ".docusaurus"],
  "file_extensions": [".md", ".mdx"]
}
4. Translation Module
File: scripts/translator.py
from translator import TranslationSync
sync = TranslationSync()
# Check status
status = sync.get_status()
print(f"Pending: {status.pending}, Synced: {status.synced}")
# Translate file (with incremental and TM)
result = sync.translate_file("docs/intro.md", incremental=True, use_tm=True)
print(f"Sections: {result.sections_translated} new, {result.sections_cached} cached")
# Show diff
diff = sync.show_diff("docs/chapter-1/index.md")
print(f"Changed sections: {diff.changed_sections}")
# Translation memory
stats = sync.get_tm_stats()
print(f"TM segments: {stats['total_segments']}, reuse: {stats['avg_reuse']}x")
# Export TM to TMX
sync.export_tm("translations.tmx", format="tmx")
5. Test Suite
File: scripts/test.sh - Bash test runner (16+ tests)
File: scripts/test_translator.py - Python unit tests (21+ tests)
# Run tests
.claude/skills/translation-sync/scripts/test.sh
Usage Instructions
Step 1: Set Environment Variables
# Add to .env file
OPENAI_API_KEY=sk-...
Step 2: Check Sync Status
# See which files need translation
.claude/skills/translation-sync/scripts/setup.sh --status
Output:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 Translation Status
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✓ docs/intro.md [synced]
✓ docs/chapter-1/index.md [synced]
○ docs/chapter-2/index.md [pending]
⚠ docs/chapter-3/index.md [outdated]
Summary: 2 synced, 1 pending, 1 outdated
Step 3: Preview Changes (v1.1.0)
# See what changed before re-translating
.claude/skills/translation-sync/scripts/setup.sh --diff docs/chapter-3/index.md
Step 4: Translate Files
# Translate single file (full)
.claude/skills/translation-sync/scripts/setup.sh --translate docs/chapter-2/index.md
# Translate incrementally (only changed sections)
.claude/skills/translation-sync/scripts/setup.sh --translate docs/chapter-3/index.md --incremental
# Sync all pending/outdated files incrementally
.claude/skills/translation-sync/scripts/setup.sh --sync --incremental
# Dry run to preview changes
.claude/skills/translation-sync/scripts/setup.sh --sync --dry-run
Step 5: Manage Glossary
# List all terms
.claude/skills/translation-sync/scripts/setup.sh --list-terms
# Add new term
.claude/skills/translation-sync/scripts/setup.sh --add-term "lidar" --urdu "لائیڈار"
# Add term that should stay in English
.claude/skills/translation-sync/scripts/setup.sh --add-term "RViz" --urdu "آر ویز" --keep-english
Step 6: Manage Translation Memory (v1.1.0)
# View TM statistics
.claude/skills/translation-sync/scripts/setup.sh --tm-stats
# Export to JSON
.claude/skills/translation-sync/scripts/setup.sh --tm-export backup.json
# Export to TMX (industry standard)
.claude/skills/translation-sync/scripts/setup.sh --tm-export translations.tmx --format tmx
# Import from another project
.claude/skills/translation-sync/scripts/setup.sh --tm-import external_tm.json
# Clear TM (if needed)
.claude/skills/translation-sync/scripts/setup.sh --tm-clear
Step 7: Validate Translations
# Check all translations for issues
.claude/skills/translation-sync/scripts/setup.sh --validate
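The specific checks behind `--validate` are not listed here, but one check that follows directly from the preservation rules is that every fenced code block in the source appears unchanged in the translation. A minimal sketch of that single check, with the hypothetical name `check_code_preserved`:

```python
import re

FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"[\s\S]*?" + FENCE)


def check_code_preserved(source: str, translated: str) -> list:
    """Return a list of problems; an empty list means the check passed."""
    src_blocks = CODE_BLOCK.findall(source)
    dst_blocks = CODE_BLOCK.findall(translated)
    issues = []
    if len(src_blocks) != len(dst_blocks):
        issues.append(
            f"code block count differs: {len(src_blocks)} vs {len(dst_blocks)}"
        )
    for index, (a, b) in enumerate(zip(src_blocks, dst_blocks)):
        if a != b:
            issues.append(f"code block {index} was modified")
    return issues


block = FENCE + "python\nimport rclpy\n" + FENCE
english = "# Introduction to ROS 2\n" + block + "\n"
good = "# آر او ایس ٹو کا تعارف\n" + block + "\n"
bad = "# آر او ایس ٹو کا تعارف\n" + block.replace("import", "درآمد") + "\n"

print(check_code_preserved(english, good))
print(check_code_preserved(english, bad))
```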
Translation Rules
Terms Handling
| Category | Rule | Example |
|---|---|---|
| Acronyms | Keep English, add transliteration | ROS 2 (آر او ایس ٹو) |
| Product Names | Keep English | Gazebo, RViz |
| Technical Verbs | Translate | publish → شائع کریں |
| Code Elements | Never translate | rclpy, ros2 run |
| File Paths | Never translate | /opt/ros/humble |
Preservation Rules
- Code Blocks: Never translate content inside ``` or `
- Frontmatter: Only translate the keys listed in frontmatter_keys_to_translate (title, description, sidebar_label)
- Links: Keep URLs intact, translate link text
- Components: Keep JSX/MDX components unchanged
- Math: Keep LaTeX expressions unchanged
Verification Checklist
- `OPENAI_API_KEY` environment variable set
- Source directory exists with MDX files
- Target directory structure created
- Glossary loaded correctly
- Translation preserves code blocks
- RTL formatting applied correctly
- Technical terms handled per glossary
- Translation memory working (v1.1.0)
- Incremental mode caching sections (v1.1.0)
Integration with Docusaurus
This skill is designed for Docusaurus i18n:
// docusaurus.config.js
module.exports = {
  i18n: {
    defaultLocale: 'en',
    locales: ['en', 'ur'],
    localeConfigs: {
      ur: {
        label: 'اردو',
        direction: 'rtl',
      },
    },
  },
};
Troubleshooting
| Issue | Solution |
|---|---|
| RTL not displaying | Check dir="rtl" wrapper is present |
| Code blocks translated | Verify preserve_patterns in config |
| Terms inconsistent | Add terms to glossary.json |
| API rate limit | Reduce batch size or add delays |
| Frontmatter broken | Check YAML syntax in source file |
| TM not matching | Segments must be exact matches |
| Incremental not caching | Run full sync first to populate section hashes |
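For the rate-limit row, "add delays" usually means retrying with exponential backoff. The sketch below is a generic wrapper, not part of the skill: `with_backoff` and its parameters are hypothetical, and it catches every exception for brevity, whereas real code would catch only the client's rate-limit error.

```python
import time


def with_backoff(call, retries: int = 4, base_delay: float = 1.0,
                 sleep=time.sleep):
    """Run call(); on failure wait base_delay * 2**attempt and retry."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise            # out of retries, surface the error
            sleep(base_delay * 2 ** attempt)


# Demonstration with a stand-in that fails twice, then succeeds.
attempts = {"n": 0}
delays = []


def flaky_translate():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


result = with_backoff(flaky_translate, sleep=delays.append)
print(result, delays)
```

Passing `sleep=delays.append` records the waits instead of actually sleeping, which keeps the demonstration instant.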
Cost Considerations
| Resource | Cost |
|---|---|
| GPT-4o-mini | $0.15 / 1M input tokens |
| GPT-4o | $2.50 / 1M input tokens |
| Estimated per chapter (full) | ~$0.02 (gpt-4o-mini) |
| Estimated per chapter (incremental) | ~$0.005 (60-80% savings) |
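The figures in the table can be related with simple arithmetic. The sketch below uses only the input-token prices listed above; output tokens are billed separately and are not covered by the table, so real costs will be somewhat higher. The 100,000-token example is an arbitrary illustration, not a measured chapter size.

```python
# USD per one million input tokens, taken from the table above.
PRICE_PER_MILLION_INPUT = {"gpt-4o-mini": 0.15, "gpt-4o": 2.50}


def input_cost(tokens: int, model: str = "gpt-4o-mini") -> float:
    """Input-side cost in USD for a given number of tokens."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT[model]


def savings(full_cost: float, incremental_cost: float) -> float:
    """Fraction saved by the incremental run relative to a full run."""
    return 1 - incremental_cost / full_cost


# Per-chapter estimates from the table: ~$0.02 full, ~$0.005 incremental.
print(f"savings: {savings(0.02, 0.005):.0%}")

# Hypothetical volume: 100,000 input tokens on each model.
for model in PRICE_PER_MILLION_INPUT:
    print(model, f"${input_cost(100_000, model):.3f}")
```

The two per-chapter estimates imply a 75% saving, which sits inside the quoted 60-80% range.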
Related
- Skill: `urdu-rtl-styler` - RTL CSS styling for Docusaurus
- Skill: `docusaurus-scaffold` - Docusaurus site scaffolding
- Feature: `005-translation` - Translation specification
Changelog
v1.1.0 (2026-01-02)
Feature Enhancements
- NEW: `--diff FILE` - Preview changes since last translation
- NEW: `--incremental` - Section-level incremental translation (60-80% cost savings)
- NEW: Translation Memory (TM) cache with reuse tracking
  - `--tm-stats` - View TM statistics
  - `--tm-export FILE` - Export to JSON or TMX format
  - `--tm-import FILE` - Import from JSON
  - `--tm-clear` - Clear TM cache
- NEW: `--no-tm` - Disable translation memory
- Section-level hashing for precise change detection
- Status file now includes section data for incremental updates
- TranslationResult includes `sections_translated`, `sections_cached`, `tm_hits`
v1.0.0 (2026-01-02)
Initial Release
- File synchronization between en and ur
- Translation status tracking
- Technical terms glossary with 30+ terms
- Code block and frontmatter preservation
- RTL wrapper injection
- OpenAI-powered translation
- Dry-run mode for previewing changes
- Validation for translation quality