Convert Word documents from docx to simple HTML and Markdown
mammoth is a JavaScript library for converting Microsoft Word .docx documents into clean, semantic HTML or Markdown. Unlike tools that attempt pixel-perfect rendering, mammoth focuses on extracting document structure—headings, paragraphs, lists, tables, and images—while deliberately ignoring visual formatting like fonts, colors, and borders. This makes it ideal for scenarios where you need to display Word content on the web or repurpose it in content management systems.
The package works with .docx files from Microsoft Word, Google Docs, and LibreOffice. It uses a style mapping system that translates Word paragraph and character styles into HTML elements. For example, "Heading 1" becomes an <h1> element by default, and a custom mapping can turn a named style into an element with specific CSS classes.
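To make the style mapping idea concrete, here is a minimal sketch that builds a list of style map rules from plain objects. The helper name buildStyleMap and the Word style names "Warning" and "Highlight" are hypothetical, chosen only for illustration; the rule strings follow the "p[style-name='...'] => ..." pattern that mammoth style maps use. The sketch assumes style names contain no single quotes.

```javascript
// Hypothetical helper (not part of mammoth): turn { wordStyleName: htmlPath }
// pairs into style map rule strings.
//   paragraphStyles -> "p[style-name='X'] => path" (paragraph styles)
//   runStyles       -> "r[style-name='X'] => path" (character/run styles)
// Assumes style names contain no single quotes.
function buildStyleMap(paragraphStyles, runStyles = {}) {
  const rules = [];
  for (const [styleName, htmlPath] of Object.entries(paragraphStyles)) {
    rules.push(`p[style-name='${styleName}'] => ${htmlPath}`);
  }
  for (const [styleName, htmlPath] of Object.entries(runStyles)) {
    rules.push(`r[style-name='${styleName}'] => ${htmlPath}`);
  }
  return rules;
}

// Example style names are made up; use the names defined in your own documents.
const styleMap = buildStyleMap(
  { Warning: 'div.alert > p:fresh' },
  { Highlight: 'mark' }
);
console.log(styleMap);
```

The resulting array can be passed as the styleMap option of mammoth.convertToHtml, alongside path.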
With over 1.7 million weekly downloads, mammoth is widely used in document processing pipelines, content migration projects, and web applications that need to accept Word documents as input. Its conversion functions return a promise that resolves to an object holding the converted output (value) and an array of warning and error messages (messages), so it fits naturally into async JavaScript workflows. The library also provides hooks for customizing image handling and element attributes.
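Because every conversion result carries a list of messages, it can help to summarize them before deciding whether to accept the output. The sketch below groups messages by type. The helper name groupMessages is hypothetical, the sample messages are made up, and the message shape ({ type, message }) is an assumption based on the warning/error structure described above.

```javascript
// Hypothetical helper: group a conversion result's messages by `type`.
// Assumes each message looks like { type: 'warning' | 'error', message: string }.
function groupMessages(messages) {
  const grouped = { warning: [], error: [] };
  for (const { type, message } of messages) {
    if (!Array.isArray(grouped[type])) {
      grouped[type] = []; // tolerate types other than warning/error
    }
    grouped[type].push(message);
  }
  return grouped;
}

// Stand-in for `result.messages`; the message text here is invented.
const sampleMessages = [
  { type: 'warning', message: 'Unrecognised paragraph style: Sidebar' },
  { type: 'error', message: 'Example error text' },
];

const grouped = groupMessages(sampleMessages);
console.log(`${grouped.warning.length} warning(s), ${grouped.error.length} error(s)`);
```

A pipeline might reject a document when grouped.error is non-empty and only log the warnings, but that policy is a design choice rather than anything the library prescribes.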
import mammoth from 'mammoth';
import fs from 'fs/promises';

async function convertWordDocument() {
  // Basic conversion from file path
  const result = await mammoth.convertToHtml({
    path: 'document.docx'
  });
  console.log(result.value); // HTML string
  console.log(result.messages); // Warnings/errors

  // Convert to Markdown instead
  const mdResult = await mammoth.convertToMarkdown({
    path: 'document.docx'
  });
  await fs.writeFile('output.md', mdResult.value);

  // Advanced: custom style mapping and image handling
  const customResult = await mammoth.convertToHtml({
    path: 'document.docx',
    styleMap: [
      "p[style-name='CodeBlock'] => pre.code-block",
      "p[style-name='Warning'] => div.alert.alert-warning > p",
      "r[style-name='Highlight'] => mark"
    ],
    // Embed each image inline as a base64 data URI
    convertImage: mammoth.images.imgElement(function(image) {
      return image.read('base64').then(function(base64Data) {
        return {
          src: `data:${image.contentType};base64,${base64Data}`,
          alt: image.altText || 'Document image'
        };
      });
    })
  });

  // Check for conversion issues
  const errors = customResult.messages.filter(m => m.type === 'error');
  if (errors.length > 0) {
    console.error('Conversion errors:', errors);
  }

  return customResult.value;
}

convertWordDocument().catch(console.error);

Content Management System Integration: Import blog posts or articles written in Word directly into a CMS. Mammoth extracts the semantic structure, allowing editors to write in their preferred tool while maintaining clean HTML output without inline styles or formatting cruft.
Document Preview in Web Apps: Display uploaded .docx files in browser-based applications without requiring users to download them. The converted HTML renders natively in the browser, preserving headings, lists, and tables for readable previews.
Legacy Content Migration: Convert large archives of Word documents into web-friendly formats during platform migrations. The Markdown output option is particularly useful for migrating to static site generators or documentation platforms like Jekyll or Hugo.
Collaborative Editing Workflows: Accept Word documents from non-technical stakeholders (like legal or marketing teams) and automatically convert them into HTML for web publication, bridging the gap between traditional document workflows and modern web publishing.
Email Template Processing: Extract content from Word-formatted email templates and convert them to HTML for email clients, ensuring consistent structure while stripping unnecessary formatting that breaks in email environments.
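For the document preview use case above, the converted HTML is a fragment rather than a complete page, so a preview typically needs a small wrapper around it. The following is a minimal sketch under that assumption. The helper names escapeHtml and wrapPreview and the docx-preview class are hypothetical and not part of mammoth. The fragment is inserted as-is, so it should come from a trusted source or be sanitized first.

```javascript
// Escape text for safe use inside HTML text and attribute contexts.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: wrap a converted HTML fragment in a minimal page.
// `fragment` is inserted unchanged; sanitize it first if the source
// document is untrusted. Only `title` is escaped here.
function wrapPreview(fragment, title) {
  return [
    '<!DOCTYPE html>',
    '<html>',
    `<head><meta charset="utf-8"><title>${escapeHtml(title)}</title></head>`,
    `<body><article class="docx-preview">${fragment}</article></body>`,
    '</html>',
  ].join('\n');
}

// The fragment below stands in for the `value` of a conversion result.
const page = wrapPreview('<h1>Report</h1><p>Hello</p>', 'Q3 <draft>');
console.log(page);
```

Keeping the wrapper separate from the conversion step means the same fragment can also be dropped into an existing page template without changes.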
npm install mammoth
pnpm add mammoth
bun add mammoth