MarkItDown
Convert PDF, Word, PowerPoint, Excel, and other documents into clean, structured Markdown with a Python conversion library.
MarkItDown is a document conversion library open-sourced by Microsoft that transforms a range of file formats into clean, semantic Markdown. It suits content teams, developers, and documentation workflows, preserving document structure while producing accessible, maintainable text content.
Supported Formats
PDF Files (.pdf)
Convert PDF documents with full text and layout preservation
PowerPoint (.pptx, .ppt)
Transform presentations into structured Markdown. Legacy .ppt support via ppt.to-markdown.com
Word (.docx, .doc)
Convert Word files with formatting preservation. Legacy .doc support via doc.to-markdown.com
Excel (.xlsx, .xls)
Convert spreadsheets to Markdown tables. Legacy .xls support via xls.to-markdown.com
Images
Extract EXIF metadata and perform OCR on images
Audio
Extract metadata and transcribe speech to text
HTML & Text
Convert HTML (including Wikipedia), CSV, JSON, XML and other text formats
ZIP Archives
Automatically process and convert all compatible files within ZIP archives
More Formats
Additional formats are being added regularly. Check our documentation for the latest supported formats.
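To make the "spreadsheets to Markdown tables" and CSV entries in the list above concrete, here is a minimal standard-library sketch of what turning tabular data into a Markdown pipe table involves. It is an illustration only, not MarkItDown's implementation; the function name csv_to_markdown_table is hypothetical, and the library's real output may differ in details such as column alignment.

```python
import csv
import io


def csv_to_markdown_table(csv_text: str) -> str:
    """Render CSV text as a Markdown pipe table (illustration only)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    width = len(header)

    def render(cells):
        # Pad or trim each row to the header width and escape literal pipes.
        cells = (list(cells) + [""] * width)[:width]
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    # Header row, separator row, then one line per data row.
    lines = [render(header), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(render(row) for row in body)
    return "\n".join(lines)


sample = "name,qty\nwidget,3\ngadget,12\n"
print(csv_to_markdown_table(sample))
```

The first row is treated as the header, which is an assumption of this sketch; data without a header row would need a synthetic one.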
Installation
Install MarkItDown using pip:
pip install markitdown
Or install from source:
git clone https://github.com/microsoft/markitdown
cd markitdown
pip install -e .
Usage Examples
Basic usage:
from markitdown import MarkItDown

# Create a converter and convert a local file to Markdown
md = MarkItDown()
result = md.convert("test.xlsx")
# text_content holds the converted Markdown as a string
print(result.text_content)
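The same convert call can be applied to a whole folder. The sketch below is one possible way to do that, using only the MarkItDown() constructor, convert(), and text_content shown above. The names SUPPORTED_SUFFIXES, markdown_path, and convert_folder are hypothetical helpers, and the suffix set is an assumption drawn from the Supported Formats list rather than an authoritative list from the library.

```python
from pathlib import Path

# Assumed set of suffixes, based on the Supported Formats list; adjust as needed.
SUPPORTED_SUFFIXES = {".pdf", ".pptx", ".docx", ".xlsx", ".html", ".csv", ".json", ".xml"}


def markdown_path(source: Path, out_dir: Path) -> Path:
    """Map a source file to its output path, e.g. report.pdf -> report.pdf.md."""
    # Keeping the original suffix avoids collisions between report.pdf and report.docx.
    return out_dir / (source.name + ".md")


def convert_folder(in_dir: str, out_dir: str) -> list[Path]:
    """Convert every supported file directly inside in_dir and write .md files to out_dir."""
    # Imported here so the helpers above can be used without markitdown installed.
    from markitdown import MarkItDown

    converter = MarkItDown()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for source in sorted(Path(in_dir).iterdir()):
        if not source.is_file() or source.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue  # skip subfolders and unsupported file types
        result = converter.convert(str(source))
        target = markdown_path(source, out)
        target.write_text(result.text_content, encoding="utf-8")
        written.append(target)
    return written
```

Calling convert_folder("docs", "docs_md") would write one Markdown file per supported document. The sketch does not recurse into subfolders or handle conversion errors; both are left out to keep it short.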
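Using an LLM to describe images:
from markitdown import MarkItDown
from openai import OpenAI

# The OpenAI client reads its API key from the OPENAI_API_KEY environment variable
client = OpenAI()
# Pass the client and model name so the LLM can generate image descriptions
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
result = md.convert("example.jpg")
print(result.text_content)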
Command-line usage:
# Convert a file
markitdown path-to-file.pdf > document.md
# Pipe content
cat path-to-file.pdf | markitdown
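The pipe form above can also be driven from Python when the library is installed as a command-line tool but should run in a separate process. The following is a minimal sketch under that assumption: convert_via_cli is a hypothetical helper, and it relies only on the behaviour shown above, namely that markitdown reads a file from standard input and writes Markdown to standard output.

```python
import subprocess


def convert_via_cli(data: bytes, command=("markitdown",)) -> str:
    """Pipe raw file bytes to a command and return its standard output as text."""
    completed = subprocess.run(
        list(command),
        input=data,              # sent to the child process on stdin
        stdout=subprocess.PIPE,  # capture the converted Markdown
        stderr=subprocess.PIPE,  # keep error output out of the result
        check=True,              # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.decode("utf-8", errors="replace")
```

A typical call would read a file with Path("path-to-file.pdf").read_bytes() and pass the bytes to convert_via_cli. The command parameter exists so that a different executable can be substituted, for example in tests.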
Docker Usage
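# Build the image from the directory containing the Dockerfile
docker build -t markitdown:latest .
# Pipe a file into the container and save the Markdown output
docker run --rm -i markitdown:latest < ~/your-file.pdf > output.md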
Resources
Contributing
MarkItDown is an open-source project from Microsoft that welcomes community contributions, including bug fixes, new features, and documentation improvements. The project's contribution guidelines explain how to participate.