Document Conversion Made Simple

MarkItDown

Transform PDF, Word, PowerPoint, Excel, and other documents into clean, structured Markdown with a Python conversion library.

MarkItDown is a document conversion library open-sourced by Microsoft that converts a range of file formats into clean, semantic Markdown. It suits content teams, developers, and documentation workflows: the tool preserves formatting while producing accessible, maintainable text content.

10+ File Formats Supported
99% Format Preservation

Supported Formats

PDF Files (.pdf)

Convert PDF documents with full text and layout preservation

PowerPoint (.pptx, .ppt)

Transform presentations into structured markdown. Legacy .ppt support via ppt.to-markdown.com

Word (.docx, .doc)

Convert Word files with formatting preservation. Legacy .doc support via doc.to-markdown.com

Excel (.xlsx, .xls)

Convert spreadsheets to markdown tables. Legacy .xls support via xls.to-markdown.com

Images

Extract EXIF metadata and perform OCR on images

Audio

Extract metadata and transcribe speech to text

HTML & Text

Convert HTML (including Wikipedia), CSV, JSON, XML and other text formats

ZIP Archives

Automatically process and convert all compatible files within ZIP archives

More Formats

Additional formats are being added regularly. Check our documentation for the latest supported formats.
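
To make the "markdown tables" output mentioned for spreadsheets above more concrete, here is a minimal sketch of how rows of cells map onto a Markdown pipe table. It is an illustration only: the function name rows_to_markdown_table is hypothetical, it is not MarkItDown's implementation, and the library's actual output may differ in spacing and escaping.

```python
def rows_to_markdown_table(rows):
    """Render a list of rows (first row = header) as a Markdown pipe table.

    Hypothetical helper for illustration; not part of MarkItDown.
    """
    if not rows:
        return ""
    # Convert every cell to text so numbers and strings are handled alike.
    header, *body = [[str(cell) for cell in row] for row in rows]
    width = len(header)

    def fmt(row):
        # Pad short rows and trim long ones so every line has `width` columns.
        cells = (row + [""] * (width - len(row)))[:width]
        # Escape pipes so cell text cannot break the table structure.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(header), "| " + " | ".join("---" for _ in range(width)) + " |"]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)


print(rows_to_markdown_table([["Name", "Qty"], ["Widget", 3], ["Gadget", 12]]))
```

The header row is followed by a separator row of dashes, which is what Markdown renderers use to recognize a table.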

Installation

Install MarkItDown using pip:

pip install markitdown

Or install from source:

git clone https://github.com/microsoft/markitdown
cd markitdown
pip install -e .

Usage Examples

Basic usage:

from markitdown import MarkItDown

markitdown = MarkItDown()
result = markitdown.convert("test.xlsx")
print(result.text_content)
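
The basic call above handles one file. For a folder of documents, a small wrapper along these lines may help. This is a sketch under assumptions: convert_tree, markitdown_text, and the default suffix list are hypothetical names and choices, not part of the library. Only MarkItDown().convert(path).text_content comes from the example above.

```python
from pathlib import Path


def convert_tree(src_dir, out_dir, convert_to_text,
                 suffixes=(".pdf", ".docx", ".pptx", ".xlsx")):
    """Convert every matching file under src_dir and write <stem>.md into out_dir.

    convert_to_text is any callable taking a file path and returning Markdown text.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.rglob("*")):
        # Skip directories and files whose extension is not in the allow-list.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        target = out / (path.stem + ".md")
        target.write_text(convert_to_text(str(path)), encoding="utf-8")
        written.append(target)
    return written


def markitdown_text(path):
    """Adapter around the basic-usage call (assumes markitdown is installed)."""
    from markitdown import MarkItDown  # lazy import keeps convert_tree dependency-free
    return MarkItDown().convert(path).text_content
```

A call such as convert_tree("docs", "markdown", markitdown_text) would then write one .md file per source document. Files that share a stem in different subfolders overwrite each other in this sketch, so a real script would likely mirror the directory layout instead.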

Using an LLM for image descriptions:

from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
result = md.convert("example.jpg")
print(result.text_content)

Command-line usage:

# Convert a file
markitdown path-to-file.pdf > document.md

# Pipe content
cat path-to-file.pdf | markitdown
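
When the command-line tool needs to be driven from a script, the redirect form above can be reproduced with the standard library. The sketch below assumes the markitdown executable is on PATH and that it writes the converted Markdown to stdout, as the redirect example implies. The helper names build_command and convert_with_cli are hypothetical.

```python
import shutil
import subprocess


def build_command(input_path, executable="markitdown"):
    """Return the argv list equivalent to `markitdown <input_path>`."""
    return [executable, str(input_path)]


def convert_with_cli(input_path, output_path):
    """Run the CLI and save its stdout, mirroring `markitdown file > document.md`."""
    if shutil.which("markitdown") is None:
        raise FileNotFoundError("markitdown CLI not found on PATH")
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    result = subprocess.run(
        build_command(input_path),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    with open(output_path, "w", encoding="utf-8") as fh:
        fh.write(result.stdout)
    return output_path
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.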

Docker Usage

docker build -t markitdown:latest .
docker run --rm -i markitdown:latest < ~/your-file.pdf > output.md

Resources

Documentation - Comprehensive guides (unofficial documentation)
Code Repository - Sample implementations and use cases
Issues - Get help and give feedback

Contributing

MarkItDown is an open source project from Microsoft that welcomes community contributions. Interested developers can refer to the project's contribution guidelines to learn how to participate. The project accepts various forms of contributions including bug fixes, new features, and documentation improvements.