Convert (almost) every document to Markdown

mikho (Administrator, OG)

Microsoft has released its own document parser for LLM use!
Introducing MarkItDown, a 100% open-source, one-stop tool for converting (almost) any file to Markdown. Perfect for text analysis, indexing, and more!

Here’s what makes it special:

↳ Converts PDF, Word, Excel, PowerPoint, images, and audio to Markdown
↳ Extracts EXIF metadata, OCR text, and audio transcripts automatically
↳ Available via CLI, Python API, or Docker
↳ Offers LLM-based image descriptions
↳ Supports batch conversions

https://github.com/microsoft/markitdown
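For anyone who wants to try the Python API and the batch use case together, here is a minimal sketch. It assumes the `markitdown` package is installed and uses the `MarkItDown().convert(...).text_content` call shown in the project README. The helper names `make_markitdown_converter` and `convert_folder`, the suffix list, and the `name.ext.md` output naming are made up for illustration and are not part of the library.

```python
from pathlib import Path
from typing import Callable, List, Sequence


def make_markitdown_converter() -> Callable[[Path], str]:
    """Return a function that converts one file to Markdown text."""
    # Imported lazily so convert_folder can be used with any converter.
    from markitdown import MarkItDown  # third-party: pip install markitdown

    md = MarkItDown()  # one instance reused for every file
    return lambda path: md.convert(str(path)).text_content


def convert_folder(
    src: Path,
    dst: Path,
    convert: Callable[[Path], str],
    suffixes: Sequence[str] = (".pdf", ".docx", ".xlsx", ".pptx"),
) -> List[Path]:
    """Write one Markdown file per matching file in src; return the paths written."""
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes:
            # Keep the original extension in the name so report.pdf and
            # report.docx do not overwrite each other.
            out = dst / (path.name + ".md")
            out.write_text(convert(path), encoding="utf-8")
            written.append(out)
    return written
```

Usage would look something like `convert_folder(Path("docs"), Path("markdown"), make_markitdown_converter())`. Passing the converter in as an argument keeps the folder loop independent of MarkItDown, so another tool can be swapped in for comparison.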


Comments

  • Had a quick look, and it seems that for docx they actually use mammoth to convert it to HTML first, and then convert that to Markdown. Just wondering how accurate the conversion is for something a bit more complex...
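To get a feel for that two-stage approach, here is a rough sketch of a docx to HTML to Markdown pipeline. It assumes `mammoth` for the first stage, as described above, and uses the `markdownify` package as a stand-in for the second stage. This is not MarkItDown's actual code, and the function names are hypothetical.

```python
from typing import Callable


def mammoth_html(path: str) -> str:
    """Stage 1: .docx -> HTML using mammoth (pip install mammoth)."""
    import mammoth  # third-party, imported lazily

    with open(path, "rb") as f:
        result = mammoth.convert_to_html(f)
    # result.messages holds any warnings mammoth raised during conversion.
    return result.value


def markdownify_html(html: str) -> str:
    """Stage 2: HTML -> Markdown using markdownify (pip install markdownify)."""
    from markdownify import markdownify  # third-party, imported lazily

    return markdownify(html, heading_style="ATX")


def docx_to_markdown(
    path: str,
    to_html: Callable[[str], str] = mammoth_html,
    to_md: Callable[[str], str] = markdownify_html,
) -> str:
    """Chain the two stages; either stage can be replaced for comparison."""
    return to_md(to_html(path)).strip()
```

Running `mammoth_html` on a more complex document and reading the intermediate HTML is one way to see where detail is lost before the Markdown step, since mammoth aims for simple, semantic HTML rather than an exact copy of the styling.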

  • havoc (OG, Content Writer, Senpai)

    There are also Firecrawl, Readability, and Jina Reader in a similar space, and some others I forgot.

  • Obsidian it shall be, this is the way.


  • Pandoc doesn't have that AI stuff but handles way more text formats.
