New Package: Lightweight HTML to Markdown utility for LLM use
Our members saw the sheer speed of Markitdown in the Python space, so we began to wonder what lessons we could bring to the Node.js space, with LLM use in mind.
Now Released: @nanocollective/get-md, a new LLM-oriented utility for converting HTML to Markdown.
For speed, Markdown is often the preferred format for feeding content to LLMs. When you want to process a webpage, converting it to Markdown strips out everything except the relevant content an LLM needs to do its job.
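The basic idea of stripping a page down to its relevant content can be sketched in a few lines of TypeScript. This is not the @nanocollective/get-md API or implementation. The function name `htmlToMarkdownSketch` and the `NOISE_TAGS` list are hypothetical, and the regex approach assumes simple, well-formed HTML. The sketch drops elements that rarely carry useful content (scripts, styles, navigation and similar), maps a handful of tags to Markdown syntax, and discards the remaining markup.

```typescript
// Hypothetical sketch: NOT the @nanocollective/get-md API or implementation.
// Regex-based, so it assumes simple, well-formed HTML without nested noise tags.

// Assumed list of elements that rarely carry content an LLM needs.
// Each is removed together with everything inside it.
const NOISE_TAGS = ["script", "style", "noscript", "nav", "header", "footer", "aside"];

// Decode a few common HTML entities; "&amp;" goes last so "&amp;lt;" yields "&lt;".
function decodeEntities(text: string): string {
  return text
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}

function htmlToMarkdownSketch(html: string): string {
  // 1. Drop comments and noise elements, including their contents.
  let s = html.replace(/<!--[\s\S]*?-->/g, "");
  for (const tag of NOISE_TAGS) {
    s = s.replace(new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, "gi"), "");
  }

  // 2. Map a few structural and inline tags to Markdown syntax.
  s = s.replace(
    /<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi,
    (_m, level: string, text: string) => `\n\n${"#".repeat(Number(level))} ${text.trim()}\n\n`,
  );
  s = s.replace(/<a\b[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)");
  s = s.replace(/<(strong|b)\b[^>]*>([\s\S]*?)<\/\1>/gi, "**$2**");
  s = s.replace(/<(em|i)\b[^>]*>([\s\S]*?)<\/\1>/gi, "*$2*");
  s = s.replace(/<li\b[^>]*>([\s\S]*?)<\/li>/gi, (_m, text: string) => `\n- ${text.trim()}`);
  s = s.replace(/<br\s*\/?>/gi, "\n");
  s = s.replace(/<\/(p|div|ul|ol|section|article)>/gi, "\n\n");

  // 3. Strip any remaining tags, decode entities, trim lines, collapse blank runs.
  s = decodeEntities(s.replace(/<[^>]+>/g, ""));
  return s
    .split("\n")
    .map((line) => line.trim())
    .join("\n")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

const page = `
<html><head><style>body { color: red; }</style></head>
<body>
  <nav><a href="/">Home</a></nav>
  <article>
    <h1>Hello &amp; welcome</h1>
    <p>Read the <a href="https://example.com/docs">docs</a> for <strong>details</strong>.</p>
    <ul><li>Fast</li><li>Small</li></ul>
  </article>
  <script>console.log("tracking");</script>
</body></html>`;

console.log(htmlToMarkdownSketch(page));
// # Hello & welcome
//
// Read the [docs](https://example.com/docs) for **details**.
//
// - Fast
// - Small
```

Regular expressions keep the sketch dependency-free, but they cannot reliably handle nested or malformed markup. A production converter would parse the DOM instead.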
We produced a binary build of Markitdown that can be bundled into a Node.js package (see @mote-software/markitdown), but its cold-boot performance hit was much worse than we hoped. We'll explore options to speed this up, but ultimately, relying on bundled binaries (especially Python-based ones) will always carry a performance penalty.
So, with a focus on speed and quality, this new Node.js package takes in HTML and produces Markdown optimised for LLM use within moments (around 100 ms), saving you orders of magnitude in compute.
[!TIP] We'd like to support more input formats too, so you could convert PDF, DOCX, audio, and other file formats, just like Markitdown does. We welcome contributors to help our community with this mission.
Community
We're a small community-led team building local and privacy-first AI solutions under the Nano Collective and would love your help! Whether you're interested in contributing code, documentation, or just being part of our community, there are several ways to get involved.
If you want to contribute to the code:
- Read our detailed CONTRIBUTING.md guide for information on development setup, coding standards, and how to submit your changes.
If you want to be part of our community or help with other aspects like design or marketing:
- Join our Discord server to connect with other users, ask questions, share ideas, and get help.
- Head to our GitHub issues or discussions to open and join current conversations with others in the community.