dom-to-semantic-markdown

DOM to Semantic-Markdown for use with LLMs

Stars: 708

Visit

DOM to Semantic Markdown is a tool that converts HTML DOM to Semantic Markdown for use in Large Language Models (LLMs). It maximizes semantic information, token efficiency, and preserves metadata to enhance LLMs' processing capabilities. The tool captures rich web content structure, including semantic tags, image metadata, table structures, and link destinations. It offers customizable conversion options and supports both browser and Node.js environments.

README:

DOM to Semantic Markdown

This library converts HTML DOM to a semantic Markdown format optimized for use with Large Language Models (LLMs). It preserves the semantic structure of web content, extracts essential metadata, and reduces token usage compared to raw HTML, making it easier for LLMs to understand and process information.

Key Features

Semantic Structure Preservation: Retains the meaning of HTML elements like <header>, <footer>, <nav>, and more.
Metadata Extraction: Captures important metadata such as title, description, keywords, Open Graph tags, Twitter Card tags, and JSON-LD data.
Token Efficiency: Optimizes for token usage through URL refification and concise representation of content.
Main Content Detection: Automatically identifies and extracts the primary content section of a webpage.
Table Column Tracking: Adds unique identifiers to table columns, improving LLM's ability to correlate data across rows.

Special Feature Examples

Here are examples showcasing the library's special features using the CLI tool:

1. Simple Content Extraction:

npx d2m@latest -u https://xkcd.com

This command fetches https://xkcd.com and converts it to Markdown

Click to view the output

- [Archive](/archive)
- [What If?](https://what-if.xkcd.com/)
- [About](/about)
- [Feed](/atom.xml)   ‚Ä¢ [Email](/newsletter/)
- [TW](https://twitter.com/xkcd/)   ‚Ä¢ [FB](https://www.facebook.com/TheXKCD/)
  ‚Ä¢ [IG](https://www.instagram.com/xkcd/)
- [-Books-](/books/)
- [What If? 2](/what-if-2/)
- [WI?](/what-if/)   ‚Ä¢ [TE](/thing-explainer/)   ‚Ä¢ [HT](/how-to/)

<a href="/">![xkcd.com logo](/s/0b7742.png)</a> A webcomic of romance,
sarcasm, math, and language. [Special 10th anniversary edition of WHAT IF?](https://xkcd.com/what-if/) ‚Äîrevised and
annotated with brand-new illustrations and answers to important questions you never thought to ask‚Äîcoming from
November 2024. Preorder [here](https://bit.ly/WhatIf10th) ! Exam Numbers

- [|<](/1/)
- [< Prev](/2965/)
- [Random](//c.xkcd.com/random/comic/)
- [Next >](about:blank#)
- [>|](/)

![Exam Numbers](//imgs.xkcd.com/comics/exam_numbers.png)

- [|<](/1/)
- [< Prev](/2965/)
- [Random](//c.xkcd.com/random/comic/)
- [Next >](about:blank#)
- [>|](/)

Permanent link to this comic: [https://xkcd.com/2966/](https://xkcd.com/2966)
Image URL (for
hotlinking/embedding): [https://imgs.xkcd.com/comics/exam_numbers.png](https://imgs.xkcd.com/comics/exam_numbers.png)![Selected Comics](//imgs.xkcd.com/s/a899e84.jpg)
<a href="//xkcd.com/1732/">![Earth temperature timeline](//imgs.xkcd.com/s/temperature.png)</a>
[RSS Feed](/rss.xml) - [Atom Feed](/atom.xml) - [Email](/newsletter/)
Comics I enjoy:
[Three Word Phrase](http://threewordphrase.com/) , [SMBC](https://www.smbc-comics.com/) , [Dinosaur Comics](https://www.qwantz.com/) , [Oglaf](https://oglaf.com/) (
nsfw), [A Softer World](https://www.asofterworld.com/) , [Buttersafe](https://buttersafe.com/) , [Perry Bible Fellowship](https://pbfcomics.com/) , [Questionable Content](https://questionablecontent.net/) , [Buttercup Festival](http://www.buttercupfestival.com/) , [Homestuck](https://www.homestuck.com/) , [Junior Scientist Power Hour](https://www.jspowerhour.com/)
Other things:
[Tips on technology and government](https://medium.com/civic-tech-thoughts-from-joshdata/so-you-want-to-reform-democracy-7f3b1ef10597) ,
[Climate FAQ](https://www.nytimes.com/interactive/2017/climate/what-is-climate-change.html) , [Katharine Hayhoe](https://twitter.com/KHayhoe)
xkcd.com is best viewed with Netscape Navigator 4.0 or below on a Pentium 3¬±1 emulated in Javascript on an Apple IIGS
at a screen resolution of 1024x1. Please enable your ad blockers, disable high-heat drying, and remove your device
from Airplane Mode and set it to Boat Mode. For security reasons, please leave caps lock on while browsing. This work is
licensed under
a [Creative Commons Attribution-NonCommercial 2.5 License](https://creativecommons.org/licenses/by-nc/2.5/).

This means you're free to copy and share these comics (but not to sell them). [More details](/license.html).

2. Table Column Tracking:

npx d2m@latest -u https://softwareyoga.com/latency-numbers-everyone-should-know/ -t -e

This command fetches and converts the main content from to Markdown and adds unique identifiers to table columns, aiding LLMs in understanding table structure.

Click to view the output

# Latency Numbers Everyone Should Know

## Latency

In a computer network, latency is defined as the amount of time it takes for a packet of data to get from one designated
point to another.

In more general terms, it is the amount of time between the cause and the observation of the effect.

As you would expect, latency is important, very important. As programmers, we all know reading from disk takes longer
than reading from memory or the fact that L1 cache is faster than the L2 cache.

But do you know the orders of magnitude by which these aspects are faster/slower compared to others?

## Latency for common operations

Jeff Dean from Google studied exactly that and came up with figures for latency in various situations.

With improving hardware, the latency at the higher ends of the spectrum are reducing, but not enough to ignore them
completely! For instance, to read 1MB sequentially from disk might have taken¬†20,000,000 ns a decade earlier and with
the advent of SSDs may probably take 1,000,000 ns today. But it is never going to surpass reading directly from memory.

The table below presents the latency for the most common operations on commodity hardware. These data are only
approximations and will vary with the hardware and the execution environment of your code. However, they do serve their
primary purpose, which is to enable us make informed technical decisions to reduce latency.

For better comprehension of¬† the multi-fold increase in latency, scaled figures in relation to L2 cache are also
provided by assuming that the L1 cache reference is 1 sec.

**Scroll horizontally on the table in smaller screens**

| Operation <!-- col-0 --> | Note <!-- col-1 --> | Latency <!-- col-2 --> | Scaled Latency <!-- col-3 --> |
| --- | --- | --- | --- |
| L1 cache reference <!-- col-0 --> | Level-1 cache, usually built onto the microprocessor chip itself. <!-- col-1 --> | 0.5 ns <!-- col-2 --> | Consider L1 cache reference duration is 1 sec <!-- col-3 --> |
| Branch mispredict <!-- col-0 --> | During the execution of a program, CPU predicts the next set of instructions. Branch misprediction is when it makes the wrong prediction. Hence, the previous prediction has to be erased and new one calculated and placed on the execution stack. <!-- col-1 --> | 5 ns <!-- col-2 --> | 10 s <!-- col-3 --> |
| L2 cache reference <!-- col-0 --> | Level-2 cache is memory built on a separate chip. <!-- col-1 --> | 7 ns <!-- col-2 --> | 14 s <!-- col-3 --> |
| Mutex lock/unlock <!-- col-0 --> | Simple synchronization method used to ensure exclusive access to resources shared between many threads. <!-- col-1 --> | 25 ns <!-- col-2 --> | 50 s <!-- col-3 --> |
| Main memory reference <!-- col-0 --> | Time to reference main memory i.e. RAM. <!-- col-1 --> | 100 ns <!-- col-2 --> | 3m 20s <!-- col-3 --> |
| Compress 1K bytes with Snappy <!-- col-0 --> | Snappy is a fast data compression and decompression library written in C++ by Google and used in many Google projects like BigTable, MapReduce and other open source projects. <!-- col-1 --> | 3,000 ns <!-- col-2 --> | 1h 40 m <!-- col-3 --> |
| Send 1K bytes over 1 Gbps network <!-- col-0 --> |  <!-- col-1 --> | 10,000 ns <!-- col-2 --> | 5h 33m 20s <!-- col-3 --> |
| Read 1 MB sequentially from memory <!-- col-0 --> | Read from RAM. <!-- col-1 --> | 250,000 ns <!-- col-2 --> | 5d 18h 53m 20s <!-- col-3 --> |
| Round trip within same datacenter <!-- col-0 --> | We can assume that the DNS lookup will be much faster within a datacenter than it is to go over an external router. <!-- col-1 --> | 500,000 ns <!-- col-2 --> | 11d 13h 46m 40s <!-- col-3 --> |
| Read 1 MB sequentially from SSD disk <!-- col-0 --> | Assumes SSD disk. SSD boasts random data access times of 100000 ns or less. <!-- col-1 --> | 1,000,000 ns <!-- col-2 --> | 23d 3h 33m 20s <!-- col-3 --> |
| Disk seek <!-- col-0 --> | Disk seek is method to get to the sector and head in the disk where the required data exists. <!-- col-1 --> | 10,000,000 ns <!-- col-2 --> | 231d 11h 33m 20s <!-- col-3 --> |
| Read 1 MB sequentially from disk <!-- col-0 --> | Assumes regular disk, not SSD. Check the difference in comparison to SSD! <!-- col-1 --> | 20,000,000 ns <!-- col-2 --> | 462d 23h 6m 40s <!-- col-3 --> |
| Send packet CA->Netherlands->CA <!-- col-0 --> | Round trip for packet data from U.S.A to Europe and back. <!-- col-1 --> | 150,000,000 ns <!-- col-2 --> | 3472d 5h 20m <!-- col-3 --> |

### References:

1. [Designs, Lessons and Advice from Building Large Distributed Systems](http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf)
2. [Peter Norvig‚Äôs post on ‚Äì Teach Yourself Programming in Ten Years](http://norvig.com/21-days.html#answers)

3. Metadata Extraction (Basic):

npx d2m@latest -u https://xkcd.com -meta basic

This command extracts basic metadata (title, description, keywords) and includes it in the Markdown output.

Click to view the output

---
title: "xkcd: Exam Numbers"
---

- [Archive](/archive)
- [What If?](https://what-if.xkcd.com/)
- [About](/about)
- [Feed](/atom.xml)   ‚Ä¢ [Email](/newsletter/)
- [TW](https://twitter.com/xkcd/)   ‚Ä¢ [FB](https://www.facebook.com/TheXKCD/)
  ‚Ä¢ [IG](https://www.instagram.com/xkcd/)
- [-Books-](/books/)
- [What If? 2](/what-if-2/)
- [WI?](/what-if/)   ‚Ä¢ [TE](/thing-explainer/)   ‚Ä¢ [HT](/how-to/)

<a href="/">![xkcd.com logo](/s/0b7742.png)</a> A webcomic of romance,
sarcasm, math, and language. [Special 10th anniversary edition of WHAT IF?](https://xkcd.com/what-if/) ‚Äîrevised and
annotated with brand-new illustrations and answers to important questions you never thought to ask‚Äîcoming from
November 2024. Preorder [here](https://bit.ly/WhatIf10th) ! Exam Numbers

- [|<](/1/)
- [< Prev](/2965/)
- [Random](//c.xkcd.com/random/comic/)
- [Next >](about:blank#)
- [>|](/)

![Exam Numbers](//imgs.xkcd.com/comics/exam_numbers.png)

- [|<](/1/)
- [< Prev](/2965/)
- [Random](//c.xkcd.com/random/comic/)
- [Next >](about:blank#)
- [>|](/)

Permanent link to this comic: [https://xkcd.com/2966/](https://xkcd.com/2966)
Image URL (for
hotlinking/embedding): [https://imgs.xkcd.com/comics/exam_numbers.png](https://imgs.xkcd.com/comics/exam_numbers.png)![Selected Comics](//imgs.xkcd.com/s/a899e84.jpg)
<a href="//xkcd.com/1732/">![Earth temperature timeline](//imgs.xkcd.com/s/temperature.png)</a>
[RSS Feed](/rss.xml) - [Atom Feed](/atom.xml) - [Email](/newsletter/)
Comics I enjoy:
[Three Word Phrase](http://threewordphrase.com/) , [SMBC](https://www.smbc-comics.com/) , [Dinosaur Comics](https://www.qwantz.com/) , [Oglaf](https://oglaf.com/) (
nsfw), [A Softer World](https://www.asofterworld.com/) , [Buttersafe](https://buttersafe.com/) , [Perry Bible Fellowship](https://pbfcomics.com/) , [Questionable Content](https://questionablecontent.net/) , [Buttercup Festival](http://www.buttercupfestival.com/) , [Homestuck](https://www.homestuck.com/) , [Junior Scientist Power Hour](https://www.jspowerhour.com/)
Other things:
[Tips on technology and government](https://medium.com/civic-tech-thoughts-from-joshdata/so-you-want-to-reform-democracy-7f3b1ef10597) ,
[Climate FAQ](https://www.nytimes.com/interactive/2017/climate/what-is-climate-change.html) , [Katharine Hayhoe](https://twitter.com/KHayhoe)
xkcd.com is best viewed with Netscape Navigator 4.0 or below on a Pentium 3¬±1 emulated in Javascript on an Apple IIGS
at a screen resolution of 1024x1. Please enable your ad blockers, disable high-heat drying, and remove your device
from Airplane Mode and set it to Boat Mode. For security reasons, please leave caps lock on while browsing. This work is
licensed under
a [Creative Commons Attribution-NonCommercial 2.5 License](https://creativecommons.org/licenses/by-nc/2.5/).

This means you're free to copy and share these comics (but not to sell them). [More details](/license.html).

4. Metadata Extraction (Extended):

npx d2m@latest -u https://xkcd.com -meta extended

This command extracts extended metadata, including Open Graph, Twitter Card tags, and JSON-LD data, and includes it in the Markdown output.

Click to view the output

---
title: "xkcd: Exam Numbers"
openGraph:
  site_name: "xkcd"
  title: "Exam Numbers"
  url: "https://xkcd.com/2966/"
  image: "https://imgs.xkcd.com/comics/exam_numbers_2x.png"
twitter:
  card: "summary_large_image"
---

- [Archive](/archive)
- [What If?](https://what-if.xkcd.com/)
- [About](/about)
- [Feed](/atom.xml)   ‚Ä¢ [Email](/newsletter/)
- [TW](https://twitter.com/xkcd/)   ‚Ä¢ [FB](https://www.facebook.com/TheXKCD/)
  ‚Ä¢ [IG](https://www.instagram.com/xkcd/)
- [-Books-](/books/)
- [What If? 2](/what-if-2/)
- [WI?](/what-if/)   ‚Ä¢ [TE](/thing-explainer/)   ‚Ä¢ [HT](/how-to/)

<a href="/">![xkcd.com logo](/s/0b7742.png)</a> A webcomic of romance,
sarcasm, math, and language. [Special 10th anniversary edition of WHAT IF?](https://xkcd.com/what-if/) ‚Äîrevised and
annotated with brand-new illustrations and answers to important questions you never thought to ask‚Äîcoming from
November 2024. Preorder [here](https://bit.ly/WhatIf10th) ! Exam Numbers

- [|<](/1/)
- [< Prev](/2965/)
- [Random](//c.xkcd.com/random/comic/)
- [Next >](about:blank#)
- [>|](/)

![Exam Numbers](//imgs.xkcd.com/comics/exam_numbers.png)

- [|<](/1/)
- [< Prev](/2965/)
- [Random](//c.xkcd.com/random/comic/)
- [Next >](about:blank#)
- [>|](/)

Permanent link to this comic: [https://xkcd.com/2966/](https://xkcd.com/2966)
Image URL (for
hotlinking/embedding): [https://imgs.xkcd.com/comics/exam_numbers.png](https://imgs.xkcd.com/comics/exam_numbers.png)![Selected Comics](//imgs.xkcd.com/s/a899e84.jpg)
<a href="//xkcd.com/1732/">![Earth temperature timeline](//imgs.xkcd.com/s/temperature.png)</a>
[RSS Feed](/rss.xml) - [Atom Feed](/atom.xml) - [Email](/newsletter/)
Comics I enjoy:
[Three Word Phrase](http://threewordphrase.com/) , [SMBC](https://www.smbc-comics.com/) , [Dinosaur Comics](https://www.qwantz.com/) , [Oglaf](https://oglaf.com/) (
nsfw), [A Softer World](https://www.asofterworld.com/) , [Buttersafe](https://buttersafe.com/) , [Perry Bible Fellowship](https://pbfcomics.com/) , [Questionable Content](https://questionablecontent.net/) , [Buttercup Festival](http://www.buttercupfestival.com/) , [Homestuck](https://www.homestuck.com/) , [Junior Scientist Power Hour](https://www.jspowerhour.com/)
Other things:
[Tips on technology and government](https://medium.com/civic-tech-thoughts-from-joshdata/so-you-want-to-reform-democracy-7f3b1ef10597) ,
[Climate FAQ](https://www.nytimes.com/interactive/2017/climate/what-is-climate-change.html) , [Katharine Hayhoe](https://twitter.com/KHayhoe)
xkcd.com is best viewed with Netscape Navigator 4.0 or below on a Pentium 3¬±1 emulated in Javascript on an Apple IIGS
at a screen resolution of 1024x1. Please enable your ad blockers, disable high-heat drying, and remove your device
from Airplane Mode and set it to Boat Mode. For security reasons, please leave caps lock on while browsing. This work is
licensed under
a [Creative Commons Attribution-NonCommercial 2.5 License](https://creativecommons.org/licenses/by-nc/2.5/).

This means you're free to copy and share these comics (but not to sell them). [More details](/license.html).

Installation

Using npm

npm install dom-to-semantic-markdown

Using npx (CLI)

> npx d2m@latest -h
Usage: d2m [options]

Convert DOM to Semantic Markdown

Options:
  -V, --version                                      output the version number
  -i, --input <file>                                 Input HTML file
  -o, --output <file>                                Output Markdown file
  -e, --extract-main                                 Extract main content
  -u, --url <url>                                    URL to fetch HTML content from
  -t, --track-table-columns                          Enable table column tracking for improved LLM data correlation
  -meta, --include-meta-data <"basic" | "extended">  Include metadata extracted from the HTML head
  -h, --help                                         display help for command

Usage

Browser

import {convertHtmlToMarkdown} from 'dom-to-semantic-markdown';

const markdown = convertHtmlToMarkdown(document.body);
console.log(markdown);

Node.js

import {convertHtmlToMarkdown} from 'dom-to-semantic-markdown';
import {JSDOM} from 'jsdom';

const html = '<h1>Hello, World!</h1><p>This is a <strong>test</strong>.</p>';
const dom = new JSDOM(html);
const markdown = convertHtmlToMarkdown(html, {overrideDOMParser: new dom.window.DOMParser()});
console.log(markdown);

CLI

d2m -i input.html -o output.md # Convert input.html to output.md
d2m -u https://example.com -o output.md # Fetch and convert a webpage to Markdown
d2m -i input.html -e # Extract main content from input.html
d2m -i input.html -t # Enable table column tracking
d2m -i input.html -meta basic # Include basic metadata
d2m -i input.html -meta extended # Include extended metadata

API

`convertHtmlToMarkdown(html: string, options?: ConversionOptions): string`

Converts an HTML string to semantic Markdown.

`convertElementToMarkdown(element: Element, options?: ConversionOptions): string`

Converts an HTML Element to semantic Markdown.

`ConversionOptions`

websiteDomain?: string: The domain of the website being converted.
extractMainContent?: boolean: Whether to extract only the main content of the page.
refifyUrls?: boolean: Whether to convert URLs to reference-style links.
debug?: boolean: Enable debug logging.
overrideDOMParser?: DOMParser: Custom DOMParser for Node.js environments.
enableTableColumnTracking?: boolean: Adds unique identifiers to table columns.
overrideElementProcessing?: (element: Element, options: ConversionOptions, indentLevel: number) => SemanticMarkdownAST[] | undefined: Custom processing for HTML elements.
processUnhandledElement?: (element: Element, options: ConversionOptions, indentLevel: number) => SemanticMarkdownAST[] | undefined: Handler for unknown HTML elements.
overrideNodeRenderer?: (node: SemanticMarkdownAST, options: ConversionOptions, indentLevel: number) => string | undefined: Custom renderer for AST nodes.
renderCustomNode?: (node: CustomNode, options: ConversionOptions, indentLevel: number) => string | undefined: Renderer for custom AST nodes.
includeMetaData?: 'basic' | 'extended': Controls whether to include metadata extracted from the HTML head.
- 'basic': Includes standard meta tags like title, description, and keywords.
- 'extended': Includes basic meta tags, Open Graph tags, Twitter Card tags, and JSON-LD data.

Using the Output with LLMs

The semantic Markdown produced by this library is optimized for use with Large Language Models (LLMs). To use it effectively:

Extract the Markdown content using the library.
Start with a brief instruction or context for the LLM.
Wrap the extracted Markdown in triple backticks (```).
Follow the Markdown with your question or prompt.

Example:

The following is a semantic Markdown representation of a webpage. Please analyze its content:

```markdown
{paste your extracted markdown here}
```

{your question, e.g., "What are the main points discussed in this article?"}

This format helps the LLM understand its task and the context of the content, enabling more accurate and relevant responses to your questions.

Contributing

Contributions are welcome! See the CONTRIBUTING.md file for details.

License

This project is licensed under the MIT License. See the LICENSE file for details.

For Tasks:

Click tags to check more tools for each tasks

extract content analyze web pages understand rich media audit seo content transform content

For Jobs:

content analyst seo specialist web developer data scientist ai researcher

Alternative AI tools for dom-to-semantic-markdown

Similar Open Source Tools

dom-to-semantic-markdown

github

: 708

aimeos

Aimeos is a full-featured e-commerce platform that is ultra-fast, cloud-native, and API-first. It offers a wide range of features including JSON REST API, GraphQL API, multi-vendor support, various product types, subscriptions, multiple payment gateways, admin backend, modular structure, SEO optimization, multi-language support, AI-based text translation, mobile optimization, and high-quality source code. It is highly configurable and extensible, making it suitable for e-commerce SaaS solutions, marketplaces, and various cloud environments. Aimeos is designed for scalability, security, and performance, catering to a diverse range of e-commerce needs.

github

: 4.5k

rig

Rig is a Rust library designed for building scalable, modular, and user-friendly applications powered by large language models (LLMs). It provides full support for LLM completion and embedding workflows, offers simple yet powerful abstractions for LLM providers like OpenAI and Cohere, as well as vector stores such as MongoDB and in-memory storage. With Rig, users can easily integrate LLMs into their applications with minimal boilerplate code.

github

: 3.4k

amica

Amica is an application that allows you to easily converse with 3D characters in your browser. You can import VRM files, adjust the voice to fit the character, and generate response text that includes emotional expressions.

github

: 879

Qmedia

QMedia is an open-source multimedia AI content search engine designed specifically for content creators. It provides rich information extraction methods for text, image, and short video content. The tool integrates unstructured text, image, and short video information to build a multimodal RAG content Q&A system. Users can efficiently search for image/text and short video materials, analyze content, provide content sources, and generate customized search results based on user interests and needs. QMedia supports local deployment for offline content search and Q&A for private data. The tool offers features like content cards display, multimodal content RAG search, and pure local multimodal models deployment. Users can deploy different types of models locally, manage language models, feature embedding models, image models, and video models. QMedia aims to spark new ideas for content creation and share AI content creation concepts in an open-source manner.

github

: 537

cocoindex

CocoIndex is the world's first open-source engine that supports both custom transformation logic and incremental updates specialized for data indexing. Users declare the transformation, CocoIndex creates & maintains an index, and keeps the derived index up to date based on source update, with minimal computation and changes. It provides a Python library for data indexing with features like text embedding, code embedding, PDF parsing, and more. The tool is designed to simplify the process of indexing data for semantic search and structured information extraction.

github

: 483

PromptClip

PromptClip is a tool that allows developers to create video clips using LLM prompts. Users can upload videos from various sources, prompt the video in natural language, use different LLM models, instantly watch the generated clips, finetune the clips, and add music or image overlays. The tool provides a seamless way to extract specific moments from videos based on user queries, making video editing and content creation more efficient and intuitive.

github

: 100

rss-can

RSS Can is a tool designed to simplify and improve RSS feed management. It supports various systems and architectures, including Linux and macOS. Users can download the binary from the GitHub release page or use the Docker image for easy deployment. The tool provides CLI parameters and environment variables for customization. It offers features such as memory and Redis cache services, web service configuration, and rule directory settings. The project aims to support RSS pipeline flow, NLP tasks, integration with open-source software rules, and tools like a quick RSS rules generator.

github

: 61

TempCompass

TempCompass is a benchmark designed to evaluate the temporal perception ability of Video LLMs. It encompasses a diverse set of temporal aspects and task formats to comprehensively assess the capability of Video LLMs in understanding videos. The benchmark includes conflicting videos to prevent models from relying on single-frame bias and language priors. Users can clone the repository, install required packages, prepare data, run inference using examples like Video-LLaVA and Gemini, and evaluate the performance of their models across different tasks such as Multi-Choice QA, Yes/No QA, Caption Matching, and Caption Generation.

github

: 71

llm-interface

LLM Interface is an npm module that streamlines interactions with various Large Language Model (LLM) providers in Node.js applications. It offers a unified interface for switching between providers and models, supporting 36 providers and hundreds of models. Features include chat completion, streaming, error handling, extensibility, response caching, retries, JSON output, and repair. The package relies on npm packages like axios, @google/generative-ai, dotenv, jsonrepair, and loglevel. Installation is done via npm, and usage involves sending prompts to LLM providers. Tests can be run using npm test. Contributions are welcome under the MIT License.

github

: 92

Torch-Pruning

Torch-Pruning (TP) is a library for structural pruning that enables pruning for a wide range of deep neural networks. It uses an algorithm called DepGraph to physically remove parameters. The library supports pruning off-the-shelf models from various frameworks and provides benchmarks for reproducing results. It offers high-level pruners, dependency graph for automatic pruning, low-level pruning functions, and supports various importance criteria and modules. Torch-Pruning is compatible with both PyTorch 1.x and 2.x versions.

github

: 2.6k

gptel

GPTel is a simple Large Language Model chat client for Emacs, with support for multiple models and backends. It's async and fast, streams responses, and interacts with LLMs from anywhere in Emacs. LLM responses are in Markdown or Org markup. Supports conversations and multiple independent sessions. Chats can be saved as regular Markdown/Org/Text files and resumed later. You can go back and edit your previous prompts or LLM responses when continuing a conversation. These will be fed back to the model. Don't like gptel's workflow? Use it to create your own for any supported model/backend with a simple API.

github

: 2.2k

ScaleLLM

ScaleLLM is a cutting-edge inference system engineered for large language models (LLMs), meticulously designed to meet the demands of production environments. It extends its support to a wide range of popular open-source models, including Llama3, Gemma, Bloom, GPT-NeoX, and more. ScaleLLM is currently undergoing active development. We are fully committed to consistently enhancing its efficiency while also incorporating additional features. Feel free to explore our **_Roadmap_** for more details. ## Key Features * High Efficiency: Excels in high-performance LLM inference, leveraging state-of-the-art techniques and technologies like Flash Attention, Paged Attention, Continuous batching, and more. * Tensor Parallelism: Utilizes tensor parallelism for efficient model execution. * OpenAI-compatible API: An efficient golang rest api server that compatible with OpenAI. * Huggingface models: Seamless integration with most popular HF models, supporting safetensors. * Customizable: Offers flexibility for customization to meet your specific needs, and provides an easy way to add new models. * Production Ready: Engineered with production environments in mind, ScaleLLM is equipped with robust system monitoring and management features to ensure a seamless deployment experience.

github

: 418

superagentx

SuperAgentX is a lightweight open-source AI framework designed for multi-agent applications with Artificial General Intelligence (AGI) capabilities. It offers goal-oriented multi-agents with retry mechanisms, easy deployment through WebSocket, RESTful API, and IO console interfaces, streamlined architecture with no major dependencies, contextual memory using SQL + Vector databases, flexible LLM configuration supporting various Gen AI models, and extendable handlers for integration with diverse APIs and data sources. It aims to accelerate the development of AGI by providing a powerful platform for building autonomous AI agents capable of executing complex tasks with minimal human intervention.

github

: 57

matmulfreellm

MatMul-Free LM is a language model architecture that eliminates the need for Matrix Multiplication (MatMul) operations. This repository provides an implementation of MatMul-Free LM that is compatible with the 🤗 Transformers library. It evaluates how the scaling law fits to different parameter models and compares the efficiency of the architecture in leveraging additional compute to improve performance. The repo includes pre-trained models, model implementations compatible with 🤗 Transformers library, and generation examples for text using the 🤗 text generation APIs.

github

: 2.8k

screenpipe

24/7 Screen & Audio Capture Library to build personalized AI powered by what you've seen, said, or heard. Works with Ollama. Alternative to Rewind.ai. Open. Secure. You own your data. Rust. We are shipping daily, make suggestions, post bugs, give feedback. Building a reliable stream of audio and screenshot data, simplifying life for developers by solving non-trivial problems. Multiple installation options available. Experimental tool with various integrations and features for screen and audio capture, OCR, STT, and more. Open source project focused on enabling tooling & infrastructure for a wide range of applications.

github

: 12.9k

For similar tasks

dom-to-semantic-markdown

github

: 708

1filellm

1filellm is a command-line data aggregation tool designed for LLM ingestion. It aggregates and preprocesses data from various sources into a single text file, facilitating the creation of information-dense prompts for large language models. The tool supports automatic source type detection, handling of multiple file formats, web crawling functionality, integration with Sci-Hub for research paper downloads, text preprocessing, and token count reporting. Users can input local files, directories, GitHub repositories, pull requests, issues, ArXiv papers, YouTube transcripts, web pages, Sci-Hub papers via DOI or PMID. The tool provides uncompressed and compressed text outputs, with the uncompressed text automatically copied to the clipboard for easy pasting into LLMs.

github

: 292

AudioNotes

AudioNotes is a system built on FunASR and Qwen2 that can quickly extract content from audio and video, and organize it using large models into structured markdown notes for easy reading. Users can interact with the audio and video content, install Ollama, pull models, and deploy services using Docker or locally with a PostgreSQL database. The system provides a seamless way to convert audio and video into structured notes for efficient consumption.

github

: 102

scrape-it-now

Scrape It Now is a versatile tool for scraping websites with features like decoupled architecture, CLI functionality, idempotent operations, and content storage options. The tool includes a scraper component for efficient scraping, ad blocking, link detection, markdown extraction, dynamic content loading, and anonymity features. It also offers an indexer component for creating AI search indexes, chunking content, embedding chunks, and enabling semantic search. The tool supports various configurations for Azure services and local storage, providing flexibility and scalability for web scraping and indexing tasks.

github

: 452

open-deep-research

Open Deep Research is an open-source tool designed to generate AI-powered reports from web search results efficiently. It combines Bing Search API for search results retrieval, JinaAI for content extraction, and customizable report generation. Users can customize settings, export reports in multiple formats, and benefit from rate limiting for stability. The tool aims to streamline research and report creation in a user-friendly platform.

github

: 231

DevDocs

DevDocs is a platform designed to simplify the process of digesting technical documentation for software engineers and developers. It automates the extraction and conversion of web content into markdown format, making it easier for users to access and understand the information. By crawling through child pages of a given URL, DevDocs provides a streamlined approach to gathering relevant data and integrating it into various tools for software development. The tool aims to save time and effort by eliminating the need for manual research and content extraction, ultimately enhancing productivity and efficiency in the development process.

github

: 469

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

github

: 855

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

github

: 1.5k

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

github

: 94

kaito

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

github

: 405

PyRIT

PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

github

: 2.3k

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.

github

: 30.6k

spear

SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

github

: 224

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

github

: 675