Matryoshka
MCP server for token-efficient analysis of large documents via REPL state
Stars: 105
Matryoshka processes documents 100x larger than your LLM's context window without vector databases or chunking heuristics. Following the Recursive Language Models approach, the LLM reasons about queries and outputs symbolic commands in Nucleus, a constrained S-expression language, which the Lattice logic engine parses, validates, and executes. This design yields reduced output entropy, fail-fast validation, safe execution, and friendliness to small models. Matryoshka provides CLI tools for document analysis, MCP integration for large token savings, and programmatic access, and supports symbol, collection, and string operations, type coercion, program synthesis, cross-turn state, and final-answer formatting.
README:
Process documents 100x larger than your LLM's context window—without vector databases or chunking heuristics.
LLMs have fixed context windows. Traditional solutions (RAG, chunking) lose information or miss connections across chunks. RLM takes a different approach: the model reasons about your query and outputs symbolic commands that a logic engine executes against the document.
Based on the Recursive Language Models paper.
Unlike traditional approaches where an LLM writes arbitrary code, RLM uses Nucleus—a constrained symbolic language based on S-expressions. The LLM outputs Nucleus commands, which are parsed, type-checked, and executed by Lattice, our logic engine.
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ User Query │────▶│ LLM Reasons │────▶│ Nucleus Command │
│ "total sales?" │ │ about intent │ │ (sum RESULTS) │
└─────────────────┘ └─────────────────┘ └────────┬────────┘
│
┌─────────────────┐ ┌─────────────────┐ ┌────────▼────────┐
│ Final Answer │◀────│ Lattice Engine │◀────│ Parser │
│ 13,000,000 │ │ Executes │ │ Validates │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Why this works better than code generation:
- Reduced entropy - Nucleus has a rigid grammar with fewer valid outputs than JavaScript
- Fail-fast validation - Parser rejects malformed commands before execution
- Safe execution - Lattice only executes known operations, no arbitrary code
- Small model friendly - 7B models handle symbolic grammars better than freeform code
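The fail-fast property is easy to see in miniature. Below is a sketch of an S-expression parser in the spirit of lc-parser.ts (illustrative only, not the actual implementation): malformed model output is rejected before anything executes.

```typescript
// Minimal S-expression parser sketch (illustrative, not the real lc-parser.ts).
type SExpr = string | SExpr[];

function tokenize(src: string): string[] {
  return src
    .replace(/\(/g, " ( ")
    .replace(/\)/g, " ) ")
    .trim()
    .split(/\s+/)
    .filter((t) => t.length > 0);
}

function parse(src: string): SExpr {
  const tokens = tokenize(src);
  let pos = 0;
  function read(): SExpr {
    const tok = tokens[pos++];
    if (tok === undefined) throw new Error("missing )");
    if (tok === "(") {
      const list: SExpr[] = [];
      while (tokens[pos] !== ")") {
        if (tokens[pos] === undefined) throw new Error("missing )");
        list.push(read());
      }
      pos++; // consume ")"
      return list;
    }
    if (tok === ")") throw new Error("unexpected )");
    return tok;
  }
  const expr = read();
  if (pos !== tokens.length) throw new Error("trailing tokens");
  return expr;
}

// Well-formed commands parse to an AST; malformed ones fail before execution.
console.log(parse("(sum RESULTS)")); // ["sum", "RESULTS"]
```

Because the grammar is this rigid, a malformed command like `(sum RESULTS` is caught at parse time rather than producing an arbitrary runtime effect.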
The LLM outputs commands in the Nucleus DSL—an S-expression language designed for document analysis:
; Search for patterns
(grep "SALES_DATA")
; Filter results
(filter RESULTS (lambda x (match x "NORTH" 0)))
; Aggregate
(sum RESULTS) ; Auto-extracts numbers like "$2,340,000" from lines
(count RESULTS) ; Count matching items
; Final answer
<<<FINAL>>>13000000<<<END>>>
The Lattice engine (src/logic/) processes Nucleus commands:
- Parser (lc-parser.ts) - Parses S-expressions into an AST
- Type Inference (type-inference.ts) - Validates types before execution
- Constraint Resolver (constraint-resolver.ts) - Handles symbolic constraints like [Σ⚡μ]
- Solver (lc-solver.ts) - Executes commands against the document
Lattice uses miniKanren (a relational programming engine) for pattern classification and filtering operations.
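The `(sum RESULTS)` auto-extraction mentioned above can be sketched as follows (an illustrative heuristic, not the engine's actual implementation): pull every number out of each matched line, strip thousands separators, and add them up.

```typescript
// Illustrative sketch of (sum RESULTS): extract numbers such as
// "$2,340,000" from matched lines and sum them. Not the real lc-solver.ts.
function sumResults(lines: string[]): number {
  let total = 0;
  for (const line of lines) {
    for (const m of line.matchAll(/-?\d[\d,]*(?:\.\d+)?/g)) {
      total += parseFloat(m[0].replace(/,/g, ""));
    }
  }
  return total;
}

console.log(sumResults(["NORTH: $2,340,000", "SOUTH: $660,000"])); // 3000000
```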
For large result sets, RLM uses a handle-based architecture with in-memory SQLite (src/persistence/) that achieves 97%+ token savings:
Traditional: LLM sees full array [15,000 tokens for 1000 results]
Handle-based: LLM sees stub [50 tokens: "$res1: Array(1000) [preview...]"]
How it works:
- Results are stored in SQLite with FTS5 full-text indexing
- LLM receives only handle references ($res1, $res2, etc.)
- Operations execute server-side, returning new handles
- Full data is only materialized when needed
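The stub idea can be sketched in a few lines (names and stub format are assumptions for illustration, not the real HandleRegistry API): big arrays stay server-side, and the model sees only a compact reference.

```typescript
// Illustrative handle-registry sketch: store big arrays server-side,
// hand the model only a compact stub. Not the actual handle-registry.ts.
class HandleRegistry {
  private store = new Map<string, unknown[]>();
  private next = 1;

  register(rows: unknown[]): string {
    const id = `$res${this.next++}`;
    this.store.set(id, rows);
    // The stub is all the LLM ever sees unless it asks to expand.
    const preview = JSON.stringify(rows.slice(0, 2));
    return `${id}: Array(${rows.length}) [${preview}...]`;
  }

  expand(id: string, limit = 10, offset = 0): unknown[] {
    const rows = this.store.get(id);
    if (!rows) throw new Error(`unknown handle ${id}`);
    return rows.slice(offset, offset + limit);
  }
}

const reg = new HandleRegistry();
const stub = reg.register(Array.from({ length: 1000 }, (_, i) => `row ${i}`));
console.log(stub); // a ~50-token stub instead of ~15,000 tokens of data
```

Expanding with a limit mirrors lattice_expand: only the requested slice is ever materialized for the model.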
Components:
- SessionDB - In-memory SQLite with FTS5 for fast full-text search
- HandleRegistry - Stores arrays, returns compact handle references
- HandleOps - Server-side filter/map/count/sum on handles
- FTS5Search - Phrase queries, boolean operators, relevance ranking
- CheckpointManager - Save/restore session state
The LLM does reasoning, not code generation:
- Understands intent - Interprets "total of north sales" as needing grep + filter + sum
- Chooses operations - Decides which Nucleus commands achieve the goal
- Verifies results - Checks if the current results answer the query
- Iterates - Refines search if results are too broad or narrow
The LLM never writes JavaScript. It outputs Nucleus commands that Lattice executes safely.
| Component | Purpose |
|---|---|
| Nucleus Adapter | Prompts LLM to output Nucleus commands |
| Lattice Parser | Parses S-expressions to AST |
| Lattice Solver | Executes commands against document |
| In-Memory Handles | Handle-based storage with FTS5 (97% token savings) |
| miniKanren | Relational engine for classification |
| RAG Hints | Few-shot examples from past successes |
Install from npm:
npm install -g matryoshka-rlm
Or run without installing:
npx matryoshka-rlm "What is the total of all sales values?" ./report.txt
The package provides several CLI tools:
| Command | Description |
|---|---|
| rlm | Main CLI for document analysis with LLM reasoning |
| lattice-mcp | MCP server exposing direct Nucleus commands (no LLM required) |
| lattice-repl | Interactive REPL for Nucleus commands |
| lattice-http | HTTP server for Nucleus queries |
| lattice-pipe | Pipe adapter for programmatic access |
| lattice-setup | Setup script for Claude Code integration |
git clone https://github.com/yogthos/Matryoshka.git
cd Matryoshka
npm install
npm run build
Copy config.example.json to config.json and configure your LLM provider:
{
"llm": {
"provider": "ollama"
},
"providers": {
"ollama": {
"baseUrl": "http://localhost:11434",
"model": "qwen2.5-coder:7b",
"options": { "temperature": 0.2, "num_ctx": 8192 }
},
"deepseek": {
"baseUrl": "https://api.deepseek.com",
"apiKey": "${DEEPSEEK_API_KEY}",
"model": "deepseek-chat",
"options": { "temperature": 0.2 }
}
}
}
# Basic usage
rlm "What is the total of all sales values?" ./report.txt
# With options
rlm "Count all ERROR entries" ./logs.txt --max-turns 15 --verbose
# See all options
rlm --help
RLM includes lattice-mcp, an MCP (Model Context Protocol) server for direct access to the Nucleus engine. This allows coding agents to analyze documents with 80%+ token savings compared to reading files directly.
The key advantage is handle-based results: query results are stored server-side in SQLite, and the agent receives compact stubs like $res1: Array(1000) [preview...] instead of full data. Operations chain server-side without roundtripping data.
| Tool | Description |
|---|---|
| lattice_load | Load a document for analysis |
| lattice_query | Execute Nucleus commands on the loaded document |
| lattice_expand | Expand a handle to see full data (with optional limit/offset) |
| lattice_close | Close the session and free memory |
| lattice_status | Get session status and document info |
| lattice_bindings | Show current variable bindings |
| lattice_reset | Reset bindings but keep document loaded |
| lattice_help | Get Nucleus command reference |
{
"mcp": {
"lattice": {
"type": "stdio",
"command": "lattice-mcp"
}
}
}
1. lattice_load("/path/to/large-file.txt") # Load document (use for >500 lines)
2. lattice_query('(grep "ERROR")') # Search - returns handle stub $res1
3. lattice_query('(filter RESULTS ...)') # Narrow down - returns handle stub $res2
4. lattice_query('(count RESULTS)') # Get count without seeing data
5. lattice_expand("$res2", limit=10) # Expand only what you need to see
6. lattice_close() # Free memory when done
Token efficiency tips:
- Query results return handle stubs, not full data
- Use lattice_expand with limit to see only what you need
- Chain grep → filter → count/sum to refine progressively
- Use RESULTS in queries (always points to last result)
- Use $res1, $res2, etc. with lattice_expand to inspect specific results
import { runRLM } from "matryoshka-rlm/rlm";
import { createLLMClient } from "matryoshka-rlm";
const llmClient = createLLMClient("ollama", {
baseUrl: "http://localhost:11434",
model: "qwen2.5-coder:7b",
options: { temperature: 0.2 }
});
const result = await runRLM("What is the total of all sales values?", "./report.txt", {
llmClient,
maxTurns: 10,
turnTimeoutMs: 30000,
});
$ rlm "What is the total of all north sales data values?" ./report.txt --verbose
──────────────────────────────────────────────────
[Turn 1/10] Querying LLM...
[Turn 1] Term: (grep "SALES.*NORTH")
[Turn 1] Result: 1 matches
──────────────────────────────────────────────────
[Turn 2/10] Querying LLM...
[Turn 2] Term: (sum RESULTS)
[Turn 2] Console output:
[Lattice] Summing 1 values
[Lattice] Sum = 2340000
[Turn 2] Result: 2340000
──────────────────────────────────────────────────
[Turn 3/10] Querying LLM...
[Turn 3] Final answer received
2340000
The model:
- Searched for relevant data with grep
- Summed the matching results
- Output the final answer
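The multi-turn loop behind this transcript can be sketched roughly as follows (a simplified, synchronous sketch with hypothetical names, not the real rlm.ts): ask the model for a command, execute it, feed the result back, and stop when a final-answer marker appears.

```typescript
// Illustrative RLM turn loop. `ask` stands in for the LLM call and
// `execute` for Lattice execution; both names are assumptions.
type Turn = { command: string; result: string };

function runLoop(
  ask: (history: Turn[]) => string,
  execute: (command: string) => string,
  maxTurns = 10,
): string {
  const history: Turn[] = [];
  for (let i = 0; i < maxTurns; i++) {
    const output = ask(history);
    const final = output.match(/<<<FINAL>>>([\s\S]*?)<<<END>>>/);
    if (final) return final[1]; // model signalled it is done
    history.push({ command: output, result: execute(output) });
  }
  throw new Error(`Max turns (${maxTurns}) reached without final answer`);
}

// Simulated session: one grep turn, then a final answer.
const script = ['(grep "SALES.*NORTH")', "<<<FINAL>>>2340000<<<END>>>"];
console.log(runLoop((h) => script[h.length], () => "1 matches", 5)); // 2340000
```

The "Max turns reached" error here is the same condition described in the troubleshooting section.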
(grep "pattern") ; Regex search, returns matches with line numbers
(fuzzy_search "query" 10) ; Fuzzy search, returns top N matches with scores
(text_stats) ; Document metadata (length, line count, samples)
For code files, Lattice uses tree-sitter to extract structural symbols. This enables code-aware queries that understand functions, classes, methods, and other language constructs.
Built-in languages (packages included):
- TypeScript (.ts, .tsx), JavaScript (.js, .jsx), Python (.py), Go (.go)
- HTML (.html), CSS (.css), JSON (.json)
Additional languages (install package to enable):
- Rust, C, C++, Java, Ruby, PHP, C#, Kotlin, Swift, Scala, Lua, Haskell, Bash, SQL, and more
(list_symbols) ; List all symbols (functions, classes, methods, etc.)
(list_symbols "function") ; Filter by kind: "function", "class", "method", "interface", "type", "struct"
(get_symbol_body "myFunc") ; Get source code body for a symbol by name
(get_symbol_body RESULTS) ; Get body for symbol from previous query result
(find_references "myFunc") ; Find all references to an identifier
Example workflow for code analysis:
1. lattice_load("./src/app.ts") # Load a code file
2. lattice_query('(list_symbols)') # Get all symbols → $res1
3. lattice_query('(list_symbols "function")') # Just functions → $res2
4. lattice_expand("$res2", limit=5) # See function names and line numbers
5. lattice_query('(get_symbol_body "handleRequest")') # Get function body
6. lattice_query('(find_references "handleRequest")') # Find all usages
Symbols include metadata like name, kind, start/end lines, and parent relationships (e.g., methods within classes).
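That symbol shape might look like the following (field names are assumptions for illustration, not the actual src/treesitter/types.ts definitions), and filtering by kind mirrors (list_symbols "method"):

```typescript
// Illustrative symbol record; field names are assumed, not the real types.ts.
interface CodeSymbol {
  name: string;      // e.g. "handleRequest"
  kind: string;      // "function" | "class" | "method" | ...
  startLine: number;
  endLine: number;
  parent?: string;   // e.g. the class a method belongs to
}

const symbols: CodeSymbol[] = [
  { name: "App", kind: "class", startLine: 1, endLine: 40 },
  { name: "handleRequest", kind: "method", startLine: 5, endLine: 20, parent: "App" },
];

// Filtering by kind is what (list_symbols "method") does conceptually.
const methods = symbols.filter((s) => s.kind === "method");
console.log(methods.map((s) => `${s.parent}.${s.name}`)); // ["App.handleRequest"]
```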
Matryoshka includes built-in symbol mappings for 20+ languages. To enable a language, install its tree-sitter grammar package:
# Enable Rust support
npm install tree-sitter-rust
# Enable Java support
npm install tree-sitter-java
# Enable Ruby support
npm install tree-sitter-ruby
Languages with built-in mappings:
- TypeScript, JavaScript, Python, Go, Rust, C, C++, Java
- Ruby, PHP, C#, Kotlin, Swift, Scala, Lua, Haskell, Elixir
- HTML, CSS, JSON, YAML, TOML, Markdown, SQL, Bash
Once a package is installed, the language is automatically available for symbol extraction.
For languages without built-in mappings, or to override existing mappings, create a config file at ~/.matryoshka/config.json:
{
"grammars": {
"mylang": {
"package": "tree-sitter-mylang",
"extensions": [".ml", ".mli"],
"moduleExport": "mylang",
"symbols": {
"function_definition": "function",
"method_definition": "method",
"class_definition": "class",
"module_definition": "module"
}
}
}
}
Configuration fields:
| Field | Required | Description |
|---|---|---|
| package | Yes | npm package name for the tree-sitter grammar |
| extensions | Yes | File extensions to associate with this language |
| symbols | Yes | Maps tree-sitter node types to symbol kinds |
| moduleExport | No | Submodule export name (e.g., "typescript" for tree-sitter-typescript) |
Symbol kinds: function, method, class, interface, type, struct, enum, trait, module, variable, constant, property
To configure symbol mappings for a new language, you need to know the tree-sitter node types. You can explore them using the tree-sitter CLI:
# Install tree-sitter CLI
npm install -g tree-sitter-cli
# Parse a sample file and see the AST
tree-sitter parse sample.mylang
Or use the tree-sitter playground to explore node types interactively.
Example: Adding OCaml support
- Find the grammar package: tree-sitter-ocaml
- Install it: npm install tree-sitter-ocaml
- Explore the AST to find node types for functions, modules, etc.
- Add to ~/.matryoshka/config.json:
{
"grammars": {
"ocaml": {
"package": "tree-sitter-ocaml",
"extensions": [".ml", ".mli"],
"moduleExport": "ocaml",
"symbols": {
"value_definition": "function",
"let_binding": "variable",
"type_definition": "type",
"module_definition": "module",
"module_type_definition": "interface"
}
}
}
}
Note: Some tree-sitter packages use native Node.js bindings that may not compile on all systems. If installation fails, check if the package supports your Node.js version or look for WASM alternatives.
(filter RESULTS (lambda x (match x "pattern" 0))) ; Filter by regex
(map RESULTS (lambda x (match x "(\\d+)" 1))) ; Extract from each
(sum RESULTS) ; Sum numbers in results
(count RESULTS) ; Count items
(match str "pattern" 0) ; Regex match, return group N
(replace str "from" "to") ; String replacement
(split str "," 0) ; Split and get index
(parseInt str) ; Parse integer
(parseFloat str) ; Parse float
When the model sees data that needs parsing, it can use declarative type coercion:
; Date parsing (returns ISO format YYYY-MM-DD)
(parseDate "Jan 15, 2024") ; -> "2024-01-15"
(parseDate "01/15/2024" "US") ; -> "2024-01-15" (MM/DD/YYYY)
(parseDate "15/01/2024" "EU") ; -> "2024-01-15" (DD/MM/YYYY)
; Currency parsing (handles $, €, commas, etc.)
(parseCurrency "$1,234.56") ; -> 1234.56
(parseCurrency "€1.234,56") ; -> 1234.56 (EU format)
; Number parsing
(parseNumber "1,234,567") ; -> 1234567
(parseNumber "50%") ; -> 0.5
; General coercion
(coerce value "date") ; Coerce to date
(coerce value "currency") ; Coerce to currency
(coerce value "number") ; Coerce to number
; Extract and coerce in one step
(extract str "\$[\d,]+" 0 "currency") ; Extract and parse as currency
Use in map for batch transformations:
; Parse all dates in results
(map RESULTS (lambda x (parseDate (match x "[A-Za-z]+ \\d+, \\d+" 0))))
; Extract and sum currencies
(map RESULTS (lambda x (parseCurrency (match x "\$[\d,]+" 0))))
For complex transformations, the model can synthesize functions from examples:
; Synthesize from input/output pairs
(synthesize
("$100" 100)
("$1,234" 1234)
("$50,000" 50000))
; -> Returns a function that extracts numbers from currency strings
This uses Barliman-style relational synthesis with miniKanren to automatically build extraction functions.
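The parseCurrency/parseNumber coercions shown earlier can be sketched with a simple separator heuristic (an illustration of the behavior, not the library's implementation): whichever separator acts as the decimal mark is detected, and everything else is stripped.

```typescript
// Illustrative currency/number coercion covering US ("1,234.56") and
// EU ("1.234,56") conventions. Not the library's actual implementation.
function parseCurrency(s: string): number {
  const d = s.replace(/[^\d.,-]/g, "");
  const lastComma = d.lastIndexOf(",");
  const lastDot = d.lastIndexOf(".");
  // A comma is a decimal mark only if it is the sole comma, comes after
  // any dot, and is not followed by a 3-digit thousands group.
  const decimalComma =
    lastComma > lastDot &&
    d.indexOf(",") === lastComma &&
    d.length - lastComma - 1 !== 3;
  if (decimalComma) {
    return parseFloat(d.replace(/\./g, "").replace(",", "."));
  }
  return parseFloat(d.replace(/,/g, ""));
}

function parseNumber(s: string): number {
  if (s.trim().endsWith("%")) return parseCurrency(s) / 100;
  return parseCurrency(s);
}

console.log(parseCurrency("$1,234.56")); // 1234.56
console.log(parseCurrency("€1.234,56")); // 1234.56
console.log(parseNumber("50%"));         // 0.5
```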
Results from previous turns are available:
- RESULTS - Latest array result (updated by grep, filter)
- _0, _1, _2, ... - Results from specific turns
<<<FINAL>>>your answer here<<<END>>>
Symptom: The model provides an answer immediately with hallucinated data.
Solutions:
- Use a more capable model (7B+ recommended)
- Be specific in your query: "Find lines containing SALES_DATA and sum the dollar amounts"
Symptom: "Max turns (N) reached without final answer"
Solutions:
- Increase --max-turns for complex documents
- Check --verbose output for repeated patterns (model stuck in loop)
- Simplify the query
Symptom: "Parse error: no valid command"
Cause: Model output malformed S-expression.
Solutions:
- The system auto-converts JSON to S-expressions as fallback
- Use --verbose to see what the model is generating
- Try a different model tuned for code/symbolic output
npm test # Run tests
npm test -- --coverage # With coverage
RUN_E2E=1 npm test -- tests/e2e.test.ts # E2E tests (requires Ollama)
npm run build # Build
npm run typecheck # Type check
src/
├── adapters/ # Model-specific prompting
│ ├── nucleus.ts # Nucleus DSL adapter
│ └── types.ts # Adapter interface
├── logic/ # Lattice engine
│ ├── lc-parser.ts # Nucleus parser
│ ├── lc-solver.ts # Command executor (uses miniKanren)
│ ├── type-inference.ts
│ └── constraint-resolver.ts
├── persistence/ # In-memory handle storage (97% token savings)
│ ├── session-db.ts # In-memory SQLite with FTS5
│ ├── handle-registry.ts # Handle creation and stubs
│ ├── handle-ops.ts # Server-side operations
│ ├── fts5-search.ts # Full-text search
│ └── checkpoint.ts # Session persistence
├── treesitter/ # Code-aware symbol extraction
│ ├── parser-registry.ts # Tree-sitter parser management
│ ├── symbol-extractor.ts # AST → symbol extraction
│ ├── language-map.ts # Extension → language mapping
│ └── types.ts # Symbol interfaces
├── engine/ # Nucleus execution engine
│ ├── nucleus-engine.ts
│ └── handle-session.ts # Session with symbol support
├── minikanren/ # Relational programming engine
├── synthesis/ # Program synthesis (Barliman-style)
│ └── evalo/ # Extractor DSL
├── rag/ # Few-shot hint retrieval
└── rlm.ts # Main execution loop
This project incorporates ideas and code from:
- Nucleus - A symbolic S-expression language by Michael Whitford. RLM uses Nucleus syntax for the constrained DSL that the LLM outputs, providing a rigid grammar that reduces model errors.
- ramo - A miniKanren implementation in TypeScript by Will Lewis. Used for constraint-based program synthesis.
- Barliman - A prototype smart editor by William Byrd and Greg Rosenblatt that uses program synthesis to assist programmers. The Barliman-style approach of providing input/output constraints instead of code inspired the synthesis workflow.
- tree-sitter - A parser generator tool and incremental parsing library. Used for extracting structural symbols (functions, classes, methods) from code files to enable code-aware queries.
MIT
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for Matryoshka
Similar Open Source Tools
Matryoshka
Matryoshka is a tool that processes documents 100x larger than your LLM's context window without vector databases or chunking heuristics. It uses Recursive Language Models to reason about queries and output symbolic commands executed by a logic engine. The tool provides a constrained symbolic language called Nucleus based on S-expressions, ensuring reduced entropy, fail-fast validation, safe execution, and small model friendliness. It includes components like the Nucleus DSL, Lattice Engine, In-Memory Handle Storage, and the role of the LLM in reasoning. Matryoshka offers CLI tools for document analysis, MCP integration for token savings, and programmatic access. It supports symbol operations, collection operations, string operations, type coercion, program synthesis, cross-turn state, and final answer formatting.
simili-bot
Simili Bot is an AI-powered tool designed for GitHub repositories to automatically detect duplicate issues, find similar issues using semantic search, and intelligently route issues across repositories. It offers features such as semantic duplicate detection, cross-repository search, intelligent routing, smart triage, modular pipeline customization, and multi-repo support. The tool follows a 'Lego with Blueprints' architecture, with Lego Blocks representing independent pipeline steps and Blueprints providing pre-defined workflows. Users can configure AI providers like Gemini and OpenAI, set default models for embeddings, and specify workflows in a 'simili.yaml' file. Simili Bot also offers CLI commands for bulk indexing, processing single issues, and batch operations, enabling local development, testing, and analysis of historical data.
mxcp
MXCP is an enterprise-grade MCP framework for building production-ready AI applications. It provides a structured methodology for data modeling, service design, smart implementation, quality assurance, and production operations. With built-in enterprise features like security, audit trail, type safety, testing framework, performance optimization, and drift detection, MXCP ensures comprehensive security, quality, and operations. The tool supports SQL for data queries and Python for complex logic, ML models, and integrations, allowing users to choose the right tool for each job while maintaining security and governance. MXCP's architecture includes LLM client, MXCP framework, implementations, security & policies, SQL endpoints, Python tools, type system, audit engine, validation & tests, data sources, and APIs. The tool enforces an organized project structure and offers CLI commands for initialization, quality assurance, data management, operations & monitoring, and LLM integration. MXCP is compatible with Claude Desktop, OpenAI-compatible tools, and custom integrations through the Model Context Protocol (MCP) specification. The tool is developed by RAW Labs for production data-to-AI workflows and is released under the Business Source License 1.1 (BSL), with commercial licensing required for certain production scenarios.
mcp-fusion
MCP Fusion is a Model-View-Agent framework for the Model Context Protocol, providing structured perception for AI agents with validated data, domain rules, UI blocks, and action affordances in every response. It introduces the MVA pattern, where a Presenter layer sits between data and the AI agent, ensuring consistent, validated, contextually-rich data across the API surface. The tool facilitates schema validation, system rules, UI blocks, cognitive guardrails, and action affordances for domain entities. It offers tools for defining actions, prompts, middleware, error handling, type-safe clients, observability, streaming progress, and more, all integrated with the Model Context Protocol SDK and Zod for type safety and validation.
mcp-debugger
mcp-debugger is a Model Context Protocol (MCP) server that provides debugging tools as structured API calls. It enables AI agents to perform step-through debugging of multiple programming languages using the Debug Adapter Protocol (DAP). The tool supports multi-language debugging with clean adapter patterns, including Python debugging via debugpy, JavaScript (Node.js) debugging via js-debug, and Rust debugging via CodeLLDB. It offers features like mock adapter for testing, STDIO and SSE transport modes, zero-runtime dependencies, Docker and npm packages for deployment, structured JSON responses for easy parsing, path validation to prevent crashes, and AI-aware line context for intelligent breakpoint placement with code context.
kiss_ai
KISS AI is a lightweight and powerful multi-agent evolutionary framework that simplifies building AI agents. It uses native function calling for efficiency and accuracy, making building AI agents as straightforward as possible. The framework includes features like multi-agent orchestration, agent evolution and optimization, relentless coding agent for long-running tasks, output formatting, trajectory saving and visualization, GEPA for prompt optimization, KISSEvolve for algorithm discovery, self-evolving multi-agent, Docker integration, multiprocessing support, and support for various models from OpenAI, Anthropic, Gemini, Together AI, and OpenRouter.
DeepMCPAgent
DeepMCPAgent is a model-agnostic tool that enables the creation of LangChain/LangGraph agents powered by MCP tools over HTTP/SSE. It allows for dynamic discovery of tools, connection to remote MCP servers, and integration with any LangChain chat model instance. The tool provides a deep agent loop for enhanced functionality and supports typed tool arguments for validated calls. DeepMCPAgent emphasizes the importance of MCP-first approach, where agents dynamically discover and call tools rather than hardcoding them.
agentboard
Agentboard is a Web GUI for tmux optimized for agent TUI's like claude and codex. It provides a shared workspace across devices with features such as paste support, touch scrolling, virtual arrow keys, log tracking, and session pinning. Users can interact with tmux sessions from any device through a live terminal stream. The tool allows session discovery, status inference, and terminal I/O streaming for efficient agent management.
botserver
General Bots is a self-hosted AI automation platform and LLM conversational platform focused on convention over configuration and code-less approaches. It serves as the core API server handling LLM orchestration, business logic, database operations, and multi-channel communication. The platform offers features like multi-vendor LLM API, MCP + LLM Tools Generation, Semantic Caching, Web Automation Engine, Enterprise Data Connectors, and Git-like Version Control. It enforces a ZERO TOLERANCE POLICY for code quality and security, with strict guidelines for error handling, performance optimization, and code patterns. The project structure includes modules for core functionalities like Rhai BASIC interpreter, security, shared types, tasks, auto task system, file operations, learning system, and LLM assistance.
one
ONE is a modern web and AI agent development toolkit that empowers developers to build AI-powered applications with high performance, beautiful UI, AI integration, responsive design, type safety, and great developer experience. It is perfect for building modern web applications, from simple landing pages to complex AI-powered platforms.
claude-container
Claude Container is a Docker container pre-installed with Claude Code, providing an isolated environment for running Claude Code with optional API request logging in a local SQLite database. It includes three images: main container with Claude Code CLI, optional HTTP proxy for logging requests, and a web UI for visualizing and querying logs. The tool offers compatibility with different versions of Claude Code, quick start guides using a helper script or Docker Compose, authentication process, integration with existing projects, API request logging proxy setup, and data visualization with Datasette.
simple-data-analysis
Simple data analysis (SDA) is an easy-to-use and high-performance TypeScript library for data analysis. It can be used with tabular and geospatial data. The library is maintained by Nael Shiab, a computational journalist and senior data producer for CBC News. SDA is based on DuckDB, a fast in-process analytical database, and it sends SQL queries to be executed by DuckDB. The library provides methods inspired by Pandas (Python) and the Tidyverse (R), and it also supports writing custom SQL queries and processing data with JavaScript. Additionally, SDA offers methods for leveraging large language models (LLMs) for data cleaning, extraction, categorization, and natural language interaction, as well as for embeddings and semantic search.
SG-Nav
SG-Nav is an online 3D scene graph prompting tool designed for LLM-based zero-shot object navigation. It proposes a framework that constructs an online 3D scene graph to prompt LLMs, allowing direct application to various scenes and categories without the need for training.
Veritensor
Veritensor is an Anti-Virus tool designed for AI Artifacts and a Firewall for RAG pipelines. It secures the AI Supply Chain by scanning models, datasets, RAG documents, and notebooks for threats that traditional SAST tools may miss. Veritensor shifts security left by intercepting and sanitizing malicious documents, poisoned datasets, and compromised dependencies before they enter the execution environment. It understands binary and serialized formats used in Machine Learning, such as models, data & RAG documents, notebooks, dependencies, and governance aspects. The tool offers features like native RAG security integration, high-performance parallel scanning, advanced stealth detection, dataset security, archive inspection, dependency audit, data provenance, identity verification, de-obfuscation engine, magic number validation, smart filtering, and entropy analysis.
kubectl-mcp-server
Control your entire Kubernetes infrastructure through natural language conversations with AI. Talk to your clusters like you talk to a DevOps expert. Debug crashed pods, optimize costs, deploy applications, audit security, manage Helm charts, and visualize dashboards—all through natural language. The tool provides 253 powerful tools, 8 workflow prompts, 8 data resources, and works with all major AI assistants. It offers AI-powered diagnostics, built-in cost optimization, enterprise-ready features, zero learning curve, universal compatibility, visual insights, and production-grade deployment options. From debugging crashed pods to optimizing cluster costs, kubectl-mcp-server is your AI-powered DevOps companion.
For similar tasks
document-ai-samples
The Google Cloud Document AI Samples repository contains code samples and Community Samples demonstrating how to analyze, classify, and search documents using Google Cloud Document AI. It includes various projects showcasing different functionalities such as integrating with Google Drive, processing documents using Python, content moderation with Dialogflow CX, fraud detection, language extraction, paper summarization, tax processing pipeline, and more. The repository also provides access to test document files stored in a publicly-accessible Google Cloud Storage Bucket. Additionally, there are codelabs available for optical character recognition (OCR), form parsing, specialized processors, and managing Document AI processors. Community samples, like the PDF Annotator Sample, are also included. Contributions are welcome, and users can seek help or report issues through the repository's issues page. Please note that this repository is not an officially supported Google product and is intended for demonstrative purposes only.
step-free-api
The StepChat Free service provides high-speed streaming output, multi-turn dialogue support, online search support, long document interpretation, and image parsing. It offers zero-configuration deployment, multi-token support, and automatic session trace cleaning. It is fully compatible with the ChatGPT interface. Additionally, it provides seven other free APIs for various services. The repository includes a disclaimer about using reverse APIs and encourages users to avoid commercial use to prevent service pressure on the official platform. It offers online testing links, showcases different demos, and provides deployment guides for Docker, Docker-compose, Render, Vercel, and native deployments. The repository also includes information on using multiple accounts, optimizing Nginx reverse proxy, and checking the liveliness of refresh tokens.
unilm
The 'unilm' repository is a collection of tools, models, and architectures for Foundation Models and General AI, focusing on tasks such as NLP, MT, Speech, Document AI, and Multimodal AI. It includes various pre-trained models, such as UniLM, InfoXLM, DeltaLM, MiniLM, AdaLM, BEiT, LayoutLM, WavLM, VALL-E, and more, designed for tasks like language understanding, generation, translation, vision, speech, and multimodal processing. The repository also features toolkits like s2s-ft for sequence-to-sequence fine-tuning and Aggressive Decoding for efficient sequence-to-sequence decoding. Additionally, it offers applications like TrOCR for OCR, LayoutReader for reading order detection, and XLM-T for multilingual NMT.
searchGPT
searchGPT is an open-source project that aims to build a search engine based on Large Language Model (LLM) technology to provide natural language answers. It supports web search with real-time results, file content search, and semantic search from sources like the Internet. The tool integrates LLM technologies such as OpenAI and GooseAI, and offers an easy-to-use frontend user interface. The project is designed to provide grounded answers by referencing real-time factual information, addressing the limitations of LLM's training data. Contributions, especially from frontend developers, are welcome under the MIT License.
LLMs-at-DoD
This repository contains tutorials for using Large Language Models (LLMs) in the U.S. Department of Defense. The tutorials utilize open-source frameworks and LLMs, allowing users to run them in their own cloud environments. The repository is maintained by the Defense Digital Service and welcomes contributions from users.
LARS
LARS is an application that enables users to run Large Language Models (LLMs) locally on their devices, upload their own documents, and engage in conversations where the LLM grounds its responses with the uploaded content. The application focuses on Retrieval Augmented Generation (RAG) to increase accuracy and reduce AI-generated inaccuracies. LARS provides advanced citations, supports various file formats, allows follow-up questions, provides full chat history, and offers customization options for LLM settings. Users can force enable or disable RAG, change system prompts, and tweak advanced LLM settings. The application also supports GPU-accelerated inferencing, multiple embedding models, and text extraction methods. LARS is open-source and aims to be the ultimate RAG-centric LLM application.
EAGLE
Eagle is a family of vision-centric, high-resolution multimodal LLMs that strengthen perception by mixing multiple vision encoders and input resolutions. The model fuses vision experts with different architectures and knowledge via channel concatenation and supports input resolutions above 1K, excelling at resolution-sensitive tasks such as optical character recognition and document understanding.
erag
ERAG is an advanced system that combines lexical, semantic, text, and knowledge graph searches with conversation context to provide accurate and contextually relevant responses. This tool processes various document types, creates embeddings, builds knowledge graphs, and uses this information to answer user queries intelligently. It includes modules for interacting with web content, GitHub repositories, and performing exploratory data analysis using various language models.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API offering access to over 100 different AI models, spanning image generation to sound.
kaito
Kaito is an operator that automates AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, ships preset configurations so deployment parameters do not have to be tuned to fit GPU hardware, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) where licenses allow. With Kaito, the workflow of onboarding large AI inference models in Kubernetes is greatly simplified.
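As a sketch of that declarative workflow, Kaito is driven by a `Workspace` custom resource that pairs a GPU instance type with a preset model. The instance type and model name below are illustrative assumptions and should be checked against the project's documentation:

```yaml
# Hypothetical Kaito Workspace: request a GPU node and deploy a preset model.
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
resource:
  instanceType: "Standard_NC12s_v3"   # GPU SKU to auto-provision (assumed example)
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  preset:
    name: "falcon-7b"                 # preset model configuration (assumed example)
```

Applying a manifest like this with `kubectl apply` is what replaces hand-tuning GPU deployment parameters.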
PyRIT
PyRIT is an open-access automation framework designed to empower security professionals and ML engineers to red-team foundation models and their applications. It automates AI red-teaming tasks so operators can focus on more complicated, time-consuming work, and it can identify security harms such as misuse (e.g., malware generation, jailbreaking) and privacy harms (e.g., identity theft). The goal is to give researchers a baseline for how well their model and entire inference pipeline perform against different harm categories, compare that baseline to future iterations, and detect any degradation in performance over time.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. Key features:
- Self-contained, with no need for a DBMS or cloud service.
- OpenAPI interface, easy to integrate with existing infrastructure (e.g., a Cloud IDE).
- Supports consumer-grade GPUs.
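Self-hosting along these lines typically means starting a single container; the model name, port, and data directory below are illustrative choices, not prescriptions — consult Tabby's documentation for current values:

```shell
# Run the Tabby server with GPU support (illustrative model/port choices).
docker run -it --gpus all \
  -p 8080:8080 \
  -v $HOME/.tabby:/data \
  tabbyml/tabby serve --model StarCoder-1B --device cuda
```

Once the server is up, editor plugins point at the local endpoint instead of a cloud service, which is what keeps code on-premises.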
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.