# tokf

Config-driven CLI tool that compresses command output before it reaches an LLM context.
tokf.net — reduce LLM context consumption from CLI commands by 60–90%.
Commands like git push, cargo test, and docker build produce verbose output packed with progress bars, compile noise, and boilerplate. tokf intercepts that output, applies a TOML filter, and emits only what matters — so your AI agent sees a clean signal instead of hundreds of wasted tokens.
`cargo test` — 61 lines → 1 line.

`git push` — 8 lines → 1 line.
Install via Homebrew or Cargo:

```shell
brew install mpecan/tokf/tokf
# or
cargo install tokf
```

Or build from source:

```shell
git clone https://github.com/mpecan/tokf
cd tokf
cargo build --release
# binary at target/release/tokf
```

Then wrap any command with `tokf run`:

```shell
tokf run git push origin main
```
tokf looks up a filter for git push, runs the command, and applies the filter. The filter logic lives in plain TOML files — no recompilation required. Anyone can author, share, or override a filter.
```shell
tokf run git push origin main
tokf run cargo test
tokf run docker build .
```

Test a filter against a fixture:

```shell
tokf test filters/git/push.toml tests/fixtures/git_push_success.txt --exit-code 0
```

Run filter test suites:

```shell
tokf verify                       # run all test suites
tokf verify git/push              # run a specific suite
tokf verify --list                # list available suites and case counts
tokf verify --json                # output results as JSON
tokf verify --require-all         # fail if any filter has no test suite
tokf verify --list --require-all  # show coverage per filter
tokf verify --scope project       # only project-local filters (.tokf/filters/)
tokf verify --scope global        # only user-level filters (~/.config/tokf/filters/)
tokf verify --scope stdlib        # only built-in stdlib (filters/ in CWD)
```

Inspect filters:

```shell
tokf ls                  # list all filters
tokf which "cargo test"  # which filter would match
tokf show git/push       # print the TOML source
```

Eject a built-in filter for editing:

```shell
tokf eject cargo/build           # copy to .tokf/filters/ (project-local)
tokf eject cargo/build --global  # copy to ~/.config/tokf/filters/ (user-level)
```

This copies the filter TOML and its test suite to your config directory, where it shadows the built-in. Edit the ejected copy freely — tokf's priority system ensures your version is used instead of the original.
| Flag | Description |
|---|---|
| `--timing` | Print how long filtering took |
| `--verbose` | Show which filter was matched (also explains skipped rewrites) |
| `--no-filter` | Pass output through without filtering |
| `--no-cache` | Bypass the filter discovery cache |
| `--no-mask-exit-code` | Disable exit-code masking. By default tokf exits 0 and prepends `Error: Exit code N` on failure |
| `--baseline-pipe` | Pipe command for fair baseline accounting (injected by rewrite) |
| `--prefer-less` | Compare filtered vs piped output and use whichever is smaller (requires `--baseline-pipe`) |
| Filter | Command |
|---|---|
| `git/add` | `git add` |
| `git/commit` | `git commit` |
| `git/diff` | `git diff` |
| `git/log` | `git log` |
| `git/push` | `git push` |
| `git/show` | `git show` |
| `git/status` | `git status` |
| `cargo/build` | `cargo build` |
| `cargo/check` | `cargo check` |
| `cargo/clippy` | `cargo clippy` |
| `cargo/install` | `cargo install *` |
| `cargo/test` | `cargo test` |
| `docker/*` | `docker build`, `docker compose`, `docker images`, `docker ps` |
| `npm/run` | `npm run *` |
| `npm/test` | `npm test`, `pnpm test`, `yarn test` (with vitest/jest variants) |
| `pnpm/*` | `pnpm add`, `pnpm install` |
| `go/*` | `go build`, `go vet` |
| `gradle/*` | `gradle build`, `gradle test`, `gradle dependencies` |
| `gh/*` | `gh pr list`, `gh pr view`, `gh pr checks`, `gh issue list`, `gh issue view` |
| `kubectl/*` | `kubectl get pods` |
| `next/*` | `next build` |
| `prisma/*` | `prisma generate` |
| `pytest` | Python test runner |
| `tsc` | TypeScript compiler |
| `ls` | `ls` |
Filters are TOML files placed in .tokf/filters/ (project-local) or ~/.config/tokf/filters/ (user-level). Project-local filters take priority over user-level, which take priority over the built-in library.
A minimal filter:

```toml
command = "my-tool"

[on_success]
output = "ok ✓"

[on_failure]
tail = 10
```

tokf matches commands against filter patterns using two built-in behaviours:
Basename matching — the first word of a pattern is compared by basename, so a filter with `command = "git push"` will also match `/usr/bin/git push` or `./git push`. This works automatically; no special pattern syntax is required.

Transparent global flags — flag-like tokens between the command name and a subcommand keyword are skipped during matching. A filter for `git log` will match all of:

```shell
git log
git -C /path log
git --no-pager -C /path log --oneline
/usr/bin/git --no-pager -C /path log
```

The skipped flags are preserved in the command that actually runs — they are only bypassed during the pattern match.

Note on `run` override and transparent flags: if a filter sets a `run` field, transparent global flags are not included in `{args}`. Only the arguments that appear after the matched pattern words are available as `{args}`.
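These matching rules can be modelled in a few lines (an illustrative sketch only; `pattern_matches` and its flag-skipping heuristic are hypothetical simplifications, not tokf's actual matcher):

```python
import os
import shlex

def pattern_matches(pattern: str, command: str) -> bool:
    """Illustrative match: basename comparison on the first word,
    plus skipping flag-like tokens before the subcommand."""
    pat = pattern.split()
    cmd = shlex.split(command)
    if not pat or not cmd:
        return False
    # Basename matching: "/usr/bin/git" matches the pattern word "git".
    if os.path.basename(cmd[0]) != pat[0]:
        return False
    # Transparent global flags: skip "--no-pager", "-C /path", etc.
    # (Naive here: any "-" token is skipped; "-C" taking a value is an assumption.)
    rest, i = [], 1
    while i < len(cmd):
        tok = cmd[i]
        if tok.startswith("-"):
            if tok in ("-C",):  # flags assumed to take a value
                i += 1
        else:
            rest.append(tok)
        i += 1
    # Remaining pattern words must prefix the remaining command words.
    return rest[: len(pat) - 1] == pat[1:]

print(pattern_matches("git log", "/usr/bin/git --no-pager -C /path log"))  # True
print(pattern_matches("git log", "git status"))                            # False
```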
Full filter schema (top-level keys are placed before `[[replace]]` so they are not captured by the array-of-tables section):

```toml
command = "git push"            # command pattern to match (supports wildcards and arrays)
run = "git push {args}"         # override command to actually execute

skip = ["^Enumerating", "^Counting"]  # drop lines matching these regexes
keep = ["^error"]                     # keep only lines matching (inverse of skip)

dedup = true                    # collapse consecutive identical lines
dedup_window = 10               # optional: compare within an N-line sliding window
strip_ansi = true               # strip ANSI escape sequences before processing
trim_lines = true               # trim leading/trailing whitespace from each line
strip_empty_lines = true        # remove all blank lines from the final output
collapse_empty_lines = true     # collapse consecutive blank lines into one
show_history_hint = true        # append a hint line pointing to the full output in history

match_output = [                # whole-output substring checks, short-circuit the pipeline
  { contains = "rejected", output = "push rejected" },
]

# Per-line regex replacement — applied before skip/keep, in order.
# Capture groups use {1}, {2}, … . Invalid patterns are silently skipped.
[[replace]]
pattern = '^(\S+)\s+\S+\s+(\S+)\s+(\S+)'
output = "{1}: {2} → {3}"

[on_success]                    # branch for exit code 0
output = "ok ✓ {2}"             # template; {output} = pre-filtered output

[on_failure]                    # branch for non-zero exit
tail = 10                       # keep the last N lines
```

Output templates support pipe chains: `{var | pipe | pipe: "arg"}`.
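The per-line stages can be sketched as follows (an illustrative Python model of the documented order, replace before skip/keep before dedup; `apply_filter` is a hypothetical helper, not tokf's code):

```python
import re

def apply_filter(lines, replace=(), skip=(), keep=(), dedup=False):
    """Illustrative line pipeline: replace -> skip -> keep -> dedup."""
    out = []
    for line in lines:
        # [[replace]] rules run first, in order; invalid patterns are skipped.
        for pattern, template in replace:
            try:
                m = re.search(pattern, line)
            except re.error:
                continue
            if m:
                line = template
                for i, g in enumerate(m.groups(), 1):
                    line = line.replace("{%d}" % i, g or "")
        if any(re.search(p, line) for p in skip):
            continue  # drop lines matching a skip regex
        if keep and not any(re.search(p, line) for p in keep):
            continue  # keep-only mode: drop non-matching lines
        if dedup and out and out[-1] == line:
            continue  # collapse consecutive identical lines
        out.append(line)
    return out

lines = ["Compiling foo v0.1.0", "warning: unused", "warning: unused", "Finished dev"]
print(apply_filter(lines, skip=[r"^Compiling"], dedup=True))
# → ['warning: unused', 'Finished dev']
```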
| Pipe | Input → Output | Description |
|---|---|---|
| `join: "sep"` | Collection → Str | Join items with separator |
| `each: "tmpl"` | Collection → Collection | Map each item through a sub-template |
| `truncate: N` | Str → Str | Truncate to N characters, appending `…` |
| `lines` | Str → Collection | Split on newlines |
| `keep: "re"` | Collection → Collection | Retain items matching the regex |
| `where: "re"` | Collection → Collection | Alias for `keep:` |
Example — filter a multi-line output variable to only error lines:

```toml
[on_failure]
output = "{output | lines | keep: \"^error\" | join: \"\\n\"}"
```

Example — for each collected block, show only `>` (pointer) and `E` (assertion) lines:

```toml
[on_failure]
output = "{failure_lines | each: \"{value | lines | keep: \\\"^[>E] \\\"}\" | join: \"\\n\"}"
```

Some commands are wrappers around different underlying tools (e.g. `npm test` may run Jest, Vitest, or Mocha). A parent filter can declare `[[variant]]` entries that delegate to specialized child filters based on project context:
```toml
command = ["npm test", "pnpm test", "yarn test"]
strip_ansi = true
skip = ["^> ", "^\\s*npm (warn|notice|WARN|verbose|info|timing|error|ERR)"]

[on_success]
output = "{output}"

[on_failure]
tail = 20

[[variant]]
name = "vitest"
detect.files = ["vitest.config.ts", "vitest.config.js", "vitest.config.mts"]
filter = "npm/test-vitest"

[[variant]]
name = "jest"
detect.files = ["jest.config.js", "jest.config.ts", "jest.config.json"]
filter = "npm/test-jest"
```

Detection is two-phase:
- File detection (before execution) — checks if config files exist in the current directory. First match wins.
- Output pattern (after execution) — regex-matches command output. Used as a fallback when no file was detected.
When no variant matches, the parent filter's own fields (skip, on_success, etc.) apply as the fallback.
The filter field references another filter by its discovery name (relative path without .toml). Use tokf which "npm test" -v to see variant resolution.
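The two-phase resolution could be modelled like this (a sketch; `resolve_variant` is a hypothetical helper, and the `detect_output` key name is an assumption — the output-pattern phase's TOML key is not shown above):

```python
import re

def resolve_variant(variants, cwd_files, output=None):
    """Illustrative two-phase variant resolution.
    Phase 1: file detection before execution (first match wins).
    Phase 2: output-pattern regex as a fallback after execution."""
    for v in variants:
        if any(f in cwd_files for f in v.get("detect_files", ())):
            return v["filter"]
    if output is not None:
        for v in variants:
            pat = v.get("detect_output")
            if pat and re.search(pat, output):
                return v["filter"]
    return None  # fall back to the parent filter's own fields

variants = [
    {"name": "vitest", "detect_files": ["vitest.config.ts"], "filter": "npm/test-vitest"},
    {"name": "jest", "detect_files": ["jest.config.js"], "filter": "npm/test-jest",
     "detect_output": r"Ran all test suites"},
]
print(resolve_variant(variants, {"jest.config.js", "package.json"}))  # npm/test-jest
```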
TOML ordering: `[[variant]]` entries must appear after all top-level fields (`skip`, `[on_success]`, etc.) because TOML array-of-tables sections capture subsequent keys.

Filters are discovered from three locations, in priority order:

1. `.tokf/filters/` in the current directory (repo-local overrides)
2. `~/.config/tokf/filters/` (user-level overrides)
3. Built-in library (embedded in the binary)
First match wins. Use tokf which "git push" to see which filter would activate.
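First-match-wins discovery amounts to a tiered lookup (illustrative sketch; `discover_filter` is a hypothetical helper, not tokf's code):

```python
from pathlib import Path

def discover_filter(name, search_dirs, builtin):
    """Illustrative first-match-wins lookup across the three tiers."""
    for d in search_dirs:  # project-local first, then user-level
        candidate = Path(d) / f"{name}.toml"
        if candidate.exists():
            return str(candidate)  # a local or user filter shadows the built-in
    return builtin.get(name)  # embedded built-in library is the last resort

# With no override files on disk, the built-in wins:
print(discover_filter("git/push", [".tokf/filters", "~/.config/tokf/filters"],
                      {"git/push": "<embedded>"}))
```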
Filter tests live in a `<stem>_test/` directory adjacent to the filter TOML:

```
filters/
  git/
    push.toml   <- filter config
    push_test/  <- test suite
      success.toml
      rejected.toml
```
Each test case is a TOML file specifying a fixture (inline or file path), expected exit code, and one or more `[[expect]]` assertions:

```toml
name = "rejected push shows pull hint"
fixture = "tests/fixtures/git_push_rejected.txt"
exit_code = 1

[[expect]]
equals = "✗ push rejected (try pulling first)"
```

For quick inline fixtures without a file:

```toml
name = "clean tree shows nothing to commit"
inline = "## main...origin/main\n"
exit_code = 0

[[expect]]
contains = "clean"
```

Assertion types:
| Field | Description |
|---|---|
| `equals` | Output exactly equals this string |
| `contains` | Output contains this substring |
| `not_contains` | Output does not contain this substring |
| `starts_with` | Output starts with this string |
| `ends_with` | Output ends with this string |
| `line_count` | Output has exactly N non-empty lines |
| `matches` | Output matches this regex |
| `not_matches` | Output does not match this regex |
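Evaluating a single `[[expect]]` assertion against filtered output could be sketched as (hypothetical helper, not tokf's test runner):

```python
import re

def check(output: str, expect: dict) -> bool:
    """Illustrative evaluation of one [[expect]] assertion."""
    if "equals" in expect:
        return output == expect["equals"]
    if "contains" in expect:
        return expect["contains"] in output
    if "not_contains" in expect:
        return expect["not_contains"] not in output
    if "starts_with" in expect:
        return output.startswith(expect["starts_with"])
    if "ends_with" in expect:
        return output.endswith(expect["ends_with"])
    if "line_count" in expect:  # non-empty lines only
        return len([l for l in output.splitlines() if l.strip()]) == expect["line_count"]
    if "matches" in expect:
        return re.search(expect["matches"], output) is not None
    if "not_matches" in expect:
        return re.search(expect["not_matches"], output) is None
    return False

print(check("ok: 3 passed\n", {"line_count": 1}))  # True
```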
Exit codes from tokf verify: 0 = all pass, 1 = assertion failure, 2 = config/IO error or uncovered filters (--require-all).
For logic that TOML can't express — numeric math, multi-line lookahead, conditional branching — embed a Luau script:
```toml
command = "my-tool"

[lua_script]
lang = "luau"
source = '''
if exit_code == 0 then
  return "passed"
else
  -- parenthesised so a failed match falls back to the full output
  return "FAILED: " .. (output:match("Error: (.+)") or output)
end
'''
```

Available globals: `output` (string), `exit_code` (integer — the underlying command's real exit code, unaffected by `--no-mask-exit-code`), `args` (table).
Return a string to replace output, or nil to fall through to the rest of the TOML pipeline.
The sandbox blocks io, os, and package — no filesystem or network access from scripts.
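The fall-through contract is easy to state in code (illustrative; `apply_script` and `toml_pipeline` are hypothetical names):

```python
def apply_script(script_result, output, toml_pipeline):
    """Illustrative handling of a script's return value:
    a string replaces the output; nil/None falls through to the TOML pipeline."""
    if isinstance(script_result, str):
        return script_result
    return toml_pipeline(output)
```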
tokf records input/output byte counts per run in a local SQLite database:
```shell
tokf gain             # summary: total bytes saved and reduction %
tokf gain --daily     # day-by-day breakdown
tokf gain --by-filter # breakdown by filter
tokf gain --json      # machine-readable output
```

tokf records raw and filtered outputs in a local SQLite database, useful for debugging filters or reviewing what an AI agent saw:

```shell
tokf history list            # recent entries (current project)
tokf history list -l 20      # show 20 entries
tokf history list --all      # entries from all projects
tokf history show 42         # full details for entry #42
tokf history show --raw 42   # print only the raw captured output
tokf history search "error"  # search by command or output content
tokf history clear           # clear current project history
tokf history clear --all     # clear all history (destructive)
```

When an LLM receives filtered output it may not realise the full output exists. Two mechanisms can automatically append a hint line pointing to the history entry:
1. Filter opt-in — set `show_history_hint = true` in a filter TOML to always append the hint for that command:

```toml
command = "git status"
show_history_hint = true

[on_success]
output = "{branch} — {counts}"
```

2. Automatic repetition detection — tokf detects when the same command is run twice in a row for the same project. This is a signal the caller didn't act on the previous filtered output and may need the full content:

```
✓ cargo test: 42 passed (2.31s)
Filtered - full output: `tokf history show --raw 99`
```
The hint is appended to stdout so it is visible to both humans and LLMs in the tool output. The history entry itself always stores the clean filtered output, without the hint line.
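Repetition detection reduces to remembering the previous command per project (an illustrative sketch, not tokf's implementation):

```python
last_command = {}  # per-project memory of the previous command

def maybe_hint(project: str, command: str, entry_id: int):
    """Return a history hint when the same command repeats back-to-back."""
    repeated = last_command.get(project) == command
    last_command[project] = command
    if repeated:
        return f"Filtered - full output: `tokf history show --raw {entry_id}`"
    return None  # first occurrence: no hint

print(maybe_hint("proj", "cargo test", 98))  # None
print(maybe_hint("proj", "cargo test", 99))  # the hint line
```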
tokf integrates with Claude Code as a PreToolUse hook that automatically filters every Bash tool call — no changes to your workflow required.
```shell
tokf hook install           # project-local (.tokf/)
tokf hook install --global  # user-level (~/.config/tokf/)
```

Once installed, every command Claude runs through the Bash tool is filtered transparently. Track cumulative savings with `tokf gain`.

tokf also ships a filter-authoring skill that teaches Claude the complete filter schema:

```shell
tokf skill install           # project-local (.claude/skills/)
tokf skill install --global  # user-level (~/.claude/skills/)
```

tokf integrates with OpenCode via a plugin that applies filters in real-time before command execution.
Requirements: OpenCode with Bun runtime installed.
Install:

```shell
tokf hook install --tool opencode           # project-local
tokf hook install --tool opencode --global  # global
```

This writes `.opencode/plugins/tokf.ts` (or `~/.config/opencode/plugins/tokf.ts` for `--global`), which OpenCode auto-loads. The plugin uses OpenCode's `tool.execute.before` hook to intercept bash tool calls and rewrites the command in-place when a matching filter exists. Restart OpenCode after installation for the plugin to take effect.
If tokf rewrite fails or no filter matches, the command passes through unmodified (fail-safe).
tokf integrates with OpenAI Codex CLI via a skill that instructs the agent to prefix supported commands with tokf run.
Install:

```shell
tokf hook install --tool codex           # project-local
tokf hook install --tool codex --global  # global
```

This writes `.agents/skills/tokf-run/SKILL.md` (or `~/.agents/skills/tokf-run/SKILL.md` for `--global`), which Codex auto-discovers. Unlike the Claude Code hook (which intercepts commands at the tool level), the Codex integration is skill-based: it teaches the agent to use `tokf run` as a command prefix. If tokf is not installed, the agent falls back to running commands without the prefix (fail-safe).
tokf ships a Claude Code skill that teaches Claude the complete filter schema, processing order, step types, template pipes, and naming conventions.
Invoke automatically: Claude will activate the skill whenever you ask to create or modify a filter — just describe what you want in natural language:
"Create a filter for `npm install` output that keeps only warnings and errors"

"Write a tokf filter for `pytest` that shows a summary on success and failure details on fail"
Invoke explicitly with the /tokf-filter slash command:
/tokf-filter create a filter for docker build output
The skill is in .claude/skills/tokf-filter/SKILL.md. Reference material (exhaustive step docs and an annotated example TOML) lives in .claude/skills/tokf-filter/references/.
tokf looks for a rewrites.toml file in two locations (first found wins):
1. Project-local: `.tokf/rewrites.toml` — scoped to the current repository
2. User-level: `~/.config/tokf/rewrites.toml` — applies to all projects
This file controls custom rewrite rules, skip patterns, and pipe handling. All [pipe], [skip], and [[rewrite]] sections documented below go in this file.
When a command is piped to a simple output-shaping tool (grep, tail, or head), tokf strips the pipe automatically and uses its own structured filter output instead. The original pipe suffix is passed to --baseline-pipe so token savings are still calculated accurately.
```shell
# These ARE rewritten — pipe is stripped, tokf applies its filter:
cargo test | grep FAILED
cargo test | tail -20
git diff HEAD | head -5
```

Multi-pipe chains, pipes to other commands, or pipe targets with unsupported flags are left unchanged:

```shell
# These are NOT rewritten — tokf leaves them alone:
kubectl get pods | grep Running | wc -l  # multi-pipe chain
cargo test | wc -l                       # wc not supported
cargo test | tail -f                     # -f (follow) not supported
```

If you want tokf to wrap a piped command that wouldn't normally be rewritten, add an explicit rule to `.tokf/rewrites.toml`:

```toml
[[rewrite]]
match = "^cargo test \\| tee"
replace = "tokf run {0}"
```

Use `tokf rewrite --verbose "cargo test | grep FAILED"` to see how a command is being rewritten.
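The decision of when to strip a pipe can be sketched roughly as follows (illustrative; `rewrite` is a hypothetical helper and the flag handling is simplified):

```python
SAFE_TARGETS = {"grep", "tail", "head"}  # simple output-shaping tools

def rewrite(command: str):
    """Illustrative pipe-strip rewrite (not tokf's parser): a single pipe to
    grep/tail/head is stripped and passed along as the baseline pipe."""
    parts = [p.strip() for p in command.split("|")]
    if len(parts) != 2:
        return None  # no pipe, or a multi-pipe chain: left alone
    base, pipe = parts
    target = pipe.split()[0]
    if target not in SAFE_TARGETS or "-f" in pipe.split():
        return None  # unsupported pipe target or flag: left alone
    return f"tokf run --baseline-pipe '{pipe}' {base}"

print(rewrite("cargo test | tail -20"))
# → tokf run --baseline-pipe 'tail -20' cargo test
print(rewrite("kubectl get pods | grep Running | wc -l"))  # → None
```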
If you prefer tokf to never strip pipes (leaving piped commands unchanged), add a [pipe] section to .tokf/rewrites.toml:
```toml
[pipe]
strip = false  # default: true
```

When `strip = false`, commands like `cargo test | tail -5` pass through the shell unchanged. Non-piped commands are still rewritten normally.
Sometimes the piped output (e.g. tail -5) is actually smaller than the filtered output. The prefer_less option tells tokf to compare both at runtime and use whichever is smaller:
```toml
[pipe]
prefer_less = true  # default: false
```

When a pipe is stripped, tokf injects `--prefer-less` alongside `--baseline-pipe`. At runtime:
- The filter runs normally
- The original pipe command also runs on the raw output
- tokf prints whichever result is smaller
When the pipe output wins, the event is recorded with pipe_override = 1 in the tracking DB. The tokf gain command shows how many times this happened:
```
tokf gain summary
  total runs: 42
  input tokens: 12,500 est.
  output tokens: 3,200 est.
  tokens saved: 9,300 est. (74.4%)
  pipe preferred: 5 runs (pipe output was smaller than filter)
```
Note: strip = false takes priority — if pipe stripping is disabled, prefer_less has no effect.
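The runtime comparison is simply a size check on the two candidate outputs (illustrative sketch; `choose_smaller` is a hypothetical helper):

```python
def choose_smaller(filtered: str, piped: str):
    """Illustrative --prefer-less decision: emit whichever output is smaller.
    Returns (text, pipe_override), mirroring the DB flag described above."""
    if len(piped.encode()) < len(filtered.encode()):
        return piped, 1  # pipe output wins; recorded as pipe_override = 1
    return filtered, 0

print(choose_smaller("a\nb\nc\n", "c\n"))  # → ('c\n', 1)
```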
Leading KEY=VALUE assignments are automatically stripped before matching, so env-prefixed commands are rewritten correctly:
```shell
# These ARE rewritten — env vars are preserved, the command is wrapped:
DEBUG=1 git status           → DEBUG=1 tokf run git status
RUST_LOG=debug cargo test    → RUST_LOG=debug tokf run cargo test
A=1 B=2 cargo test | tail -5 → A=1 B=2 tokf run --baseline-pipe 'tail -5' cargo test
```

The env vars are passed through verbatim to the underlying command; tokf only rewrites the executable portion.
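Splitting the env prefix from the executable portion can be sketched as (illustrative; `split_env_prefix` is a hypothetical helper):

```python
import re

ENV_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=\S*$")

def split_env_prefix(command: str):
    """Illustrative split of leading KEY=VALUE assignments from the command."""
    tokens = command.split()
    i = 0
    while i < len(tokens) and ENV_RE.match(tokens[i]):
        i += 1  # consume leading assignments only
    return tokens[:i], " ".join(tokens[i:])

print(split_env_prefix("RUST_LOG=debug cargo test"))
# → (['RUST_LOG=debug'], 'cargo test')
```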
User-defined skip patterns in .tokf/rewrites.toml match against the full shell segment, including any leading env vars. A pattern ^cargo will not skip RUST_LOG=debug cargo test because the segment doesn't start with cargo:
```toml
[skip]
patterns = ["^cargo"]  # skips "cargo test" but NOT "RUST_LOG=debug cargo test"
```

To skip a command regardless of any env prefix, use a pattern that accounts for it:

```toml
[skip]
patterns = ["(?:^|\\s)cargo\\s"]  # matches "cargo" anywhere after start or whitespace
```

tokf info prints a summary of all paths, database locations, and filter counts. Useful for debugging when filters aren't being found or to verify your setup:
```shell
tokf info        # human-readable output
tokf info --json # machine-readable JSON
```

Example output:
```
tokf 0.2.8

filter search directories:
  [local]    /home/user/project/.tokf/filters (not found)
  [user]     /home/user/.config/tokf/filters (not found)
  [built-in] <embedded> (always available)

tracking database:
  TOKF_DB_PATH: (not set)
  path: /home/user/.local/share/tokf/tracking.db (exists)

filter cache:
  path: /home/user/.cache/tokf/manifest.bin (exists)

filters:
  local: 0
  user: 0
  built-in: 38
  total: 38
```

Override the tracking database path with the `TOKF_DB_PATH` environment variable:

```shell
TOKF_DB_PATH=/tmp/my-tracking.db tokf info
```

tokf caches the filter discovery index for faster startup. The cache rebuilds automatically when filters change, but you can manage it manually:

```shell
tokf cache info   # show cache location, size, and validity
tokf cache clear  # delete the cache, forcing a rebuild on next run
```

tokf-server uses the GitHub device flow so CLI clients can authenticate without handling secrets.
Starts the device authorization flow. Returns a `user_code` and `verification_uri` for the user to visit in their browser. Rate-limited to 10 requests per IP per hour.

Response (201 Created):

```json
{
  "device_code": "dc-abc123",
  "user_code": "ABCD-1234",
  "verification_uri": "https://github.com/login/device",
  "expires_in": 900,
  "interval": 5
}
```

Polls for a completed device authorization. The CLI calls this on an interval until the user has authorized.

Request body:

```json
{ "device_code": "dc-abc123" }
```

Response (200 OK) when authorized:

```json
{
  "access_token": "...",
  "token_type": "bearer",
  "expires_in": 7776000,
  "user": { "id": 1, "username": "octocat", "avatar_url": "..." }
}
```

Response (200 OK) while waiting:

```json
{ "error": "authorization_pending" }
```

Re-polling a completed device code is idempotent — a fresh token is issued.
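A client-side polling loop for this endpoint might look like the following (a sketch; `fetch` stands in for an HTTP POST helper, and a real client should also honour the `interval` and `expires_in` values from the start response):

```python
import time

def poll_for_token(fetch, device_code, interval=5, timeout=900):
    """Illustrative device-flow polling loop. `fetch` is a hypothetical
    callable that POSTs the device_code and returns the parsed JSON."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = fetch({"device_code": device_code})
        if resp.get("error") == "authorization_pending":
            time.sleep(interval)  # wait the server-suggested interval
            continue
        return resp.get("access_token")
    return None  # device code expired without authorization

# Simulated server: pending twice, then authorized.
responses = iter([{"error": "authorization_pending"},
                  {"error": "authorization_pending"},
                  {"access_token": "tok-123", "token_type": "bearer"}])
print(poll_for_token(lambda body: next(responses), "dc-abc123", interval=0))
# → tok-123
```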
| Variable | Required | Description |
|---|---|---|
| `GITHUB_CLIENT_ID` | yes | OAuth App client ID |
| `GITHUB_CLIENT_SECRET` | yes | OAuth App client secret |
| `TRUST_PROXY` | no | Set `true` to trust `X-Forwarded-For` for IP extraction (default `false`) |
tokf was heavily inspired by rtk (rtk-ai.app) — a CLI proxy that compresses command output before it reaches an AI agent's context window. rtk pioneered the idea and demonstrated that 60–90% context reduction is achievable across common dev tools. tokf takes a different approach (TOML-driven filters, user-overridable library, Claude Code hook integration) but the core insight is theirs.
MIT — see LICENSE.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for tokf
Similar Open Source Tools
tokf
Tokf is a versatile text analysis tool designed to extract key information from text data. It provides functionalities for text summarization, sentiment analysis, keyword extraction, and named entity recognition. Tokf is easy to use and can handle large volumes of text data efficiently. Whether you are a data scientist, researcher, or developer, Tokf can help you gain valuable insights from your text data.
turftopic
Turftopic is a Python library that provides tools for sentiment analysis and topic modeling of text data. It allows users to analyze large volumes of text data to extract insights on sentiment and topics. The library includes functions for preprocessing text data, performing sentiment analysis using machine learning models, and conducting topic modeling using algorithms such as Latent Dirichlet Allocation (LDA). Turftopic is designed to be user-friendly and efficient, making it suitable for both beginners and experienced data analysts.
NadirClaw
NadirClaw is a powerful open-source tool designed for web scraping and data extraction. It provides a user-friendly interface for extracting data from websites with ease. With NadirClaw, users can easily scrape text, images, and other content from web pages for various purposes such as data analysis, research, and automation. The tool offers flexibility and customization options to cater to different scraping needs, making it a versatile solution for extracting data from the web. Whether you are a data scientist, researcher, or developer, NadirClaw can streamline your data extraction process and help you gather valuable insights from online sources.
cellm
Cellm is an Excel extension that allows users to leverage Large Language Models (LLMs) like ChatGPT within cell formulas. It enables users to extract AI responses to text ranges, making it useful for automating repetitive tasks that involve data processing and analysis. Cellm supports various models from Anthropic, Mistral, OpenAI, and Google, as well as locally hosted models via Llamafiles, Ollama, or vLLM. The tool is designed to simplify the integration of AI capabilities into Excel for tasks such as text classification, data cleaning, content summarization, entity extraction, and more.
llm-d
LLM-D is a machine learning model for sentiment analysis. It is designed to classify text data into positive, negative, or neutral sentiment categories. The model is trained on a large dataset of labeled text samples and uses natural language processing techniques to analyze and predict sentiment in new text inputs. LLM-D is a powerful tool for businesses and researchers looking to understand customer feedback, social media sentiment, and other text data sources. It can be easily integrated into existing applications or used as a standalone tool for sentiment analysis tasks.
Aimer_WT
Aimer_WT is a web scraping tool designed to extract data from websites efficiently and accurately. It provides a user-friendly interface for users to specify the data they want to scrape and offers various customization options. With Aimer_WT, users can easily automate the process of collecting data from multiple web pages, saving time and effort. The tool is suitable for both beginners and experienced users who need to gather data for research, analysis, or other purposes. Aimer_WT supports various data formats and allows users to export the extracted data for further processing.
Daft
Daft is a lightweight and efficient tool for data analysis and visualization. It provides a user-friendly interface for exploring and manipulating datasets, making it ideal for both beginners and experienced data analysts. With Daft, you can easily import data from various sources, clean and preprocess it, perform statistical analysis, create insightful visualizations, and export your results in multiple formats. Whether you are a student, researcher, or business professional, Daft simplifies the process of analyzing data and deriving meaningful insights.
ciana-parrot
Ciana Parrot is a lightweight and user-friendly tool for analyzing and visualizing data. It provides a simple interface for users to upload their datasets and generate insightful visualizations to gain valuable insights. With Ciana Parrot, users can easily explore their data, identify patterns, trends, and outliers, and communicate their findings effectively. The tool supports various data formats and offers a range of visualization options to suit different analysis needs. Whether you are a data analyst, researcher, or student, Ciana Parrot can help you streamline your data analysis process and make data-driven decisions with confidence.
RAG-To-Know
RAG-To-Know is a versatile tool for knowledge extraction and summarization. It leverages the RAG (Retrieval-Augmented Generation) framework to provide a seamless way to retrieve and summarize information from various sources. With RAG-To-Know, users can easily extract key insights and generate concise summaries from large volumes of text data. The tool is designed to streamline the process of information retrieval and summarization, making it ideal for researchers, students, journalists, and anyone looking to quickly grasp the essence of complex information.
HyperAgent
HyperAgent is a powerful tool for automating repetitive tasks in web scraping and data extraction. It provides a user-friendly interface to create custom web scraping scripts without the need for extensive coding knowledge. With HyperAgent, users can easily extract data from websites, transform it into structured formats, and save it for further analysis. The tool supports various data formats and offers scheduling options for automated data extraction at regular intervals. HyperAgent is suitable for individuals and businesses looking to streamline their data collection processes and improve efficiency in extracting information from the web.
CrossIntelligence
CrossIntelligence is a powerful tool for data analysis and visualization. It allows users to easily connect and analyze data from multiple sources, providing valuable insights and trends. With a user-friendly interface and customizable features, CrossIntelligence is suitable for both beginners and advanced users in various industries such as marketing, finance, and research.
ROGRAG
ROGRAG is a powerful open-source tool designed for data analysis and visualization. It provides a user-friendly interface for exploring and manipulating datasets, making it ideal for researchers, data scientists, and analysts. With ROGRAG, users can easily import, clean, analyze, and visualize data to gain valuable insights and make informed decisions. The tool supports a wide range of data formats and offers a variety of statistical and visualization tools to help users uncover patterns, trends, and relationships in their data. Whether you are working on exploratory data analysis, statistical modeling, or data visualization, ROGRAG is a versatile tool that can streamline your workflow and enhance your data analysis capabilities.
XRAG
XRAG is a powerful open-source tool for analyzing and visualizing data. It provides a user-friendly interface for data exploration, manipulation, and interpretation. With XRAG, users can easily import, clean, and transform data to uncover insights and trends. The tool supports various data formats and offers a wide range of statistical and machine learning algorithms for advanced analysis. XRAG is suitable for data scientists, analysts, researchers, and students looking to gain valuable insights from their data.
datatune
Datatune is a data analysis tool designed to help users explore and analyze datasets efficiently. It provides a user-friendly interface for importing, cleaning, visualizing, and modeling data. With Datatune, users can easily perform tasks such as data preprocessing, feature engineering, model selection, and evaluation. The tool offers a variety of statistical and machine learning algorithms to support data analysis tasks. Whether you are a data scientist, analyst, or researcher, Datatune can streamline your data analysis workflow and help you derive valuable insights from your data.
atlas
Atlas is a powerful data visualization tool that allows users to create interactive charts and graphs from their datasets. It provides a user-friendly interface for exploring and analyzing data, making it ideal for both beginners and experienced data analysts. With Atlas, users can easily customize the appearance of their visualizations, add filters and drill-down capabilities, and share their insights with others. The tool supports a wide range of data formats and offers various chart types to suit different data visualization needs. Whether you are looking to create simple bar charts or complex interactive dashboards, Atlas has you covered.
mcp-use
MCP-Use is a Python library for analyzing and processing text data using Markov Chains. It provides functionalities for generating text based on input data, calculating transition probabilities, and simulating text sequences. The library is designed to be user-friendly and efficient, making it suitable for natural language processing tasks.
For similar tasks
AivisSpeech-Engine
AivisSpeech-Engine is a powerful open-source tool for speech recognition and synthesis. It provides state-of-the-art algorithms for converting speech to text and text to speech. The tool is designed to be user-friendly and customizable, allowing developers to easily integrate speech capabilities into their applications. With AivisSpeech-Engine, users can transcribe audio recordings, create voice-controlled interfaces, and generate natural-sounding speech output. Whether you are building a virtual assistant, developing a speech-to-text application, or experimenting with voice technology, AivisSpeech-Engine offers a comprehensive solution for all your speech processing needs.
npcsh
`npcsh` is a python-based command-line tool designed to integrate Large Language Models (LLMs) and Agents into one's daily workflow by making them available and easily configurable through the command line shell. It leverages the power of LLMs to understand natural language commands and questions, execute tasks, answer queries, and provide relevant information from local files and the web. Users can also build their own tools and call them like macros from the shell. `npcsh` allows users to take advantage of agents (i.e. NPCs) through a managed system, tailoring NPCs to specific tasks and workflows. The tool is extensible with Python, providing useful functions for interacting with LLMs, including explicit coverage for popular providers like ollama, anthropic, openai, gemini, deepseek, and openai-like providers. Users can set up a flask server to expose their NPC team for use as a backend service, run SQL models defined in their project, execute assembly lines, and verify the integrity of their NPC team's interrelations. Users can execute bash commands directly, use favorite command-line tools like VIM, Emacs, ipython, sqlite3, git, pipe the output of these commands to LLMs, or pass LLM results to bash commands.
tokf
tokf is a config-driven CLI tool that compresses command output before it reaches an LLM context. Commands like git push, cargo test, and docker build produce verbose output full of progress bars, compile noise, and boilerplate; tokf intercepts that output, applies a TOML-defined filter, and emits only what matters, cutting context consumption by 60–90%. Filters are plain TOML files, so anyone can author, share, or override one without recompiling.
ComfyUI_VLM_nodes
ComfyUI_VLM_nodes is a repository containing various nodes for utilizing Vision Language Models (VLMs) and Language Models (LLMs). The repository provides nodes for tasks such as structured output generation, image to music conversion, LLM prompt generation, automatic prompt generation, and more. Users can integrate different models like InternLM-XComposer2-VL, UForm-Gen2, Kosmos-2, moondream1, moondream2, JoyTag, and Chat Musician. The nodes support features like extracting keywords, generating prompts, suggesting prompts, and obtaining structured outputs. The repository includes examples and instructions for using the nodes effectively.
lector
Lector is a text analysis tool for extracting insights from unstructured text data, providing sentiment analysis, keyword extraction, entity recognition, and text summarization. It is built to analyze large volumes of text efficiently to uncover patterns, trends, and valuable information, and aims to remain approachable for both beginners and experienced practitioners in natural language processing and text mining.
ALwrity
ALwrity is a lightweight, user-friendly text analysis tool for developers and data scientists, offering sentiment analysis, keyword extraction, and text summarization. It is highly customizable and integrates into existing workflows, helping users turn raw text data into analysis results they can act on.
llm-memorization
The 'llm-memorization' project is a tool designed to index, archive, and search conversations with a local LLM using a SQLite database enriched with automatically extracted keywords. It aims to provide personalized context at the start of a conversation by adding memory information to the initial prompt. The tool automates queries from local LLM conversational management libraries, offers a hybrid search function, enhances prompts based on posed questions, and provides an all-in-one graphical user interface for data visualization. It supports both French and English conversations and prompts for bilingual use.
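The memory pattern described above — past exchanges archived in SQLite and retrieved by keyword to seed the next prompt — can be sketched as follows. This is a minimal illustration of the idea, not the project's actual schema or API; the table and function names here are hypothetical:

```python
import sqlite3

# Hypothetical schema: one row per exchange, with a keyword column
# for cheap LIKE-based retrieval (the real project extracts keywords
# automatically and combines this with hybrid search).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memory (id INTEGER PRIMARY KEY, prompt TEXT, answer TEXT, keywords TEXT)"
)

def remember(prompt: str, answer: str, keywords: list[str]) -> None:
    conn.execute(
        "INSERT INTO memory (prompt, answer, keywords) VALUES (?, ?, ?)",
        (prompt, answer, " ".join(keywords)),
    )

def recall(keyword: str, limit: int = 3) -> list[tuple[str, str]]:
    # Return (prompt, answer) pairs whose keywords match, ready to be
    # prepended as context at the start of a new conversation.
    rows = conn.execute(
        "SELECT prompt, answer FROM memory WHERE keywords LIKE ? LIMIT ?",
        (f"%{keyword}%", limit),
    )
    return rows.fetchall()

remember("How do I rebase?", "Use git rebase -i ...", ["git", "rebase"])
context = recall("rebase")
```

The matched pairs would then be injected into the initial prompt, giving the local LLM personalized context from earlier conversations.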
document-ai-samples
The Google Cloud Document AI Samples repository contains code samples and Community Samples demonstrating how to analyze, classify, and search documents using Google Cloud Document AI. It includes various projects showcasing different functionalities such as integrating with Google Drive, processing documents using Python, content moderation with Dialogflow CX, fraud detection, language extraction, paper summarization, tax processing pipeline, and more. The repository also provides access to test document files stored in a publicly-accessible Google Cloud Storage Bucket. Additionally, there are codelabs available for optical character recognition (OCR), form parsing, specialized processors, and managing Document AI processors. Community samples, like the PDF Annotator Sample, are also included. Contributions are welcome, and users can seek help or report issues through the repository's issues page. Please note that this repository is not an officially supported Google product and is intended for demonstrative purposes only.
For similar jobs
databerry
Chaindesk is a no-code platform that lets users set up a semantic search system over personal data without technical knowledge. It supports loading data from various sources such as raw text, web pages, and files (Word, Excel, PowerPoint, PDF, Markdown, plain text), with support for websites, Notion, and Airtable planned. The platform offers a user-friendly interface for managing datastores, querying data via a secure API endpoint, and auto-generating a ChatGPT Plugin for each datastore. Chaindesk uses a vector database (Qdrant) and OpenAI's text-embedding-ada-002 for embeddings, with a chunk size of 1024 tokens. The technology stack includes Next.js, Joy UI, LangchainJS, PostgreSQL, Prisma, and Qdrant, inspired by the ChatGPT Retrieval Plugin.
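The ingestion side of a system like this follows a standard pattern: split source text into fixed-size chunks (1024 tokens in Chaindesk's case), embed each chunk, and upsert the vectors into the store. A rough sketch of the chunking step, using whitespace tokens as a stand-in for the embedding model's tokenizer (a production pipeline would count actual model tokens, e.g. via tiktoken):

```python
def chunk_text(text: str, chunk_size: int = 1024) -> list[str]:
    """Split text into chunks of at most chunk_size whitespace tokens.

    Simplification: counts whitespace-separated words rather than
    text-embedding-ada-002 tokens, but the windowing logic is the same.
    """
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]

# Each chunk would then be embedded and upserted into the vector
# store (Qdrant, in Chaindesk's case) alongside its source metadata.
chunks = chunk_text("word " * 3000, chunk_size=1024)
```

At query time the question is embedded the same way and the nearest chunks are returned, which is what the datastore API endpoint exposes.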
OAD
OAD is an open-source tool for analyzing and visualizing data through a user-friendly interface. Users can import data from various sources, clean and preprocess it, perform statistical analysis, and create customizable interactive visualizations to communicate findings effectively. It aims to streamline the data analysis workflow for data scientists, analysts, and researchers.
sqlcoder
Defog's SQLCoder is a family of state-of-the-art large language models (LLMs) designed for converting natural-language questions into SQL queries. It outperforms popular models like gpt-4 and gpt-4-turbo on SQL generation tasks. SQLCoder has been trained on more than 20,000 human-curated questions based on 10 different schemas, and the model weights are licensed under CC BY-SA 4.0. Users can interact with SQLCoder through the 'transformers' library and run queries using the 'sqlcoder launch' command in the terminal. The tool has been tested on NVIDIA GPUs with more than 16GB VRAM and on Apple Silicon devices with some limitations. SQLCoder offers a demo on the Defog website and provides quantized versions of the model for consumer GPUs with sufficient memory.
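Text-to-SQL models like SQLCoder are typically prompted with the database schema alongside the natural-language question, and the model completes the SQL. A rough sketch of that prompt assembly (SQLCoder documents its own exact template in its repo; this generic version is only illustrative, and the section headers are an assumption):

```python
def build_text_to_sql_prompt(question: str, schema_ddl: str) -> str:
    """Assemble a generic schema-plus-question prompt for a text-to-SQL model."""
    return (
        "### Task\n"
        f"Generate a SQL query to answer the question: {question}\n\n"
        "### Database Schema\n"
        f"{schema_ddl}\n\n"
        "### SQL\n"
    )

prompt = build_text_to_sql_prompt(
    "How many users signed up last month?",
    "CREATE TABLE users (id INT, signup_date DATE);",
)
# The prompt would then be passed to the model (e.g. via the
# transformers generate() API) and the completion parsed as SQL.
```

Grounding the model in the actual DDL is what lets it produce queries against the user's tables rather than guessing column names.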
TableLLM
TableLLM is a large language model designed for efficient tabular data manipulation tasks in real office scenarios. It can generate code solutions or direct text answers for tasks like insert, delete, update, query, merge, and chart operations on tables embedded in spreadsheets or documents. The model has been fine-tuned based on CodeLlama-7B and 13B, offering two scales: TableLLM-7B and TableLLM-13B. Evaluation results show its performance on benchmarks like WikiSQL, Spider, and self-created table operation benchmark. Users can use TableLLM for code and text generation tasks on tabular data.
mlcraft
Synmetrix (prev. MLCraft) is an open source data engineering platform and semantic layer for centralized metrics management. It provides a complete framework for modeling, integrating, transforming, aggregating, and distributing metrics data at scale. Key features include data modeling and transformations, semantic layer for unified data model, scheduled reports and alerts, versioning, role-based access control, data exploration, caching, and collaboration on metrics modeling. Synmetrix leverages Cube (Cube.js) for flexible data models that consolidate metrics from various sources, enabling downstream distribution via a SQL API for integration into BI tools, reporting, dashboards, and data science. Use cases include data democratization, business intelligence, embedded analytics, and enhancing accuracy in data handling and queries. The tool speeds up data-driven workflows from metrics definition to consumption by combining data engineering best practices with self-service analytics capabilities.
data-scientist-roadmap2024
The Data Scientist Roadmap2024 provides a comprehensive guide to mastering essential tools for data science success. It includes programming languages, machine learning libraries, cloud platforms, and concepts categorized by difficulty. The roadmap covers a wide range of topics from programming languages to machine learning techniques, data visualization tools, and DevOps/MLOps tools. It also includes web development frameworks and specific concepts like supervised and unsupervised learning, NLP, deep learning, reinforcement learning, and statistics. Additionally, it delves into DevOps tools like Airflow and MLFlow, data visualization tools like Tableau and Matplotlib, and other topics such as ETL processes, optimization algorithms, and financial modeling.
VMind
VMind is an open-source solution for intelligent visualization, providing an intelligent chart component based on LLM by VisActor. It allows users to create chart narrative works with natural language interaction, edit charts through dialogue, and export narratives as videos or GIFs. The tool is easy to use, scalable, supports various chart types, and offers one-click export functionality. Users can customize chart styles, specify themes, and aggregate data using LLM models. VMind aims to enhance efficiency in creating data visualization works through dialogue-based editing and natural language interaction.
quadratic
Quadratic is a modern multiplayer spreadsheet application that integrates Python, AI, and SQL functionalities. It aims to streamline team collaboration and data analysis by enabling users to pull data from various sources and utilize popular data science tools. The application supports building dashboards, creating internal tools, mixing data from different sources, exploring data for insights, visualizing Python workflows, and facilitating collaboration between technical and non-technical team members. Quadratic is built with Rust + WASM + WebGL to ensure seamless performance in the browser, and it offers features like WebGL Grid, local file management, Python and Pandas support, Excel formula support, multiplayer capabilities, charts and graphs, and team support. The tool is currently in Beta with ongoing development for additional features like JS support, SQL database support, and AI auto-complete.