NadirClaw
Open-source LLM router that saves you money. Simple prompts go to cheap/local models, complex prompts go to premium models -- automatically.
NadirClaw sits between your AI tool and your LLM providers as an OpenAI-compatible proxy. It classifies every prompt in ~10ms and routes it to the right model. Works with any tool that speaks the OpenAI API: OpenClaw, Codex, Claude Code, Continue, Cursor, or plain curl.
How does NadirClaw compare to OpenRouter? See NadirClaw vs OpenRouter.
```
pip install nadirclaw
```

Or install from source:

```
curl -fsSL https://raw.githubusercontent.com/doramirdor/NadirClaw/main/install.sh | sh
```

Then run the interactive setup wizard:

```
nadirclaw setup
```

This guides you through selecting providers, entering API keys, and choosing models for each routing tier. Then start the router:

```
nadirclaw serve --verbose
```

That's it. NadirClaw starts on http://localhost:8856 with sensible defaults (Gemini 3 Flash for simple, OpenAI Codex for complex). If you skip `nadirclaw setup`, the serve command will offer to run it on first launch.
- Smart routing — classifies prompts in ~10ms using sentence embeddings
- Agentic task detection — auto-detects tool use, multi-step loops, and agent system prompts; forces the complex model for agentic requests
- Reasoning detection — identifies prompts needing chain-of-thought and routes them to reasoning-optimized models
- Routing profiles — `auto`, `eco`, `premium`, `free`, `reasoning` — choose your cost/quality strategy per request
- Model aliases — use short names like `sonnet`, `flash`, `gpt4` instead of full model IDs
- Session persistence — pins the model for multi-turn conversations so you don't bounce between models mid-thread
- Context-window filtering — auto-swaps to a model with a larger context window when your conversation is too long
- Rate limit fallback — if the primary model is rate-limited (429), automatically falls back to the other tier's model instead of failing
- Streaming support — full SSE streaming compatible with OpenClaw, Codex, and other streaming clients
- Native Gemini support — calls Gemini models directly via the Google GenAI SDK (not through LiteLLM)
- OAuth login — use your subscription with `nadirclaw auth <provider> login` (OpenAI, Anthropic, Google); no API key needed
- Multi-provider — supports Gemini, OpenAI, Anthropic, Ollama, and any LiteLLM-supported provider
- OpenAI-compatible API — drop-in replacement for any tool that speaks the OpenAI chat completions API
- Request reporting — `nadirclaw report` analyzes your JSONL logs with filters, latency stats, tier breakdown, and token usage
- Raw logging — optional `--log-raw` flag to capture full request/response content for debugging and replay
- OpenTelemetry tracing — optional distributed tracing with GenAI semantic conventions (`pip install nadirclaw[telemetry]`)
- Python 3.10+
- git
- At least one LLM provider:
  - Google Gemini API key (free tier: 20 req/day)
  - Ollama running locally (free, no API key needed)
  - Anthropic API key for Claude models
  - OpenAI API key for GPT models
  - Provider subscriptions via OAuth (`nadirclaw auth openai login`, `nadirclaw auth anthropic login`, `nadirclaw auth antigravity login`, `nadirclaw auth gemini login`)
  - Or any provider supported by LiteLLM
```
curl -fsSL https://raw.githubusercontent.com/doramirdor/NadirClaw/main/install.sh | sh
```

This clones the repo to ~/.nadirclaw, creates a virtual environment, installs dependencies, and adds nadirclaw to your PATH. Run it again to update.

```
git clone https://github.com/doramirdor/NadirClaw.git
cd NadirClaw
python3 -m venv venv
source venv/bin/activate
pip install -e .
```

To uninstall:

```
rm -rf ~/.nadirclaw
sudo rm -f /usr/local/bin/nadirclaw
```

NadirClaw loads configuration from ~/.nadirclaw/.env. Create or edit this file to set API keys and model preferences:
```
# ~/.nadirclaw/.env

# API keys (set the ones you use)
GEMINI_API_KEY=AIza...
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Model routing
NADIRCLAW_SIMPLE_MODEL=gemini-3-flash-preview
NADIRCLAW_COMPLEX_MODEL=gemini-2.5-pro

# Server
NADIRCLAW_PORT=8856
```

If ~/.nadirclaw/.env does not exist, NadirClaw falls back to .env in the current directory.
NadirClaw supports multiple ways to provide LLM credentials, checked in this order:
1. OpenClaw stored token (`~/.openclaw/agents/main/agent/auth-profiles.json`)
2. NadirClaw stored credential (`~/.nadirclaw/credentials.json`)
3. Environment variable (`GEMINI_API_KEY`, `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, etc.)
```
# Add a Gemini API key
nadirclaw auth add --provider google --key AIza...

# Add any provider API key
nadirclaw auth add --provider anthropic --key sk-ant-...
nadirclaw auth add --provider openai --key sk-...

# Login with your OpenAI/ChatGPT subscription (OAuth, no API key needed)
nadirclaw auth openai login

# Login with your Anthropic/Claude subscription (OAuth, no API key needed)
nadirclaw auth anthropic login

# Login with Google Gemini (OAuth, opens browser)
nadirclaw auth gemini login

# Login with Google Antigravity (OAuth, opens browser)
nadirclaw auth antigravity login

# Store a Claude subscription token (from 'claude setup-token') - alternative to OAuth
nadirclaw auth setup-token

# Check what's configured
nadirclaw auth status

# Remove a credential
nadirclaw auth remove google
```

Set API keys in ~/.nadirclaw/.env:

```
GEMINI_API_KEY=AIza...        # or GOOGLE_API_KEY
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
```

Configure which model handles each tier:
```
NADIRCLAW_SIMPLE_MODEL=gemini-3-flash-preview  # cheap/free model
NADIRCLAW_COMPLEX_MODEL=gemini-2.5-pro         # premium model
NADIRCLAW_REASONING_MODEL=o3                   # reasoning tasks (optional, defaults to complex)
NADIRCLAW_FREE_MODEL=ollama/llama3.1:8b        # free fallback (optional, defaults to simple)
```

| Setup | Simple Model | Complex Model | API Keys Needed |
|---|---|---|---|
| Gemini + Gemini | `gemini-2.5-flash` | `gemini-2.5-pro` | `GEMINI_API_KEY` |
| Gemini + Claude | `gemini-2.5-flash` | `claude-sonnet-4-5-20250929` | `GEMINI_API_KEY` + `ANTHROPIC_API_KEY` |
| Claude + Ollama | `ollama/llama3.1:8b` | `claude-sonnet-4-5-20250929` | `ANTHROPIC_API_KEY` |
| Claude + Claude | `claude-haiku-4-5-20251001` | `claude-sonnet-4-5-20250929` | `ANTHROPIC_API_KEY` |
| OpenAI + Ollama | `ollama/llama3.1:8b` | `gpt-4.1` | `OPENAI_API_KEY` |
| OpenAI + OpenAI | `gpt-4.1-mini` | `gpt-4.1` | `OPENAI_API_KEY` |
| OpenAI Codex | `gemini-2.5-flash` | `openai-codex/gpt-5.3-codex` | `GEMINI_API_KEY` + OAuth login |
| Fully local | `ollama/llama3.1:8b` | `ollama/qwen3:32b` | None |
Gemini models are called natively via the Google GenAI SDK. All other models go through LiteLLM, which supports 100+ providers.
Gemini is the default simple model. NadirClaw calls Gemini natively via the Google GenAI SDK for best performance.
```
# Set your Gemini API key
nadirclaw auth add --provider google --key AIza...

# Or set in ~/.nadirclaw/.env
echo "GEMINI_API_KEY=AIza..." >> ~/.nadirclaw/.env

# Start the router
nadirclaw serve --verbose
```

If the primary model hits a 429 rate limit, NadirClaw automatically retries once, then falls back to the other tier's model. For example, if gemini-3-flash-preview is exhausted, NadirClaw will try gemini-2.5-pro (or whatever your complex model is). If both models are rate-limited, it returns a friendly error message instead of crashing.
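The retry-then-fallback behavior can be sketched in a few lines. This is a minimal illustration, not NadirClaw's actual implementation; `call_model` and `RateLimitError` are hypothetical stand-ins for a provider call and a 429 response:

```python
class RateLimitError(Exception):
    """Stand-in for a provider 429 response."""

def dispatch_with_fallback(prompt, primary, fallback, call_model):
    """Try the primary model, retry once on 429, then fall back to the other tier."""
    for model in (primary, primary, fallback):  # primary listed twice = one retry
        try:
            return model, call_model(model, prompt)
        except RateLimitError:
            continue
    # Both tiers exhausted: return an error message instead of crashing
    return None, "All models are rate-limited right now; please retry shortly."

# Simulate a simple-tier model that is rate-limited
def fake_call(model, prompt):
    if model == "gemini-3-flash-preview":
        raise RateLimitError()
    return f"{model} says: 4"

used, answer = dispatch_with_fallback(
    "What is 2+2?", "gemini-3-flash-preview", "gemini-2.5-pro", fake_call
)
print(used)  # gemini-2.5-pro
```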
If you're running Ollama locally, NadirClaw works out of the box with no API keys:
```
# Fully local setup -- no API keys, no cost
NADIRCLAW_SIMPLE_MODEL=ollama/llama3.1:8b \
NADIRCLAW_COMPLEX_MODEL=ollama/qwen3:32b \
nadirclaw serve --verbose
```

Or mix local + cloud:

```
nadirclaw serve \
  --simple-model ollama/llama3.1:8b \
  --complex-model claude-sonnet-4-20250514 \
  --verbose
```

| Model | Size | Good For |
|---|---|---|
| `llama3.1:8b` | 4.7 GB | Simple tier (fast, good enough) |
| `qwen3:32b` | 19 GB | Complex tier (local, no API cost) |
| `qwen3-coder` | 19 GB | Code-heavy complex tier |
| `deepseek-r1:14b` | 9 GB | Reasoning-heavy complex tier |
OpenClaw is a personal AI assistant that bridges messaging services to AI coding agents. NadirClaw integrates as a model provider so OpenClaw's requests are automatically routed to the right model.
```
# Auto-configure OpenClaw to use NadirClaw
nadirclaw openclaw onboard

# Start the router
nadirclaw serve
```

This writes NadirClaw as a provider in ~/.openclaw/openclaw.json with model nadirclaw/auto. If OpenClaw is already running, it will auto-reload the config -- no restart needed.

```
nadirclaw openclaw onboard

# Then start NadirClaw separately when ready:
nadirclaw serve
```

`nadirclaw openclaw onboard` adds this to your OpenClaw config:
```
{
  "models": {
    "providers": {
      "nadirclaw": {
        "baseUrl": "http://localhost:8856/v1",
        "apiKey": "local",
        "api": "openai-completions",
        "models": [{ "id": "auto", "name": "auto" }]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": { "primary": "nadirclaw/auto" }
    }
  }
}
```

NadirClaw supports the SSE streaming format that OpenClaw expects (stream: true), handling multi-modal content and tool definitions in system prompts.
Codex is OpenAI's CLI coding agent. NadirClaw integrates as a custom model provider.
```
# Auto-configure Codex
nadirclaw codex onboard

# Start the router
nadirclaw serve
```

This writes ~/.codex/config.toml:

```
model_provider = "nadirclaw"

[model_providers.nadirclaw]
base_url = "http://localhost:8856/v1"
api_key = "local"
```

To use your ChatGPT subscription instead of an API key:
```
# Login with your OpenAI account (opens browser)
nadirclaw auth openai login

# NadirClaw will auto-refresh the token when it expires
```

This delegates to the Codex CLI for the OAuth flow and stores the credentials in ~/.nadirclaw/credentials.json. Tokens are automatically refreshed when they expire.
NadirClaw exposes a standard OpenAI-compatible API. Point any tool at it:
```
# Base URL
http://localhost:8856/v1

# Model
model: "auto"   # or omit -- NadirClaw picks the best model
```

```
curl http://localhost:8856/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'
```

```
curl http://localhost:8856/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "stream": true
  }'
```

```
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8856/v1",
    api_key="local",  # NadirClaw doesn't require auth by default
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
print(response.choices[0].message.content)
```

Choose your routing strategy by setting the model field:
| Profile | Model Field | Strategy | Use Case |
|---|---|---|---|
| auto | `auto` or omit | Smart routing (default) | Best overall balance |
| eco | `eco` | Always use simple model | Maximum savings |
| premium | `premium` | Always use complex model | Best quality |
| free | `free` | Use free fallback model | Zero cost |
| reasoning | `reasoning` | Use reasoning model | Chain-of-thought tasks |
```
# Use profiles via the model field
curl http://localhost:8856/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "eco", "messages": [{"role": "user", "content": "Hello"}]}'

# Also works with the nadirclaw/ prefix
# model: "nadirclaw/eco", "nadirclaw/premium", etc.
```

Use short names instead of full model IDs:
| Alias | Resolves To |
|---|---|
| `sonnet` | `claude-sonnet-4-5-20250929` |
| `opus` | `claude-opus-4-6-20250918` |
| `haiku` | `claude-haiku-4-5-20251001` |
| `gpt4` | `gpt-4.1` |
| `gpt5` | `gpt-5.2` |
| `flash` | `gemini-2.5-flash` |
| `gemini-pro` | `gemini-2.5-pro` |
| `deepseek` | `deepseek/deepseek-chat` |
| `deepseek-r1` | `deepseek/deepseek-reasoner` |
| `llama` | `ollama/llama3.1:8b` |
```
# Use an alias as the model
curl http://localhost:8856/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "sonnet", "messages": [{"role": "user", "content": "Hello"}]}'
```

Beyond basic simple/complex classification, NadirClaw applies routing modifiers that can override the base decision:
NadirClaw detects agentic requests (coding agents, multi-step tool use) and forces them to the complex model, even if the individual message looks simple. Signals:
- Tool definitions in the request (`tools` array)
- Tool-role messages (active tool execution loop)
- Assistant→tool→assistant cycles (multi-step execution)
- Agent-like system prompts ("you are a coding agent", "you can execute commands")
- Long system prompts (>500 chars, typical of agent instructions)
- Deep conversations (>10 messages)
This prevents a message like "now add tests" from being routed to the cheap model when it's part of an ongoing agentic refactoring session.
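These signals can be approximated in a short function. This is a simplified sketch of the heuristics listed above, not the actual routing.py logic; the thresholds mirror the list:

```python
def looks_agentic(request: dict) -> bool:
    """Heuristic agentic-request detection mirroring the signals above."""
    messages = request.get("messages", [])
    if request.get("tools"):                            # tool definitions present
        return True
    if any(m.get("role") == "tool" for m in messages):  # active tool-execution loop
        return True
    system = " ".join(
        m.get("content", "") for m in messages if m.get("role") == "system"
    )
    agent_phrases = ("you are a coding agent", "you can execute commands")
    if any(p in system.lower() for p in agent_phrases): # agent-like system prompt
        return True
    if len(system) > 500:                               # long agent-style instructions
        return True
    return len(messages) > 10                           # deep conversation

req = {
    "messages": [{"role": "user", "content": "now add tests"}],
    "tools": [{"type": "function", "name": "run_shell"}],
}
print(looks_agentic(req))  # True: "now add tests" still goes to the complex model
```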
Prompts with 2+ reasoning markers are routed to the reasoning model (or complex model if no reasoning model is configured):
- "step by step", "think through", "chain of thought"
- "prove that", "derive the", "mathematically show"
- "analyze the tradeoffs", "compare and contrast"
- "critically analyze", "evaluate whether"
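The marker check is a simple count over the prompt text. A minimal sketch, assuming case-insensitive substring matching (the real marker list and matching rules may differ):

```python
REASONING_MARKERS = (
    "step by step", "think through", "chain of thought",
    "prove that", "derive the", "mathematically show",
    "analyze the tradeoffs", "compare and contrast",
    "critically analyze", "evaluate whether",
)

def needs_reasoning(prompt: str, threshold: int = 2) -> bool:
    """Route to the reasoning model when 2+ markers appear in the prompt."""
    hits = sum(marker in prompt.lower() for marker in REASONING_MARKERS)
    return hits >= threshold

print(needs_reasoning("Prove that this bound holds, step by step"))  # True (2 markers)
```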
Once a conversation is routed to a model, subsequent messages in the same session reuse that model. This prevents jarring mid-conversation model switches. Sessions are keyed by system prompt + first user message, with a 30-minute TTL.
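This keying scheme can be sketched with an in-memory store (an illustration only; the real session logic lives in routing.py and may differ in detail):

```python
import hashlib
import time

SESSION_TTL = 30 * 60  # 30-minute TTL per conversation
_sessions = {}         # key -> (model, pinned_at)

def session_key(system_prompt, first_user_msg):
    """Sessions are keyed by system prompt + first user message."""
    raw = system_prompt + "\x00" + first_user_msg
    return hashlib.sha256(raw.encode()).hexdigest()

def pin_model(key, model):
    _sessions[key] = (model, time.time())

def pinned_model(key):
    """Return the pinned model if the session is still within its TTL."""
    entry = _sessions.get(key)
    if entry and time.time() - entry[1] < SESSION_TTL:
        return entry[0]
    return None

k = session_key("You are helpful.", "Refactor this module")
pin_model(k, "gemini-2.5-pro")
print(pinned_model(k))  # gemini-2.5-pro
```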
If the estimated token count of a request exceeds a model's context window, NadirClaw automatically swaps to a model with a larger context. For example, a 150k-token conversation targeting gpt-4o (128k context) will be redirected to gemini-2.5-pro (1M context).
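The swap can be illustrated with a rough chars/4 token estimate over a hypothetical context-size table (both the estimate and the table are illustrative assumptions, not NadirClaw's actual values):

```python
# Hypothetical context sizes, for illustration only
CONTEXT_WINDOWS = {"gpt-4o": 128_000, "gemini-2.5-pro": 1_000_000}

def estimate_tokens(messages):
    """Rough estimate: ~4 characters per token."""
    return sum(len(m["content"]) for m in messages) // 4

def fit_model(messages, chosen):
    """Swap to a larger-context model when the conversation won't fit."""
    needed = estimate_tokens(messages)
    if needed <= CONTEXT_WINDOWS[chosen]:
        return chosen
    # Pick the smallest model whose window still fits the request
    candidates = [m for m, w in CONTEXT_WINDOWS.items() if w >= needed]
    return min(candidates, key=CONTEXT_WINDOWS.get) if candidates else chosen

long_chat = [{"role": "user", "content": "x" * 600_000}]  # ~150k tokens
print(fit_model(long_chat, "gpt-4o"))  # gemini-2.5-pro
```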
```
nadirclaw setup                    # Interactive setup wizard (providers, keys, models)
nadirclaw serve                    # Start the router server
nadirclaw serve --log-raw          # Start with full request/response logging
nadirclaw classify                 # Classify a prompt (no server needed)
nadirclaw report                   # Show a summary report of request logs
nadirclaw report --since 24h       # Report for the last 24 hours
nadirclaw status                   # Show config, credentials, and server status
nadirclaw auth add                 # Add an API key for any provider
nadirclaw auth status              # Show configured credentials (masked)
nadirclaw auth remove              # Remove a stored credential
nadirclaw auth setup-token         # Store a Claude subscription token (alternative to OAuth)
nadirclaw auth openai login        # Login with OpenAI subscription (OAuth)
nadirclaw auth openai logout       # Remove stored OpenAI OAuth credential
nadirclaw auth anthropic login     # Login with Anthropic/Claude subscription (OAuth)
nadirclaw auth anthropic logout    # Remove stored Anthropic OAuth credential
nadirclaw auth antigravity login   # Login with Google Antigravity (OAuth, opens browser)
nadirclaw auth antigravity logout  # Remove stored Antigravity OAuth credential
nadirclaw auth gemini login        # Login with Google Gemini (OAuth, opens browser)
nadirclaw auth gemini logout       # Remove stored Gemini OAuth credential
nadirclaw codex onboard            # Configure Codex integration
nadirclaw openclaw onboard         # Configure OpenClaw integration
nadirclaw build-centroids          # Regenerate centroid vectors from prototypes
```

```
nadirclaw serve [OPTIONS]

Options:
  --port INTEGER        Port to listen on (default: 8856)
  --simple-model TEXT   Model for simple prompts
  --complex-model TEXT  Model for complex prompts
  --models TEXT         Comma-separated model list (legacy)
  --token TEXT          Auth token
  --verbose             Enable debug logging
  --log-raw             Log full raw requests and responses to JSONL
```

Analyze request logs and print a summary report:
```
nadirclaw report                     # full report
nadirclaw report --since 24h         # last 24 hours
nadirclaw report --since 7d          # last 7 days
nadirclaw report --since 2025-02-01  # since a specific date
nadirclaw report --model gemini      # filter by model name
nadirclaw report --format json       # machine-readable JSON output
nadirclaw report --export report.txt # save to file
```

Example output:
```
NadirClaw Report
==================================================
Total requests: 147
From: 2026-02-14T08:12:03+00:00
To:   2026-02-14T22:47:19+00:00

Requests by Type
------------------------------
classify       12
completion    135

Tier Distribution
------------------------------
complex        41 (31.1%)
direct          8 (6.1%)
simple         83 (62.9%)

Model Usage
------------------------------------------------------------
Model                         Reqs    Tokens
gemini-3-flash-preview          83     48210
openai-codex/gpt-5.3-codex      41    127840
claude-sonnet-4-20250514         8     31500

Latency (ms)
----------------------------------------
classifier  avg=12   p50=11   p95=24
total       avg=847  p50=620  p95=2340

Token Usage
------------------------------
Prompt:     138420
Completion:  69130
Total:      207550

Fallbacks: 3
Errors: 2
Streaming requests: 47
Requests with tools: 18 (54 tools total)
```
Classify a prompt locally without running the server. Useful for testing your setup:

```
$ nadirclaw classify "What is 2+2?"
Tier: simple
Confidence: 0.2848
Score: 0.0000
Model: gemini-3-flash-preview

$ nadirclaw classify "Design a distributed system for real-time trading"
Tier: complex
Confidence: 0.1843
Score: 1.0000
Model: gemini-2.5-pro
```

```
$ nadirclaw status
NadirClaw Status
----------------------------------------
Simple model:  gemini-3-flash-preview
Complex model: gemini-2.5-pro
Tier config:   explicit (env vars)
Port:          8856
Threshold:     0.06
Log dir:       /Users/you/.nadirclaw/logs
Token:         nadir-***
Server:        RUNNING (ok)
```

Most LLM usage doesn't need a premium model. NadirClaw routes each prompt to the right tier automatically:
NadirClaw uses a binary complexity classifier based on sentence embeddings:
1. Pre-computed centroids: Ships two tiny centroid vectors (~1.5 KB each) derived from ~170 seed prompts. These are pre-computed and included in the package — no training step required.
2. Classification: For each incoming prompt, computes its embedding using all-MiniLM-L6-v2 (~80 MB, downloaded once on first use) and measures cosine similarity to both centroids. If the prompt is closer to the complex centroid, it routes to your complex model; otherwise to your simple model.
3. Borderline handling: When confidence is below the threshold (default 0.06), the classifier defaults to complex -- it's cheaper to over-serve a simple prompt than to under-serve a complex one.
4. Routing modifiers: After classification, NadirClaw applies intelligent overrides:
   - Agentic detection — if tool definitions, tool-role messages, or agent system prompts are detected, forces the complex model
   - Reasoning detection — if 2+ reasoning markers are found, routes to the reasoning model
   - Context window check — if the conversation exceeds the model's context window, swaps to a model that fits
   - Session persistence — reuses the same model for follow-up messages in the same conversation
5. Dispatch: Calls the selected model via the appropriate backend:
   - Gemini models — called natively via the Google GenAI SDK for best performance
   - All other models — called via LiteLLM, which provides a unified interface to 100+ providers
6. Rate limit fallback: If the selected model returns a 429 rate limit error, NadirClaw retries once, then automatically falls back to the other tier's model. If both are rate-limited, it returns a user-friendly error message.
Classification takes ~10ms on a warm encoder. The first request takes ~2-3 seconds to load the embedding model.
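The classify-then-threshold step can be illustrated with plain cosine similarity. The toy 3-d vectors below stand in for the real 384-dimensional all-MiniLM-L6-v2 embeddings and shipped centroid files, and the confidence definition here is an assumption:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def classify(embedding, simple_centroid, complex_centroid, threshold=0.06):
    """Compare a prompt embedding to both centroids; default to complex when borderline."""
    s = cosine(embedding, simple_centroid)
    c = cosine(embedding, complex_centroid)
    if abs(s - c) < threshold:
        return "complex"  # borderline: over-serving is cheaper than under-serving
    return "simple" if s > c else "complex"

simple_c, complex_c = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
print(classify([0.9, 0.1, 0.0], simple_c, complex_c))  # simple
print(classify([0.5, 0.5, 0.0], simple_c, complex_c))  # complex (borderline)
```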
Auth is disabled by default (local-only). Set NADIRCLAW_AUTH_TOKEN to require a bearer token.
| Endpoint | Method | Description |
|---|---|---|
| `/v1/chat/completions` | POST | OpenAI-compatible completions with auto routing (supports `stream: true`) |
| `/v1/classify` | POST | Classify a prompt without calling an LLM |
| `/v1/classify/batch` | POST | Classify multiple prompts at once |
| `/v1/models` | GET | List available models |
| `/v1/logs` | GET | View recent request logs |
| `/health` | GET | Health check (no auth required) |
| Variable | Default | Description |
|---|---|---|
| `NADIRCLAW_SIMPLE_MODEL` | `gemini-3-flash-preview` | Model for simple prompts |
| `NADIRCLAW_COMPLEX_MODEL` | `openai-codex/gpt-5.3-codex` | Model for complex prompts |
| `NADIRCLAW_REASONING_MODEL` | (falls back to complex) | Model for reasoning tasks |
| `NADIRCLAW_FREE_MODEL` | (falls back to simple) | Free fallback model |
| `NADIRCLAW_AUTH_TOKEN` | (empty — auth disabled) | Set to require a bearer token |
| `GEMINI_API_KEY` | -- | Google Gemini API key (also accepts `GOOGLE_API_KEY`) |
| `ANTHROPIC_API_KEY` | -- | Anthropic API key |
| `OPENAI_API_KEY` | -- | OpenAI API key |
| `OLLAMA_API_BASE` | `http://localhost:11434` | Ollama base URL |
| `NADIRCLAW_CONFIDENCE_THRESHOLD` | `0.06` | Classification threshold (lower = more complex) |
| `NADIRCLAW_PORT` | `8856` | Server port |
| `NADIRCLAW_LOG_DIR` | `~/.nadirclaw/logs` | Log directory |
| `NADIRCLAW_LOG_RAW` | `false` | Log full raw requests and responses (true/false) |
| `NADIRCLAW_MODELS` | `openai-codex/gpt-5.3-codex,gemini-3-flash-preview` | Legacy model list (fallback if tier vars not set) |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | (empty — disabled) | OpenTelemetry collector endpoint (enables tracing) |
NadirClaw supports optional distributed tracing via OpenTelemetry. Install the extras and set an OTLP endpoint:
```
pip install nadirclaw[telemetry]

# Export to a local collector (e.g. Jaeger, Grafana Tempo)
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 nadirclaw serve
```

When enabled, NadirClaw emits spans for:
- `smart_route_analysis` — classifier decision with tier and selected model
- `dispatch_model` — individual LLM provider call
- `chat_completion` — full request lifecycle
Spans include GenAI semantic conventions (gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens) plus custom nadirclaw.* attributes for routing metadata.
If the telemetry packages are not installed or OTEL_EXPORTER_OTLP_ENDPOINT is not set, all tracing is a no-op with zero overhead.
```
nadirclaw/
  __init__.py           # Package version
  cli.py                # CLI commands (setup, serve, classify, report, status, auth, codex, openclaw)
  setup.py              # Interactive setup wizard (provider selection, credentials, model config)
  server.py             # FastAPI server with OpenAI-compatible API + streaming
  classifier.py         # Binary complexity classifier (sentence embeddings)
  credentials.py        # Credential storage, resolution chain, and OAuth token refresh
  encoder.py            # Shared SentenceTransformer singleton
  oauth.py              # OAuth login flows (OpenAI, Anthropic, Gemini, Antigravity)
  routing.py            # Routing intelligence (agentic, reasoning, profiles, aliases, sessions)
  report.py             # Log parsing and report generation
  telemetry.py          # Optional OpenTelemetry integration (no-op without packages)
  auth.py               # Bearer token / API key authentication
  settings.py           # Environment-based configuration (reads ~/.nadirclaw/.env)
  prototypes.py         # Seed prompts for centroid generation
  simple_centroid.npy   # Pre-computed simple centroid vector
  complex_centroid.npy  # Pre-computed complex centroid vector
```
MIT
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for NadirClaw
Similar Open Source Tools
NadirClaw
NadirClaw is a powerful open-source tool designed for web scraping and data extraction. It provides a user-friendly interface for extracting data from websites with ease. With NadirClaw, users can easily scrape text, images, and other content from web pages for various purposes such as data analysis, research, and automation. The tool offers flexibility and customization options to cater to different scraping needs, making it a versatile solution for extracting data from the web. Whether you are a data scientist, researcher, or developer, NadirClaw can streamline your data extraction process and help you gather valuable insights from online sources.
Aimer_WT
Aimer_WT is a web scraping tool designed to extract data from websites efficiently and accurately. It provides a user-friendly interface for users to specify the data they want to scrape and offers various customization options. With Aimer_WT, users can easily automate the process of collecting data from multiple web pages, saving time and effort. The tool is suitable for both beginners and experienced users who need to gather data for research, analysis, or other purposes. Aimer_WT supports various data formats and allows users to export the extracted data for further processing.
waidrin
Waidrin is a powerful web scraping tool that allows users to easily extract data from websites. It provides a user-friendly interface for creating custom web scraping scripts and supports various data formats for exporting the extracted data. With Waidrin, users can automate the process of collecting information from multiple websites, saving time and effort. The tool is designed to be flexible and scalable, making it suitable for both beginners and advanced users in the field of web scraping.
onlook
Onlook is a web scraping tool that allows users to extract data from websites easily and efficiently. It provides a user-friendly interface for creating web scraping scripts and supports various data formats for exporting the extracted data. With Onlook, users can automate the process of collecting information from multiple websites, saving time and effort. The tool is designed to be flexible and customizable, making it suitable for a wide range of web scraping tasks.
HyperAgent
HyperAgent is a powerful tool for automating repetitive tasks in web scraping and data extraction. It provides a user-friendly interface to create custom web scraping scripts without the need for extensive coding knowledge. With HyperAgent, users can easily extract data from websites, transform it into structured formats, and save it for further analysis. The tool supports various data formats and offers scheduling options for automated data extraction at regular intervals. HyperAgent is suitable for individuals and businesses looking to streamline their data collection processes and improve efficiency in extracting information from the web.
Website-Crawler
Website-Crawler is a tool designed to extract data from websites in an automated manner. It allows users to scrape information such as text, images, links, and more from web pages. The tool provides functionalities to navigate through websites, handle different types of content, and store extracted data for further analysis. Website-Crawler is useful for tasks like web scraping, data collection, content aggregation, and competitive analysis. It can be customized to extract specific data elements based on user requirements, making it a versatile tool for various web data extraction needs.
firecrawl
Firecrawl is an API service that empowers AI applications with clean data from any website. It features advanced scraping, crawling, and data extraction capabilities. The repository is still in development, integrating custom modules into the mono repo. Users can run it locally but it's not fully ready for self-hosted deployment yet. Firecrawl offers powerful capabilities like scraping, crawling, mapping, searching, and extracting structured data from single pages, multiple pages, or entire websites with AI. It supports various formats, actions, and batch scraping. The tool is designed to handle proxies, anti-bot mechanisms, dynamic content, media parsing, change tracking, and more. Firecrawl is available as an open-source project under the AGPL-3.0 license, with additional features offered in the cloud version.
context7
Context7 is a powerful tool for analyzing and visualizing data in various formats. It provides a user-friendly interface for exploring datasets, generating insights, and creating interactive visualizations. With advanced features such as data filtering, aggregation, and customization, Context7 is suitable for both beginners and experienced data analysts. The tool supports a wide range of data sources and formats, making it versatile for different use cases. Whether you are working on exploratory data analysis, data visualization, or data storytelling, Context7 can help you uncover valuable insights and communicate your findings effectively.
ROGRAG
ROGRAG is a powerful open-source tool designed for data analysis and visualization. It provides a user-friendly interface for exploring and manipulating datasets, making it ideal for researchers, data scientists, and analysts. With ROGRAG, users can easily import, clean, analyze, and visualize data to gain valuable insights and make informed decisions. The tool supports a wide range of data formats and offers a variety of statistical and visualization tools to help users uncover patterns, trends, and relationships in their data. Whether you are working on exploratory data analysis, statistical modeling, or data visualization, ROGRAG is a versatile tool that can streamline your workflow and enhance your data analysis capabilities.
arconia
Arconia is a powerful open-source tool for managing and visualizing data in a user-friendly way. It provides a seamless experience for data analysts and scientists to explore, clean, and analyze datasets efficiently. With its intuitive interface and robust features, Arconia simplifies the process of data manipulation and visualization, making it an essential tool for anyone working with data.
CrossIntelligence
CrossIntelligence is a powerful tool for data analysis and visualization. It allows users to easily connect and analyze data from multiple sources, providing valuable insights and trends. With a user-friendly interface and customizable features, CrossIntelligence is suitable for both beginners and advanced users in various industries such as marketing, finance, and research.
atlas
Atlas is a powerful data visualization tool that allows users to create interactive charts and graphs from their datasets. It provides a user-friendly interface for exploring and analyzing data, making it ideal for both beginners and experienced data analysts. With Atlas, users can easily customize the appearance of their visualizations, add filters and drill-down capabilities, and share their insights with others. The tool supports a wide range of data formats and offers various chart types to suit different data visualization needs. Whether you are looking to create simple bar charts or complex interactive dashboards, Atlas has you covered.
vizra-adk
Vizra-ADK is a data visualization tool that allows users to create interactive and customizable visualizations for their data. With a user-friendly interface and a wide range of customization options, Vizra-ADK makes it easy for users to explore and analyze their data in a visually appealing way. Whether you're a data scientist looking to create informative charts and graphs, or a business analyst wanting to present your findings in a compelling way, Vizra-ADK has you covered. The tool supports various data formats and provides features like filtering, sorting, and grouping to help users make sense of their data quickly and efficiently.
crawl4ai
Crawl4AI is a powerful and free web crawling service that extracts valuable data from websites and provides LLM-friendly output formats. It supports crawling multiple URLs simultaneously, replaces media tags with ALT, and is completely free to use and open-source. Users can integrate Crawl4AI into Python projects as a library or run it as a standalone local server. The tool allows users to crawl and extract data from specified URLs using different providers and models, with options to include raw HTML content, force fresh crawls, and extract meaningful text blocks. Configuration settings can be adjusted in the `crawler/config.py` file to customize providers, API keys, chunk processing, and word thresholds. Contributions to Crawl4AI are welcome from the open-source community to enhance its value for AI enthusiasts and developers.
ag2
Ag2 is a lightweight and efficient tool for generating automated reports from data sources. It simplifies the process of creating reports by allowing users to define templates and automate the data extraction and formatting. With Ag2, users can easily generate reports in various formats such as PDF, Excel, and CSV, saving time and effort in manual report generation tasks.
llama_index
LlamaIndex is a data framework for building LLM applications. It provides tools for ingesting, structuring, and querying data, as well as integrating with LLMs and other tools. LlamaIndex is designed to be easy to use for both beginner and advanced users, and it provides a comprehensive set of features for building LLM applications.
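The ingest/structure/query loop described above can be illustrated with a toy in-memory keyword index (pure Python; this shows the general pattern such data frameworks automate, not LlamaIndex's actual API, whose real entry points require an LLM or embedding backend):

```python
# Toy illustration of the ingest -> structure -> query pattern that data
# frameworks like LlamaIndex are built around. Pure Python; this is NOT
# the LlamaIndex API, just the shape of the workflow.

class KeywordIndex:
    def __init__(self):
        self.docs = []        # ingested documents
        self.inverted = {}    # word -> set of doc ids

    def ingest(self, text):
        doc_id = len(self.docs)
        self.docs.append(text)
        for word in text.lower().split():
            self.inverted.setdefault(word, set()).add(doc_id)
        return doc_id

    def query(self, question):
        # Score documents by how many query words they contain.
        scores = {}
        for word in question.lower().split():
            for doc_id in self.inverted.get(word, ()):
                scores[doc_id] = scores.get(doc_id, 0) + 1
        best = max(scores, key=scores.get) if scores else None
        return self.docs[best] if best is not None else None

index = KeywordIndex()
index.ingest("LlamaIndex structures data for LLM applications")
index.ingest("PyTorch Forecasting trains timeseries models")
print(index.query("structure data for an LLM"))
```

Real frameworks replace the keyword match with embeddings and the stored strings with chunked, metadata-tagged nodes, but the ingest-then-query shape is the same.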
For similar tasks
Awesome-Segment-Anything
Awesome-Segment-Anything is a curated collection of papers, projects, and resources around the Segment Anything Model (SAM) and its extensions. It tracks follow-up work and downstream applications of promptable segmentation across domains, making it a useful reference hub for researchers and practitioners building on SAM.
Time-LLM
Time-LLM is a reprogramming framework that repurposes large language models (LLMs) for time series forecasting. It allows users to treat time series analysis as a 'language task' and effectively leverage pre-trained LLMs for forecasting. The framework involves reprogramming time series data into text representations and providing declarative prompts to guide the LLM reasoning process. Time-LLM supports various backbone models such as Llama-7B, GPT-2, and BERT, offering flexibility in model selection. The tool provides a general framework for repurposing language models for time series forecasting tasks.
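The core "time series as a language task" idea can be sketched by serializing a numeric window into a prompt string (a deliberately simplified illustration; Time-LLM's actual method reprograms patch embeddings rather than emitting plain text):

```python
# Simplified sketch of treating forecasting as a language task: render a
# numeric window as a declarative prompt for an LLM. Time-LLM's real
# mechanism reprograms patch embeddings; this only shows the prompting idea.

def series_to_prompt(values, horizon=3):
    """Render a time series window as a forecasting prompt."""
    rendered = ", ".join(f"{v:.2f}" for v in values)
    return (
        f"The last {len(values)} observations were: {rendered}. "
        f"Predict the next {horizon} values."
    )

prompt = series_to_prompt([1.0, 1.5, 2.25, 3.1], horizon=2)
print(prompt)
```

A declarative prompt like this, combined with a frozen backbone such as Llama-7B or GPT-2, is what lets the framework reuse pre-trained language models without retraining them on numeric data.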
crewAI
CrewAI is a cutting-edge framework designed to orchestrate role-playing autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks. It enables AI agents to assume roles, share goals, and operate in a cohesive unit, much like a well-oiled crew. Whether you're building a smart assistant platform, an automated customer service ensemble, or a multi-agent research team, CrewAI provides the backbone for sophisticated multi-agent interactions. With features like role-based agent design, autonomous inter-agent delegation, flexible task management, and support for various LLMs, CrewAI offers a dynamic and adaptable solution for both development and production workflows.
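The role-based, sequential orchestration described above can be sketched with plain-Python stand-ins (real CrewAI agents wrap an LLM and support autonomous delegation; the classes below only show the shape of the pattern and are not CrewAI's API):

```python
# Toy sketch of role-based agent orchestration in the style CrewAI
# describes: agents with roles, run as a sequential crew. Pure-Python
# stand-ins; real CrewAI agents call an LLM, these just run a callable.

class Agent:
    def __init__(self, role, work):
        self.role = role
        self.work = work          # callable: input -> output

class Crew:
    def __init__(self, agents):
        self.agents = agents

    def kickoff(self, task):
        # Sequential process: each agent's output feeds the next agent.
        result = task
        for agent in self.agents:
            result = agent.work(result)
        return result

researcher = Agent("researcher", lambda q: f"notes on: {q}")
writer = Agent("writer", lambda notes: f"report based on {notes}")
crew = Crew([researcher, writer])
print(crew.kickoff("LLM routing"))
```

The pipeline shape is the point: each role owns one step, and the framework's job is passing context between them (plus retries, delegation, and tool calls in the real thing).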
Transformers_And_LLM_Are_What_You_Dont_Need
Transformers_And_LLM_Are_What_You_Dont_Need is a repository that explores the limitations of transformers in time series forecasting. It contains a collection of papers, articles, and theses discussing the effectiveness of transformers and LLMs in this domain. The repository aims to provide insights into why transformers may not be the best choice for time series forecasting tasks.
pytorch-forecasting
PyTorch Forecasting is a PyTorch-based package for time series forecasting with state-of-the-art network architectures. It offers a high-level API for training networks on pandas data frames and utilizes PyTorch Lightning for scalable training on GPUs and CPUs. The package aims to simplify time series forecasting with neural networks by providing a flexible API for professionals and default settings for beginners. It includes a timeseries dataset class, base model class, multiple neural network architectures, multi-horizon timeseries metrics, and hyperparameter tuning with optuna. PyTorch Forecasting is built on pytorch-lightning for easy training on various hardware configurations.
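The core job of a timeseries dataset class, slicing a series into encoder and prediction windows, can be sketched in plain Python (PyTorch Forecasting's actual TimeSeriesDataSet additionally handles scaling, categoricals, group ids, and missing values):

```python
# Conceptual sketch of what a timeseries dataset class does at its core:
# slice a series into (encoder window, prediction horizon) training pairs
# for multi-horizon forecasting. PyTorch Forecasting's TimeSeriesDataSet
# layers normalization, covariates, and missing-value handling on top.

def sliding_windows(series, encoder_length, horizon):
    """Return (past, future) pairs for multi-horizon forecasting."""
    pairs = []
    for start in range(len(series) - encoder_length - horizon + 1):
        past = series[start : start + encoder_length]
        future = series[start + encoder_length : start + encoder_length + horizon]
        pairs.append((past, future))
    return pairs

data = [1, 2, 3, 4, 5, 6]
print(sliding_windows(data, encoder_length=3, horizon=2))
```

Every (past, future) pair becomes one training example, which is why the same dataset abstraction feeds all of the package's network architectures.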
spider
Spider is a high-performance web crawler and indexer designed to handle data curation workloads efficiently. It offers features such as concurrency, streaming, decentralization, headless Chrome rendering, HTTP proxies, cron jobs, subscriptions, smart mode, blacklisting, whitelisting, budgeting depth, dynamic AI prompt scripting, CSS scraping, and more. Users can easily get started with the Spider Cloud hosted service or set up local installations with spider-cli. The tool supports integration with Node.js and Python for additional flexibility. With a focus on speed and scalability, Spider is ideal for extracting and organizing data from the web.
AI_for_Science_paper_collection
AI for Science paper collection is an initiative by AI for Science Community to collect and categorize papers in AI for Science areas by subjects, years, venues, and keywords. The repository contains `.csv` files with paper lists labeled by keys such as `Title`, `Conference`, `Type`, `Application`, `MLTech`, `OpenReviewLink`. It covers top conferences like ICML, NeurIPS, and ICLR. Volunteers can contribute by updating existing `.csv` files or adding new ones for uncovered conferences/years. The initiative aims to track the increasing trend of AI for Science papers and analyze trends in different applications.
For similar jobs
databerry
Chaindesk (formerly Databerry) is a no-code platform that lets users set up a semantic search system over personal data without technical knowledge. It supports loading data from various sources such as raw text, web pages, and files (Word, Excel, PowerPoint, PDF, Markdown, plain text), with support for entire websites, Notion, and Airtable on the roadmap. The platform offers a user-friendly interface for managing datastores, querying data via a secure API endpoint, and auto-generating ChatGPT Plugins for each datastore. Chaindesk uses a vector database (Qdrant) and OpenAI's text-embedding-ada-002 for embeddings, with a chunk size of 1024 tokens. The technology stack includes Next.js, Joy UI, LangchainJS, PostgreSQL, Prisma, and Qdrant, inspired by the ChatGPT Retrieval Plugin.
OAD
OAD is a powerful open-source tool for analyzing and visualizing data. It provides a user-friendly interface for exploring datasets, generating insights, and creating interactive visualizations. With OAD, users can easily import data from various sources, clean and preprocess data, perform statistical analysis, and create customizable visualizations to communicate findings effectively. Whether you are a data scientist, analyst, or researcher, OAD can help you streamline your data analysis workflow and uncover valuable insights from your data.
sqlcoder
Defog's SQLCoder is a family of state-of-the-art large language models (LLMs) for converting natural language questions into SQL queries. It outperforms popular open-source models, and even gpt-4 and gpt-4-turbo, on SQL generation tasks. SQLCoder was trained on more than 20,000 human-curated questions based on 10 different schemas, and the model weights are licensed under CC BY-SA 4.0. Users can interact with SQLCoder through the `transformers` library and run queries using the `sqlcoder launch` command in the terminal. The tool has been tested on NVIDIA GPUs with more than 16GB of VRAM and on Apple Silicon devices with some limitations. SQLCoder offers a demo on Defog's website and provides quantized versions of the model for consumer GPUs with sufficient memory.
TableLLM
TableLLM is a large language model designed for efficient tabular data manipulation tasks in real office scenarios. It can generate code solutions or direct text answers for tasks like insert, delete, update, query, merge, and chart operations on tables embedded in spreadsheets or documents. The model has been fine-tuned based on CodeLlama-7B and 13B, offering two scales: TableLLM-7B and TableLLM-13B. Evaluation results show its performance on benchmarks like WikiSQL, Spider, and self-created table operation benchmark. Users can use TableLLM for code and text generation tasks on tabular data.
mlcraft
Synmetrix (prev. MLCraft) is an open source data engineering platform and semantic layer for centralized metrics management. It provides a complete framework for modeling, integrating, transforming, aggregating, and distributing metrics data at scale. Key features include data modeling and transformations, semantic layer for unified data model, scheduled reports and alerts, versioning, role-based access control, data exploration, caching, and collaboration on metrics modeling. Synmetrix leverages Cube (Cube.js) for flexible data models that consolidate metrics from various sources, enabling downstream distribution via a SQL API for integration into BI tools, reporting, dashboards, and data science. Use cases include data democratization, business intelligence, embedded analytics, and enhancing accuracy in data handling and queries. The tool speeds up data-driven workflows from metrics definition to consumption by combining data engineering best practices with self-service analytics capabilities.
data-scientist-roadmap2024
The Data Scientist Roadmap2024 provides a comprehensive guide to mastering essential tools for data science success. It includes programming languages, machine learning libraries, cloud platforms, and concepts categorized by difficulty. The roadmap covers a wide range of topics from programming languages to machine learning techniques, data visualization tools, and DevOps/MLOps tools. It also includes web development frameworks and specific concepts like supervised and unsupervised learning, NLP, deep learning, reinforcement learning, and statistics. Additionally, it delves into DevOps tools like Airflow and MLFlow, data visualization tools like Tableau and Matplotlib, and other topics such as ETL processes, optimization algorithms, and financial modeling.
VMind
VMind is an open-source solution for intelligent visualization, providing an intelligent chart component based on LLM by VisActor. It allows users to create chart narrative works with natural language interaction, edit charts through dialogue, and export narratives as videos or GIFs. The tool is easy to use, scalable, supports various chart types, and offers one-click export functionality. Users can customize chart styles, specify themes, and aggregate data using LLM models. VMind aims to enhance efficiency in creating data visualization works through dialogue-based editing and natural language interaction.
quadratic
Quadratic is a modern multiplayer spreadsheet application that integrates Python, AI, and SQL functionalities. It aims to streamline team collaboration and data analysis by enabling users to pull data from various sources and utilize popular data science tools. The application supports building dashboards, creating internal tools, mixing data from different sources, exploring data for insights, visualizing Python workflows, and facilitating collaboration between technical and non-technical team members. Quadratic is built with Rust + WASM + WebGL to ensure seamless performance in the browser, and it offers features like WebGL Grid, local file management, Python and Pandas support, Excel formula support, multiplayer capabilities, charts and graphs, and team support. The tool is currently in Beta with ongoing development for additional features like JS support, SQL database support, and AI auto-complete.