
local-deep-research
Local Deep Research is an AI-powered assistant that transforms complex questions into comprehensive, cited reports by conducting iterative analysis using any LLM across diverse knowledge sources including academic databases, scientific repositories, web content, and private document collections.

README:
🔍 Advanced Research Capabilities
- Automated deep research with intelligent follow-up questions
- Proper inline citation and source verification
- Multi-iteration analysis for comprehensive coverage
- Full webpage content analysis (not just snippets)
🤖 Flexible LLM Support
- Local AI processing with Ollama models
- Cloud LLM support (Claude, GPT)
- Supports all Langchain models
- Configurable model selection based on needs
📊 Rich Output Options
- Detailed research findings with proper citations
- Well-structured comprehensive research reports
- Quick summaries for rapid insights
- Source tracking and verification
🔒 Privacy-Focused
- Runs entirely on your machine when using local models
- Configurable search settings
- Transparent data handling
🌐 Enhanced Search Integration
- Auto-selection of search sources: The "auto" search engine intelligently analyzes your query and selects the most appropriate search engine
- Multiple search engines including Wikipedia, arXiv, PubMed, Semantic Scholar, and more
- Local RAG search for private documents - search your own documents with vector embeddings
- Full webpage content retrieval and intelligent filtering
🎓 Academic & Scientific Integration
- Direct integration with PubMed, arXiv, Wikipedia, Semantic Scholar
- Properly formatted citations from academic sources
- Report structure suitable for literature reviews
- Cross-disciplinary synthesis of information
A powerful AI-powered research assistant that performs deep, iterative analysis using multiple LLMs and web searches. The system can be run locally for privacy or configured to use cloud-based LLMs for enhanced capabilities.
Important for non-academic searches: for normal web searches you will need SearXNG or an API key for a search provider such as Brave Search or SerpAPI. The free search engines are mostly academic and will not help with most everyday web searches.
Download the Windows Installer for easy one-click installation.
Requires Ollama (or another model provider configured in .env). Download it from https://ollama.ai, then pull a model: ollama pull gemma3:12b
# Install the package
pip install local-deep-research
# Install required browser automation tools
playwright install
# For local models, install Ollama
# Download from https://ollama.ai and then pull a model
ollama pull gemma3:12b
Then run:
# Start the web interface (recommended)
ldr-web # (OR python -m local_deep_research.web.app)
# OR run the command line version
ldr # (OR python -m local_deep_research.main)
Access the web interface at http://127.0.0.1:5000 in your browser.
Build the image first if you haven't already:
docker build -t local-deep-research .
Quick Docker Run
# Run with default settings (connects to Ollama running on the host)
docker run --network=host \
-e LDR_LLM__PROVIDER="ollama" \
-e LDR_LLM__MODEL="mistral" \
local-deep-research
For comprehensive Docker setup information, see the project's Docker documentation.
Local Deep Research now provides a simple API for programmatic access to its research capabilities:
import os

# Set environment variables to control the LLM
os.environ["LDR_LLM__MODEL"] = "mistral"  # Specify model name

from local_deep_research import quick_summary, generate_report, analyze_documents

# Generate a quick research summary with custom parameters
results = quick_summary(
    query="advances in fusion energy",
    search_tool="auto",            # Auto-select the best search engine
    iterations=1,                  # Single research cycle for speed
    questions_per_iteration=2,     # Generate 2 follow-up questions
    max_results=30,                # Consider up to 30 search results
    temperature=0.7,               # Control creativity of generation
)
print(results["summary"])
These functions provide flexible options for customizing the search parameters, iterations, and output formats. For more examples, see the programmatic access tutorial.
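For longer outputs, generate_report follows the same pattern. A minimal sketch, assuming it accepts the same core keyword arguments as quick_summary (the exact signature and returned keys may differ in your installed version):
from local_deep_research import generate_report

# Hypothetical sketch: build a full structured report instead of a summary.
# The keyword arguments mirror the quick_summary example above; verify them
# against your installed version before relying on this.
report = generate_report(
    query="advances in fusion energy",
    search_tool="auto",   # documented option: auto-select the search engine
    iterations=2,         # more research cycles for a deeper report
)

# The result is assumed to be a dict; inspect it to find the report body.
print(report)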
The package automatically creates and manages configuration files in your user directory:
- Windows: Documents\LearningCircuit\local-deep-research\config\
- Linux/Mac: ~/.config/local_deep_research/config/
When you first run the tool, it creates these configuration files:
File | Purpose
---|---
settings.toml | General settings for research, web interface, and search
llm_config.py | Advanced LLM configuration (rarely needs modification)
search_engines.toml | Define and configure search engines
local_collections.toml | Configure local document collections for RAG
.env | Environment variables for configuration (recommended for API keys)
Note: For comprehensive environment variable configuration, see our Environment Variables Guide.
The system supports multiple LLM providers. For local processing with Ollama:
- Install Ollama from https://ollama.ai
- Pull a model: ollama pull gemma3:12b (recommended model)
- Ollama runs on port 11434 by default
Add API keys to your environment variables (recommended) by creating a .env file in your config directory:
# Set API keys for cloud providers in .env
ANTHROPIC_API_KEY=your-api-key-here # For Claude models
OPENAI_API_KEY=your-openai-key-here # For GPT models
OPENAI_ENDPOINT_API_KEY=your-key-here # For OpenRouter or similar services
# Set your preferred LLM provider and model (no need to edit llm_config.py)
LDR_LLM__PROVIDER=ollama # Options: ollama, openai, anthropic, etc.
LDR_LLM__MODEL=gemma3:12b # Model name to use
Important: In most cases, you don't need to modify the llm_config.py file. Simply set the LDR_LLM__PROVIDER and LDR_LLM__MODEL environment variables to use your preferred model.
The system supports multiple LLM providers:
Provider | Type | API Key | Setup Details | Models
---|---|---|---|---
OLLAMA | Local | No | Install from ollama.ai | Mistral, Llama, Gemma, etc.
OPENAI | Cloud | OPENAI_API_KEY | Set in environment | GPT-3.5, GPT-4, GPT-4o
ANTHROPIC | Cloud | ANTHROPIC_API_KEY | Set in environment | Claude 3 Opus, Sonnet, Haiku
OPENAI_ENDPOINT | Cloud | OPENAI_ENDPOINT_API_KEY | Set in environment | Any OpenAI-compatible model
VLLM | Local | No | Requires GPU setup | Any supported by vLLM
LMSTUDIO | Local | No | Use LM Studio server | Models from LM Studio
LLAMACPP | Local | No | Configure model path | GGUF model formats
The OPENAI_ENDPOINT provider can access any service with an OpenAI-compatible API, including:
- OpenRouter (access to hundreds of models)
- Azure OpenAI
- Together.ai
- Groq
- Anyscale
- Self-hosted LLM servers with OpenAI compatibility
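As an illustration, here is a hedged sketch of pointing the tool at OpenRouter from Python. OPENAI_ENDPOINT_API_KEY is documented above; the provider string "openai_endpoint" and the model identifier are assumptions to adapt to your setup:
import os

# Documented key for OpenAI-compatible services (see the table above)
os.environ["OPENAI_ENDPOINT_API_KEY"] = "your-openrouter-key-here"

# Assumed values: the lowercase provider name and the OpenRouter-style
# model id are illustrative, not confirmed by this README.
os.environ["LDR_LLM__PROVIDER"] = "openai_endpoint"
os.environ["LDR_LLM__MODEL"] = "anthropic/claude-3.5-sonnet"

from local_deep_research import quick_summary

results = quick_summary(query="advances in fusion energy", iterations=1)
print(results["summary"])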
Some search engines require API keys. Add them to your environment variables by creating a .env file in your config directory:
# Search engine API keys (add to .env file)
SERP_API_KEY=your-serpapi-key-here # For Google results via SerpAPI
GOOGLE_PSE_API_KEY=your-google-key-here # For Google Programmable Search
GOOGLE_PSE_ENGINE_ID=your-pse-id-here # For Google Programmable Search
BRAVE_API_KEY=your-brave-search-key-here # For Brave Search
GUARDIAN_API_KEY=your-guardian-key-here # For The Guardian
# Set your preferred search tool
LDR_SEARCH__TOOL=auto # Default: intelligently selects best engine
Tip: To override other settings via environment variables (e.g., to change the web port), use: LDR_WEB__PORT=8080
Engine | Purpose | API Key Required? | Rate Limit
---|---|---|---
auto | Intelligently selects the best engine | No | Based on selected engine
wikipedia | General knowledge and facts | No | No strict limit
arxiv | Scientific papers and research | No | No strict limit
pubmed | Medical and biomedical research | No | No strict limit
semantic_scholar | Academic literature across all fields | No | 100/5min
github | Code repositories and documentation | No | 60/hour (unauthenticated)
brave | Web search (privacy-focused) | Yes | Based on plan
serpapi | Google search results | Yes | Based on plan
google_pse | Custom Google search | Yes | 100/day free tier
wayback | Historical web content | No | No strict limit
searxng | Local web search engine | No (requires local server) | No limit
Any collection name | Search your local documents | No | No limit
Note: For detailed SearXNG setup, see our SearXNG Setup Guide.
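Any engine name from the table can be passed as the search_tool parameter of the programmatic API shown earlier; a minimal sketch:
from local_deep_research import quick_summary

# Force a specific engine instead of letting "auto" choose.
# "pubmed" is one of the keyless engines from the table above.
results = quick_summary(
    query="CRISPR off-target effects in clinical trials",
    search_tool="pubmed",
    iterations=1,
)
print(results["summary"])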
The system can search through your local documents using vector embeddings.
- Define collections in local_collections.toml. Default collections include:
[project_docs]
name = "Project Documents"
description = "Project documentation and specifications"
paths = ["@format ${DOCS_DIR}/project_documents"]
enabled = true
embedding_model = "all-MiniLM-L6-v2"
embedding_device = "cpu"
embedding_model_type = "sentence_transformers"
max_results = 20
max_filtered_results = 5
chunk_size = 1000
chunk_overlap = 200
cache_dir = "__CACHE_DIR__/local_search/project_docs"
- Create your document directories:
  - The ${DOCS_DIR} variable points to a default location in your Documents folder
  - Documents are automatically indexed when the search is first used
You can use local document search in several ways:
- Auto-selection: Set tool = "auto" in the [search] section of settings.toml
- Explicit collection: Set tool = "project_docs" to search only that collection
- All collections: Set tool = "local_all" to search across all collections
- Query syntax: Type collection:project_docs your query to target a specific collection
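The same options work programmatically. A sketch using the documented quick_summary API with the example collection above; passing a collection name as search_tool is inferred from the options listed, so verify it in your version:
from local_deep_research import quick_summary

# Search only the "project_docs" collection defined in local_collections.toml.
results = quick_summary(
    query="summarize the project specifications",
    search_tool="project_docs",  # or "local_all" for every collection
    iterations=1,
)
print(results["summary"])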
Edit settings.toml to customize research parameters, or use environment variables:
[search]
# Search tool to use (auto, wikipedia, arxiv, etc.)
tool = "auto"
# Number of research cycles
iterations = 2
# Questions generated per cycle
questions_per_iteration = 2
# Results per search query
max_results = 50
# Results after relevance filtering
max_filtered_results = 5
Using environment variables:
LDR_SEARCH__TOOL=auto
LDR_SEARCH__ITERATIONS=3
LDR_SEARCH__QUESTIONS_PER_ITERATION=2
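The same LDR_SEARCH__* variables can be set from Python before importing the package; a sketch, assuming the environment values are picked up as defaults at run time:
import os

# Override settings.toml defaults via environment variables (documented above).
os.environ["LDR_SEARCH__TOOL"] = "wikipedia"
os.environ["LDR_SEARCH__ITERATIONS"] = "3"
os.environ["LDR_SEARCH__QUESTIONS_PER_ITERATION"] = "2"

from local_deep_research import quick_summary

# With no explicit parameters, the environment-configured defaults are
# assumed to apply.
results = quick_summary(query="history of the transistor")
print(results["summary"])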
The web interface offers several features:
- Dashboard: Start and manage research queries
- Real-time Updates: Track research progress
- Research History: Access past queries
- PDF Export: Download reports
- Research Management: Terminate processes or delete records
The CLI version allows you to:
- Choose between a quick summary or detailed report
- Enter your research query
- View results directly in the terminal
- Save reports automatically to the configured output directory
If you want to develop or modify the package, you can install it in development mode:
# Clone the repository
git clone https://github.com/LearningCircuit/local-deep-research.git
cd local-deep-research
# Install in development mode
pip install -e .
You can run the application directly using Python module syntax:
# Run the web interface
python -m local_deep_research.web.app
# Run the CLI version
python -m local_deep_research.main
Join our Discord server to exchange ideas, discuss usage patterns, and share research approaches.
This project is licensed under the MIT License.
- Built with Ollama for local AI processing
- Search powered by multiple sources:
- Wikipedia for factual knowledge
- arXiv for scientific papers
- PubMed for biomedical literature
- Semantic Scholar for academic literature
- DuckDuckGo for web search
- The Guardian for journalism
- SerpAPI for Google search results
- SearXNG for local, self-hosted web search
- Brave Search for privacy-focused web search
- Built on LangChain framework
- Uses justext, Playwright, FAISS, and more
Support Free Knowledge: If you frequently use the search engines in this tool, please consider making a donation to the organizations behind them.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Make your changes
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Important: Open a Pull Request against the dev branch, not the main branch
We prefer all pull requests to be submitted against the dev branch for easier testing and integration before releasing to the main branch.