local-deep-research

Local Deep Research is an AI-powered assistant that transforms complex questions into comprehensive, cited reports by conducting iterative analysis using any LLM across diverse knowledge sources including academic databases, scientific repositories, web content, and private document collections.

Local Deep Research is an AI-powered research assistant that performs deep, iterative analysis using multiple LLMs and web searches. It can run locally for privacy or be configured to use cloud-based LLMs. It provides a web interface and a command line interface, supports multiple LLM providers and search engines (including academic and scientific sources), and lets users configure AI models, search engines, and research parameters.

README:

Local Deep Research

Features

🔍 Advanced Research Capabilities
- Automated deep research with intelligent follow-up questions
- Proper inline citation and source verification
- Multi-iteration analysis for comprehensive coverage
- Full webpage content analysis (not just snippets)
🤖 Flexible LLM Support
- Local AI processing with Ollama models
- Cloud LLM support (Claude, GPT)
- Supports all LangChain models
- Configurable model selection based on needs
📊 Rich Output Options
- Detailed research findings with proper citations
- Well-structured comprehensive research reports
- Quick summaries for rapid insights
- Source tracking and verification
🔒 Privacy-Focused
- Runs entirely on your machine when using local models
- Configurable search settings
- Transparent data handling
🌐 Enhanced Search Integration
- Auto-selection of search sources: The "auto" search engine intelligently analyzes your query and selects the most appropriate search engine
- Multiple search engines including Wikipedia, arXiv, PubMed, Semantic Scholar, and more
- Local RAG search for private documents - search your own documents with vector embeddings
- Full webpage content retrieval and intelligent filtering
🎓 Academic & Scientific Integration
- Direct integration with PubMed, arXiv, Wikipedia, Semantic Scholar
- Properly formatted citations from academic sources
- Report structure suitable for literature reviews
- Cross-disciplinary synthesis of information

Important for non-academic searches: For normal web searches you will need SearXNG or an API key for a search provider such as Brave Search or SerpAPI. The free search engines are mostly academic and will not help with most general web searches.

Windows Installation

Download the Windows Installer for easy one-click installation.

Requires Ollama (or another model provider configured in .env). Download Ollama from https://ollama.ai and then pull a model, for example: ollama pull gemma3:12b

Quick Start (not required if installed with the Windows installer)

# Install the package
pip install local-deep-research

# Install required browser automation tools
playwright install

# For local models, install Ollama
# Download from https://ollama.ai and then pull a model
ollama pull gemma3:12b

Then run:

# Start the web interface (recommended)
ldr-web # (OR python -m local_deep_research.web.app)

# OR run the command line version
ldr # (OR python -m local_deep_research.main)

Access the web interface at http://127.0.0.1:5000 in your browser.

Docker Support

Build the image first if you haven't already:

docker build -t local-deep-research .

Quick Docker Run

# Run with default settings (connects to Ollama running on the host)
docker run --network=host \
  -e LDR_LLM__PROVIDER="ollama" \
  -e LDR_LLM__MODEL="mistral" \
  local-deep-research

For comprehensive Docker setup information, see:

Programmatic Access

Local Deep Research now provides a simple API for programmatic access to its research capabilities:

import os
# Set environment variables to control the LLM
os.environ["LDR_LLM__MODEL"] = "mistral"     # Specify model name

from local_deep_research import quick_summary, generate_report, analyze_documents

# Generate a quick research summary with custom parameters
results = quick_summary(
    query="advances in fusion energy",
    search_tool="auto",          # Auto-select the best search engine
    iterations=1,                # Single research cycle for speed
    questions_per_iteration=2,   # Generate 2 follow-up questions
    max_results=30,              # Consider up to 30 search results
    temperature=0.7              # Control creativity of generation
)
print(results["summary"])

These functions provide flexible options for customizing the search parameters, iterations, and output formats. For more examples, see the programmatic access tutorial.
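To run several queries in one go, a thin wrapper around the call shown above might look like the following. This is a minimal sketch, not part of the package: `summarize_many` and `fake_summary` are hypothetical names, and the summary function is passed in as an argument so the sketch can be tried without an LLM. It assumes only what the example above shows, namely that the function accepts `query` and `iterations` keyword arguments and returns a mapping with a `"summary"` key.

```python
from typing import Callable, Dict, Iterable, Mapping


def summarize_many(
    queries: Iterable[str],
    summarize: Callable[..., Mapping[str, object]],
    iterations: int = 1,
) -> Dict[str, str]:
    """Run `summarize` once per query and collect the summary text.

    `summarize` is expected to behave like quick_summary in the example
    above: keyword arguments in, a mapping with a "summary" key out.
    """
    summaries: Dict[str, str] = {}
    for query in queries:
        result = summarize(query=query, iterations=iterations)
        summaries[query] = str(result["summary"])
    return summaries


def fake_summary(query: str, iterations: int = 1) -> Mapping[str, object]:
    """Hypothetical stand-in so the sketch runs without a model."""
    return {"summary": f"{query} ({iterations} iteration(s))"}


print(summarize_many(["fusion energy", "solid-state batteries"], fake_summary))
```

With the package installed, `quick_summary` would be passed in place of `fake_summary`.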

Configuration System

The package automatically creates and manages configuration files in your user directory:

Windows: Documents\LearningCircuit\local-deep-research\config\
Linux/Mac: ~/.config/local_deep_research/config/
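If you need to locate these directories from a script, the two documented locations can be expressed as follows. This is a sketch under assumptions: `default_config_dir` is a hypothetical helper, the Windows Documents folder is assumed to sit directly under the user's home directory (it can be redirected), and the package's own path resolution may differ.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def default_config_dir(system: str, home: str) -> PurePath:
    """Return the documented config directory for a platform.

    `system` follows platform.system() naming ("Windows", "Linux", "Darwin").
    """
    if system == "Windows":
        # Assumption: Documents lives directly under the home directory.
        return (
            PureWindowsPath(home)
            / "Documents"
            / "LearningCircuit"
            / "local-deep-research"
            / "config"
        )
    return PurePosixPath(home) / ".config" / "local_deep_research" / "config"


print(default_config_dir("Linux", "/home/alice"))
print(default_config_dir("Windows", r"C:\Users\alice"))
```

In real use, `platform.system()` and `pathlib.Path.home()` would supply the two arguments.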

Default Configuration Files

When you first run the tool, it creates these configuration files:

File	Purpose
`settings.toml`	General settings for research, web interface, and search
`llm_config.py`	Advanced LLM configuration (rarely needs modification)
`search_engines.toml`	Define and configure search engines
`local_collections.toml`	Configure local document collections for RAG
`.env`	Environment variables for configuration (recommended for API keys)

Note: For comprehensive environment variable configuration, see our Environment Variables Guide.

Setting Up AI Models

The system supports multiple LLM providers:

Local Models (via Ollama)

Install Ollama
Pull a model: ollama pull gemma3:12b (recommended model)
Ollama runs on port 11434 by default

Cloud Models

Add API keys to your environment variables (recommended) by creating a .env file in your config directory:

# Set API keys for cloud providers in .env
ANTHROPIC_API_KEY=your-api-key-here      # For Claude models
OPENAI_API_KEY=your-openai-key-here      # For GPT models
OPENAI_ENDPOINT_API_KEY=your-key-here    # For OpenRouter or similar services

# Set your preferred LLM provider and model (no need to edit llm_config.py)
LDR_LLM__PROVIDER=ollama                 # Options: ollama, openai, anthropic, etc.
LDR_LLM__MODEL=gemma3:12b                # Model name to use

Important: In most cases, you don't need to modify the llm_config.py file. Simply set the LDR_LLM__PROVIDER and LDR_LLM__MODEL environment variables to use your preferred model.

Supported LLM Providers

The table below lists each supported provider, whether it runs locally or in the cloud, and the API key it needs:

Provider	Type	API Key	Setup Details	Models
`OLLAMA`	Local	No	Install from ollama.ai	Mistral, Llama, Gemma, etc.
`OPENAI`	Cloud	`OPENAI_API_KEY`	Set in environment	GPT-3.5, GPT-4, GPT-4o
`ANTHROPIC`	Cloud	`ANTHROPIC_API_KEY`	Set in environment	Claude 3 Opus, Sonnet, Haiku
`OPENAI_ENDPOINT`	Cloud	`OPENAI_ENDPOINT_API_KEY`	Set in environment	Any OpenAI-compatible model
`VLLM`	Local	No	Requires GPU setup	Any supported by vLLM
`LMSTUDIO`	Local	No	Use LM Studio server	Models from LM Studio
`LLAMACPP`	Local	No	Configure model path	GGUF model formats

The OPENAI_ENDPOINT provider can access any service with an OpenAI-compatible API, including:

OpenRouter (access to hundreds of models)
Azure OpenAI
Together.ai
Groq
Anyscale
Self-hosted LLM servers with OpenAI compatibility
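Before starting a long research run, it can help to check that the key for the chosen provider is actually set. The sketch below encodes the API Key column of the provider table as a lookup. `REQUIRED_KEY` and `missing_api_key` are hypothetical names used for illustration, and treating provider names case-insensitively is an assumption; the package performs its own validation.

```python
from typing import Mapping, Optional

# Provider -> required environment variable, copied from the table above.
# None means the provider needs no API key.
REQUIRED_KEY = {
    "ollama": None,
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openai_endpoint": "OPENAI_ENDPOINT_API_KEY",
    "vllm": None,
    "lmstudio": None,
    "llamacpp": None,
}


def missing_api_key(provider: str, environ: Mapping[str, str]) -> Optional[str]:
    """Return the name of the unset key variable, or None if nothing is missing."""
    name = provider.lower()
    if name not in REQUIRED_KEY:
        raise ValueError(f"unknown provider: {provider}")
    key = REQUIRED_KEY[name]
    if key is None or environ.get(key):
        return None
    return key


print(missing_api_key("anthropic", {}))
print(missing_api_key("ollama", {}))
```

Passing `os.environ` as the second argument checks the current process environment.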

Setting Up Search Engines

Some search engines require API keys. Add them to your environment variables by creating a .env file in your config directory:

# Search engine API keys (add to .env file)
SERP_API_KEY=your-serpapi-key-here        # For Google results via SerpAPI
GOOGLE_PSE_API_KEY=your-google-key-here   # For Google Programmable Search
GOOGLE_PSE_ENGINE_ID=your-pse-id-here     # For Google Programmable Search
BRAVE_API_KEY=your-brave-search-key-here  # For Brave Search
GUARDIAN_API_KEY=your-guardian-key-here   # For The Guardian

# Set your preferred search tool
LDR_SEARCH__TOOL=auto                     # Default: intelligently selects best engine

Tip: To override other settings via environment variables (e.g., to change the web port), use: LDR_WEB__PORT=8080

Available Search Engines

Engine	Purpose	API Key Required?	Rate Limit
`auto`	Intelligently selects the best engine	No	Based on selected engine
`wikipedia`	General knowledge and facts	No	No strict limit
`arxiv`	Scientific papers and research	No	No strict limit
`pubmed`	Medical and biomedical research	No	No strict limit
`semantic_scholar`	Academic literature across all fields	No	100/5min
`github`	Code repositories and documentation	No	60/hour (unauthenticated)
`brave`	Web search (privacy-focused)	Yes	Based on plan
`serpapi`	Google search results	Yes	Based on plan
`google_pse`	Custom Google search	Yes	100/day free tier
`wayback`	Historical web content	No	No strict limit
`searxng`	Local web search engine	No (requires local server)	No limit
Any collection name	Search your local documents	No	No limit

Note: For detailed SearXNG setup, see our SearXNG Setup Guide.
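A limit such as 100/5min means at most 100 requests in any 5-minute window. To illustrate what staying under such a limit involves, here is a minimal sliding-window sketch. `SlidingWindowLimiter` is a hypothetical class written for this example; it does not describe how the package itself throttles requests.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window` seconds."""

    def __init__(
        self,
        limit: int,
        window: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False
        self.calls.append(now)
        return True


# 100 requests per 5 minutes, the figure listed for semantic_scholar.
limiter = SlidingWindowLimiter(limit=100, window=300.0)
allowed = sum(limiter.try_acquire() for _ in range(120))
print(allowed)
```

The clock is injectable so the behaviour can be checked without waiting for real time to pass.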

Local Document Search (RAG)

The system can search through your local documents using vector embeddings.

Setting Up Document Collections

Define collections in local_collections.toml. Default collections include:

[project_docs]
name = "Project Documents"
description = "Project documentation and specifications"
paths = ["@format ${DOCS_DIR}/project_documents"]
enabled = true
embedding_model = "all-MiniLM-L6-v2"
embedding_device = "cpu"
embedding_model_type = "sentence_transformers"
max_results = 20
max_filtered_results = 5
chunk_size = 1000
chunk_overlap = 200
cache_dir = "__CACHE_DIR__/local_search/project_docs"

Create your document directories:
- The ${DOCS_DIR} variable points to a default location in your Documents folder
- Documents are automatically indexed when the search is first used
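The `chunk_size` and `chunk_overlap` settings control how documents are cut up before embedding. The sketch below shows one common interpretation, fixed-size character windows where each chunk repeats the last `chunk_overlap` characters of the previous one. `chunk_text` is a hypothetical function; the splitter the package actually uses may count differently or prefer natural boundaries such as paragraphs.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into character windows of `chunk_size` that overlap by `chunk_overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


document = "x" * 2500
pieces = chunk_text(document)
print([len(p) for p in pieces])  # [1000, 1000, 900]
```

With the defaults from the collection above, each window advances 800 characters, so a 2,500-character document yields three chunks.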

Using Local Search

You can use local document search in several ways:

Auto-selection: Set tool = "auto" in settings.toml [search] section
Explicit collection: Set tool = "project_docs" to search only that collection
All collections: Set tool = "local_all" to search across all collections
Query syntax: Type collection:project_docs your query to target a specific collection
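The query syntax in the last option can be pictured as a prefix that is split off before the search runs. The following is a sketch of that idea only: `split_collection_query` is a hypothetical helper, and the package's real parsing rules (for example, how it treats whitespace or unknown collection names) are not documented here.

```python
from typing import Optional, Tuple


def split_collection_query(text: str) -> Tuple[Optional[str], str]:
    """Separate a leading "collection:<name>" token from the rest of the query."""
    prefix = "collection:"
    stripped = text.strip()
    if not stripped.startswith(prefix):
        return None, stripped
    head, _, rest = stripped.partition(" ")
    name = head[len(prefix):]
    if not name:
        # "collection:" with no name attached is treated as plain text.
        return None, stripped
    return name, rest.strip()


print(split_collection_query("collection:project_docs deployment checklist"))
print(split_collection_query("deployment checklist"))
```

The first call returns the collection name and the remaining query; the second returns `None` for the collection, meaning the configured search tool applies.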

Advanced Configuration

Research Parameters

Edit settings.toml to customize research parameters or use environment variables:

[search]
# Search tool to use (auto, wikipedia, arxiv, etc.)
tool = "auto"

# Number of research cycles
iterations = 2

# Questions generated per cycle
questions_per_iteration = 2

# Results per search query
max_results = 50

# Results after relevance filtering
max_filtered_results = 5

Using environment variables:

LDR_SEARCH__TOOL=auto
LDR_SEARCH__ITERATIONS=3
LDR_SEARCH__QUESTIONS_PER_ITERATION=2
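The variable names follow a pattern: the `LDR_` prefix, then the section name from settings.toml, a double underscore, and the key. The sketch below shows how such names map onto nested settings. `parse_ldr_env` is a hypothetical function written to illustrate the naming convention, and the integer conversion is an assumption; the package's own loader may handle types and precedence differently.

```python
from typing import Any, Dict, Mapping


def parse_ldr_env(
    environ: Mapping[str, str], prefix: str = "LDR_"
) -> Dict[str, Dict[str, Any]]:
    """Group LDR_<SECTION>__<KEY> variables into {section: {key: value}}."""
    settings: Dict[str, Dict[str, Any]] = {}
    for name, raw in environ.items():
        if not name.startswith(prefix) or "__" not in name:
            continue  # not an LDR override
        section, _, key = name[len(prefix):].partition("__")
        # Convert integer-looking values; everything else stays a string.
        try:
            value: Any = int(raw)
        except ValueError:
            value = raw
        settings.setdefault(section.lower(), {})[key.lower()] = value
    return settings


example = {
    "LDR_SEARCH__TOOL": "auto",
    "LDR_SEARCH__ITERATIONS": "3",
    "LDR_SEARCH__QUESTIONS_PER_ITERATION": "2",
    "LDR_WEB__PORT": "8080",
    "PATH": "/usr/bin",
}
print(parse_ldr_env(example))
```

Here `LDR_SEARCH__ITERATIONS=3` corresponds to `iterations = 3` under `[search]`, and unrelated variables such as `PATH` are ignored.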

Web Interface

The web interface offers several features:

Dashboard: Start and manage research queries
Real-time Updates: Track research progress
Research History: Access past queries
PDF Export: Download reports
Research Management: Terminate processes or delete records

Command Line Interface

The CLI version allows you to:

Choose between a quick summary or detailed report
Enter your research query
View results directly in the terminal
Save reports automatically to the configured output directory

Development Setup

If you want to develop or modify the package, you can install it in development mode:

# Clone the repository
git clone https://github.com/LearningCircuit/local-deep-research.git
cd local-deep-research

# Install in development mode
pip install -e .

You can run the application directly using Python module syntax:

# Run the web interface
python -m local_deep_research.web.app

# Run the CLI version
python -m local_deep_research.main

Community & Support

Join our Discord server to exchange ideas, discuss usage patterns, and share research approaches.

License

This project is licensed under the MIT License.

Acknowledgments

Built with Ollama for local AI processing
Search powered by multiple sources:
- Wikipedia for factual knowledge
- arXiv for scientific papers
- PubMed for biomedical literature
- Semantic Scholar for academic literature
- DuckDuckGo for web search
- The Guardian for journalism
- SerpAPI for Google search results
- SearXNG for local web search
- Brave Search for privacy-focused web search
Built on LangChain framework
Uses justext, Playwright, FAISS, and more

Support Free Knowledge: If you frequently use the search engines in this tool, please consider making a donation to these organizations:

Donate to Wikipedia

Support arXiv

Donate to DuckDuckGo

Support PubMed/NCBI

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Make your changes
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Important: Open a Pull Request against the dev branch, not the main branch

We prefer all pull requests to be submitted against the dev branch for easier testing and integration before releasing to the main branch.

