biomcp

BioMCP: Biomedical Model Context Protocol

Stars: 419

BioMCP is an open source toolkit that empowers AI assistants and agents with specialized biomedical knowledge. It follows the Model Context Protocol (MCP) to connect AI systems to authoritative biomedical data sources, enabling precise answers about clinical trials, scientific literature, and genomic variants. It provides structured access to various biomedical data sources, supports natural language queries, and functions as an MCP server for AI assistants and agents. BioMCP integrates with literature sources like PubMed and bioRxiv, clinical and genomic sources like ClinicalTrials.gov and MyVariant.info, and regulatory and safety sources like OpenFDA. It offers 24 specialized tools for biomedical research, including core tools like Think Tool, Search Tool, and Fetch Tool, as well as individual tools for articles, trials, variants, NCI-specific data, genes, diseases, and drugs. BioMCP can be run locally or as an HTTP server, supports Docker deployment, and provides a comprehensive CLI for database interaction.

README:

Version 0.7.2 is the final release of the Python-based BioMCP server. The project has been re-architected in Rust to be more agent-friendly — using fewer tokens, consuming less context window, running faster, and adding new data sources. The Python source code is preserved here under the v0.7.2 tag.

BioMCP is an open source (MIT License) toolkit that empowers AI assistants and agents with specialized biomedical knowledge. Built following the Model Context Protocol (MCP), it connects AI systems to authoritative biomedical data sources, enabling them to answer questions about clinical trials, scientific literature, and genomic variants with precision and depth.

MCPHub Certification

BioMCP is certified by MCPHub. This certification ensures that BioMCP follows best practices for Model Context Protocol implementation and provides reliable biomedical data access.

Why BioMCP?

While Large Language Models have broad general knowledge, they often lack specialized domain-specific information or access to up-to-date resources. BioMCP bridges this gap for biomedicine by:

Providing structured access to clinical trials, biomedical literature, and genomic variants
Enabling natural language queries to specialized databases without requiring knowledge of their specific syntax
Supporting biomedical research workflows through a consistent interface
Functioning as an MCP server for AI assistants and agents

Biomedical Data Sources

BioMCP integrates with multiple biomedical data sources:

Literature Sources

PubTator3/PubMed - Peer-reviewed biomedical literature with entity annotations
bioRxiv/medRxiv - Preprint servers for biology and health sciences
Europe PMC - Open science platform including preprints

Clinical & Genomic Sources

ClinicalTrials.gov - Clinical trial registry and results database
NCI Clinical Trials Search API - National Cancer Institute's curated cancer trials database
- Advanced search filters (biomarkers, prior therapies, brain metastases)
- Organization and intervention databases
- Disease vocabulary with synonyms
BioThings Suite - Comprehensive biomedical data APIs:
- MyVariant.info - Consolidated genetic variant annotation
- MyGene.info - Real-time gene annotations and information
- MyDisease.info - Disease ontology and synonym information
- MyChem.info - Drug/chemical annotations and properties
TCGA/GDC - The Cancer Genome Atlas for cancer variant data
1000 Genomes - Population frequency data via Ensembl
cBioPortal - Cancer genomics portal with mutation occurrence data
OncoKB - Precision oncology knowledge base for clinical variant interpretation (demo server with BRAF, ROS1, TP53)
- Therapeutic implications and FDA-approved treatments
- Oncogenicity and mutation effect annotations
- Works immediately without authentication

Regulatory & Safety Sources

OpenFDA - FDA regulatory and safety data:
- Drug Adverse Events (FAERS) - Post-market drug safety reports
- Drug Labels (SPL) - Official prescribing information
- Device Events (MAUDE) - Medical device adverse events, with genomic device filtering

Available MCP Tools

BioMCP provides 24 specialized tools for biomedical research:

Core Tools (3)

1. Think Tool (ALWAYS USE FIRST!)

CRITICAL: The think tool MUST be your first step for ANY biomedical research task.

```
# Start analysis with sequential thinking
think(
    thought="Breaking down the query about BRAF mutations in melanoma...",
    thoughtNumber=1,
    totalThoughts=3,
    nextThoughtNeeded=True
)
```

The sequential thinking tool helps:

Break down complex biomedical problems systematically
Plan multi-step research approaches
Track reasoning progress
Ensure comprehensive analysis
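The think() parameters above describe a numbered chain of steps. As a rough local illustration of that bookkeeping, the sketch below records each thought and tracks progress. `ThoughtLog` is a hypothetical class, not part of BioMCP. It deliberately keeps the tool's camelCase parameter names so the mapping to the think() call stays obvious.

```python
from dataclasses import dataclass, field


@dataclass
class ThoughtLog:
    """Hypothetical local tracker for sequential-thinking steps.

    Mirrors the thoughtNumber / totalThoughts / nextThoughtNeeded
    parameters from the think() example; it is not BioMCP code.
    """

    thoughts: list[str] = field(default_factory=list)
    total: int = 0
    done: bool = False

    def think(self, thought: str, thoughtNumber: int, totalThoughts: int,
              nextThoughtNeeded: bool) -> str:
        # Steps must arrive in order: 1, 2, 3, ...
        expected = len(self.thoughts) + 1
        if thoughtNumber != expected:
            raise ValueError(f"expected thought {expected}, got {thoughtNumber}")
        self.thoughts.append(thought)
        # The plan may grow mid-analysis, so keep the largest total seen.
        self.total = max(self.total, totalThoughts, thoughtNumber)
        # The chain ends once the caller says no further thought is needed.
        self.done = not nextThoughtNeeded
        return f"Thought {thoughtNumber}/{self.total} recorded"
```

Letting `total` grow when a step number exceeds the planned count reflects that a multi-step analysis often turns out longer than first estimated.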

2. Search Tool

The search tool supports two modes:

Unified Query Language (Recommended)

Use the query parameter with structured field syntax for powerful cross-domain searches:

```
# Simple natural language
search(query="BRAF melanoma")

# Field-specific search
search(query="gene:BRAF AND trials.condition:melanoma")

# Complex queries
search(query="gene:BRAF AND variants.significance:pathogenic AND articles.date:>2023")

# Get searchable fields schema
search(get_schema=True)

# Explain how a query is parsed
search(query="gene:BRAF", explain_query=True)
```

Supported Fields:

Cross-domain: gene:, variant:, disease:
Trials: trials.condition:, trials.phase:, trials.status:, trials.intervention:
Articles: articles.author:, articles.journal:, articles.date:
Variants: variants.significance:, variants.rsid:, variants.frequency:
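To make the field syntax concrete, here is a minimal sketch of how an AND-joined query could be split into field/value pairs and checked against the supported fields listed above. `parse_query` and `SUPPORTED_FIELDS` are hypothetical. This is not BioMCP's parser, and it ignores OR, NOT, and grouping.

```python
import re

# Field names copied from the "Supported Fields" list above.
SUPPORTED_FIELDS = {
    "gene", "variant", "disease",
    "trials.condition", "trials.phase", "trials.status", "trials.intervention",
    "articles.author", "articles.journal", "articles.date",
    "variants.significance", "variants.rsid", "variants.frequency",
}

# field:value, where the value is a quoted phrase or a bare token.
TERM = re.compile(r'([a-z]+(?:\.[a-z]+)?):("[^"]*"|\S+)')


def parse_query(query: str) -> dict:
    """Hypothetical parser: split an AND-joined query into field terms.

    Parts without a field prefix are kept as free text.
    """
    fields = []
    free_text = []
    for part in re.split(r"\s+AND\s+", query.strip()):
        match = TERM.fullmatch(part)
        if match is None:
            free_text.append(part)
            continue
        name, value = match.group(1), match.group(2).strip('"')
        if name not in SUPPORTED_FIELDS:
            raise ValueError(f"unsupported field: {name}")
        fields.append((name, value))
    return {"fields": fields, "free_text": " ".join(free_text)}
```

For example, `parse_query("gene:BRAF AND trials.condition:melanoma")` yields the pairs `("gene", "BRAF")` and `("trials.condition", "melanoma")`. To see how BioMCP itself interprets a query, use `explain_query=True` on the real search tool.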

Domain-Based Search

Use the domain parameter with specific filters:

```
# Search articles (includes automatic cBioPortal integration)
search(domain="article", genes=["BRAF"], diseases=["melanoma"])

# Search with mutation-specific cBioPortal data
search(domain="article", genes=["BRAF"], keywords=["V600E"])
search(domain="article", genes=["SRSF2"], keywords=["F57*"])  # Wildcard patterns

# Search trials
search(domain="trial", conditions=["lung cancer"], phase="3")

# Search variants
search(domain="variant", gene="TP53", significance="pathogenic")
```

Note: When searching articles with a gene parameter, cBioPortal data is automatically included:

Gene-level summaries show mutation frequency across cancer studies
Mutation-specific searches (e.g., "V600E") show study-level occurrence data
Cancer types are dynamically resolved from the cBioPortal API
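The "F57*" example above relies on a wildcard. The sketch below assumes a trailing * means "any change at this residue" and shows one pitfall worth avoiding: a plain prefix or glob match would also accept F570S, which is a different position. `match_protein_change` is hypothetical and is not BioMCP or cBioPortal code. Note that in HGVS short form a bare * also denotes a stop codon, so "F57*" written without wildcard intent could mean a nonsense change at F57.

```python
import re


def match_protein_change(changes: list[str], pattern: str) -> list[str]:
    """Hypothetical filter for wildcard patterns such as "F57*".

    Assumption: a trailing * means "any change at this residue", so
    "F57*" matches F57C or F57L but not F570S (position 570).
    """
    if not pattern.endswith("*"):
        # No wildcard: exact match only.
        return [c for c in changes if c == pattern]
    prefix = re.escape(pattern[:-1])
    # After the residue/position prefix, allow only letters or "*"
    # (a stop codon), never further digits that would change the position.
    regex = re.compile(prefix + r"[A-Za-z*]*")
    return [c for c in changes if regex.fullmatch(c)]
```

Restricting what may follow the position is the design choice here: it keeps the wildcard anchored to residue 57 instead of every position whose number starts with 57.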

3. Fetch Tool

Retrieve full details for a single article, trial, or variant:

```
# Fetch article details (supports both PMID and DOI)
fetch(domain="article", id="34567890")  # PMID
fetch(domain="article", id="10.1101/2024.01.20.23288905")  # DOI

# Fetch trial with all sections
fetch(domain="trial", id="NCT04280705", detail="all")

# Fetch variant details
fetch(domain="variant", id="rs113488022")
```

Domain-specific options:

Articles: detail="full" retrieves full text if available
Trials: detail can be "protocol", "locations", "outcomes", "references", or "all"
Variants: Always returns full details
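Because fetch accepts either a PMID or a DOI for articles, a client may want to validate identifiers before calling it. A minimal sketch, assuming PMIDs are all ASCII digits and DOIs follow the "10.<registrant>/<suffix>" shape; `classify_article_id` and `check_trial_detail` are hypothetical helpers, and the allowed trial detail values are copied from the list above.

```python
import re

# Trial detail values listed above.
TRIAL_DETAILS = {"protocol", "locations", "outcomes", "references", "all"}


def classify_article_id(identifier: str) -> str:
    """Hypothetical helper: tell a PMID from a DOI before fetching."""
    identifier = identifier.strip()
    # PMIDs are plain integers (ASCII digits only).
    if re.fullmatch(r"[0-9]+", identifier):
        return "pmid"
    # DOIs start with "10.", a numeric registrant code, then "/" and a suffix.
    if re.fullmatch(r"10\.[0-9]{4,9}/\S+", identifier):
        return "doi"
    raise ValueError(f"not a PMID or DOI: {identifier!r}")


def check_trial_detail(detail: str) -> str:
    """Reject detail values that are not in the documented set."""
    if detail not in TRIAL_DETAILS:
        raise ValueError(f"detail must be one of {sorted(TRIAL_DETAILS)}")
    return detail
```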

Individual Tools (21)

For users who prefer direct access to specific functionality, BioMCP also provides 21 individual tools:

Article Tools (2)

article_searcher: Search PubMed/PubTator3 and preprints
article_getter: Fetch detailed article information (supports PMID and DOI)

Trial Tools (6)

trial_searcher: Search ClinicalTrials.gov or NCI CTS API (via source parameter)
trial_getter: Fetch all trial details from either source
trial_protocol_getter: Fetch protocol information only (ClinicalTrials.gov)
trial_references_getter: Fetch trial publications (ClinicalTrials.gov)
trial_outcomes_getter: Fetch outcome measures and results (ClinicalTrials.gov)
trial_locations_getter: Fetch site locations and contacts (ClinicalTrials.gov)

Variant Tools (2)

variant_searcher: Search MyVariant.info database
variant_getter: Fetch comprehensive variant details

NCI-Specific Tools (6)

nci_organization_searcher: Search NCI's organization database
nci_organization_getter: Get organization details by ID
nci_intervention_searcher: Search NCI's intervention database (drugs, devices, procedures)
nci_intervention_getter: Get intervention details by ID
nci_biomarker_searcher: Search biomarkers used in trial eligibility criteria
nci_disease_searcher: Search NCI's controlled vocabulary of cancer conditions

Gene, Disease & Drug Tools (3)

gene_getter: Get real-time gene information from MyGene.info
disease_getter: Get disease definitions and synonyms from MyDisease.info
drug_getter: Get drug/chemical information from MyChem.info

Note: All individual tools that search by gene automatically include cBioPortal summaries when the include_cbioportal parameter is True (default). Trial searches can expand disease conditions with synonyms when expand_synonyms is True (default).

Quick Start

For Claude Desktop Users

Install uv if you don't have it (recommended):

```
# MacOS
brew install uv

# Windows/Linux
pip install uv
```

Configure Claude Desktop:
- Open Claude Desktop settings
- Navigate to Developer section
- Click "Edit Config" and add:
```
{
  "mcpServers": {
    "biomcp": {
      "command": "uv",
      "args": ["run", "--with", "biomcp-python", "biomcp", "run"]
    }
  }
}
```
- Restart Claude Desktop and start chatting about biomedical topics!
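If your config already lists other MCP servers, the biomcp entry has to be merged in rather than pasted over the whole file. A sketch using only the standard json module; `add_biomcp` is a hypothetical helper that returns updated JSON text and does not locate or write the config file for you.

```python
import json

# Server entry copied from the JSON config above.
BIOMCP_ENTRY = {
    "command": "uv",
    "args": ["run", "--with", "biomcp-python", "biomcp", "run"],
}


def add_biomcp(config_text: str) -> str:
    """Hypothetical helper: add the biomcp server to an existing config.

    Keeps any other servers already under "mcpServers" and returns the
    updated JSON text; writing it back to disk is left to the caller.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["biomcp"] = BIOMCP_ENTRY
    return json.dumps(config, indent=2)
```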

Python Package Installation

```
# Using pip
pip install biomcp-python

# Using uv (recommended for faster installation)
uv pip install biomcp-python

# Run directly without installation
uv run --with biomcp-python biomcp trial search --condition "lung cancer"
```

Configuration

Environment Variables

BioMCP supports optional environment variables for enhanced functionality:

```
# cBioPortal API authentication (optional)
export CBIO_TOKEN="your-api-token"  # For authenticated access
export CBIO_BASE_URL="https://www.cbioportal.org/api"  # Custom API endpoint

# OncoKB demo server (optional - advanced users only)
# By default: Uses free demo server with BRAF, ROS1, TP53 (no setup required)
# For full gene access: Set ONCOKB_TOKEN from your OncoKB license
# export ONCOKB_TOKEN="your-oncokb-token"  # www.oncokb.org/account/settings

# Performance tuning
export BIOMCP_USE_CONNECTION_POOL="true"  # Enable HTTP connection pooling (default: true)
export BIOMCP_METRICS_ENABLED="false"     # Enable performance metrics (default: false)
```
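The sketch below shows one way a client script could read the variables above with their documented defaults. It is hypothetical: `load_settings` and `Settings` are not BioMCP APIs, the accepted "true" spellings (1, true, yes, on) are an assumption, and the OncoKB gene check simply encodes the demo-server limit (BRAF, ROS1, TP53) described earlier.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

# Genes available on the OncoKB demo server, per the comments above.
ONCOKB_DEMO_GENES = frozenset({"BRAF", "ROS1", "TP53"})


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    # Unset means the documented default; otherwise accept common true spellings.
    value = env.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    cbio_token: Optional[str]
    oncokb_token: Optional[str]
    use_connection_pool: bool
    metrics_enabled: bool

    def oncokb_gene_allowed(self, gene: str) -> bool:
        # Without a token, only the demo server's genes are covered.
        return self.oncokb_token is not None or gene.upper() in ONCOKB_DEMO_GENES


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    return Settings(
        cbio_token=env.get("CBIO_TOKEN") or None,      # empty string counts as unset
        oncokb_token=env.get("ONCOKB_TOKEN") or None,
        use_connection_pool=_flag(env, "BIOMCP_USE_CONNECTION_POOL", True),
        metrics_enabled=_flag(env, "BIOMCP_METRICS_ENABLED", False),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check in isolation.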

Running BioMCP Server

BioMCP supports multiple transport protocols to suit different deployment scenarios:

Local Development (STDIO)

For direct integration with Claude Desktop or local MCP clients:

```
# Default STDIO mode for local development
biomcp run

# Or explicitly specify STDIO
biomcp run --mode stdio
```

HTTP Server Mode

BioMCP supports multiple HTTP transport protocols:

Legacy SSE Transport (Worker Mode)

For backward compatibility with existing SSE clients:

```
biomcp run --mode worker
# Server available at http://localhost:8000/sse
```

Streamable HTTP Transport (Recommended)

The new MCP-compliant Streamable HTTP transport provides optimal performance and standards compliance:

```
biomcp run --mode streamable_http

# Custom host and port
biomcp run --mode streamable_http --host 127.0.0.1 --port 8080
```

Features of Streamable HTTP transport:

Single /mcp endpoint for all operations
Dynamic response mode (JSON for quick operations, SSE for long-running)
Session management support (planned)
Full compliance with the MCP specification (revision 2025-03-26)
Better scalability for cloud deployments
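For a feel of what talking to the single /mcp endpoint involves, here is a sketch that builds, but does not send, a JSON-RPC initialize request with the standard library. Under the MCP 2025-03-26 Streamable HTTP transport, the client lists both application/json and text/event-stream in its Accept header, which is what lets the server choose a JSON or SSE response. The host, port, and clientInfo values are placeholders, and `build_initialize_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_initialize_request(host: str = "127.0.0.1", port: int = 8080) -> urllib.request.Request:
    """Build (but do not send) an MCP initialize request for the /mcp endpoint."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1"},  # placeholder
        },
    }
    return urllib.request.Request(
        f"http://{host}:{port}/mcp",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The server may answer with plain JSON or an SSE stream,
            # so the client advertises both.
            "Accept": "application/json, text/event-stream",
        },
    )
```

With a server started via `biomcp run --mode streamable_http --host 127.0.0.1 --port 8080`, passing this request to `urllib.request.urlopen` would send it; parsing an SSE response body is left out of this sketch.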

Deployment Options

Docker

```
# Build the Docker image locally
docker build -t biomcp:latest .

# Run the container
docker run -p 8000:8000 biomcp:latest biomcp run --mode streamable_http
```

Cloudflare Workers

The worker mode can be deployed to Cloudflare Workers for global edge deployment.

Note: Most APIs work without authentication (the NCI-specific tools require an NCI API key), but tokens may provide higher rate limits.

Command Line Interface

BioMCP provides a comprehensive CLI for direct database interaction:

```
# Get help
biomcp --help

# Run the MCP server
biomcp run

# Article search examples
biomcp article search --gene BRAF --disease Melanoma  # Includes preprints by default
biomcp article search --gene BRAF --no-preprints      # Exclude preprints
biomcp article get 21717063 --full

# Clinical trial examples
biomcp trial search --condition "Lung Cancer" --phase PHASE3
biomcp trial search --condition melanoma --source nci --api-key YOUR_KEY  # Use NCI API
biomcp trial get NCT04280705 Protocol
biomcp trial get NCT04280705 --source nci --api-key YOUR_KEY  # Get from NCI

# Variant examples with external annotations
biomcp variant search --gene TP53 --significance pathogenic
biomcp variant get rs113488022  # Includes TCGA, 1000 Genomes, and cBioPortal data by default
biomcp variant get rs113488022 --no-external  # Core annotations only

# OncoKB integration (uses free demo server automatically)
biomcp variant search --gene BRAF --include-oncokb  # Works with BRAF, ROS1, TP53

# Gene information with functional enrichment
biomcp gene get TP53 --enrich pathway
biomcp gene get BRCA1 --enrich ontology
biomcp gene get EGFR --enrich celltypes

# NCI-specific examples (requires NCI API key)
biomcp organization search "MD Anderson" --api-key YOUR_KEY
biomcp organization get ORG123456 --api-key YOUR_KEY
biomcp intervention search pembrolizumab --api-key YOUR_KEY
biomcp intervention search --type Device --api-key YOUR_KEY
biomcp biomarker search "PD-L1" --api-key YOUR_KEY
biomcp disease search melanoma --source nci --api-key YOUR_KEY
```

Testing & Verification

Test your BioMCP setup with the MCP Inspector:

```
npx @modelcontextprotocol/inspector uv run --with biomcp-python biomcp run
```

This opens a web interface where you can explore and test all available tools.

Enterprise Version: OncoMCP

OncoMCP extends BioMCP with GenomOncology's enterprise-grade precision oncology platform (POP), providing:

HIPAA-Compliant Deployment: Secure on-premise options
Real-Time Trial Matching: Up-to-date status and arm-level matching
Healthcare Integration: Seamless EHR and data warehouse connectivity
Curated Knowledge Base: 15,000+ trials and FDA approvals
Sophisticated Patient Matching: Using integrated clinical and molecular profiles
Advanced NLP: Structured extraction from unstructured text
Comprehensive Biomarker Processing: Mutation and rule processing

Learn more: GenomOncology

Example Use Cases

Gene Information Retrieval

```
# Get comprehensive gene information
gene_getter(gene_id_or_symbol="TP53")
# Returns: Official name, summary, aliases, links to databases
```

Disease Synonym Expansion

```
# Get disease information with synonyms
disease_getter(disease_id_or_name="GIST")
# Returns: "gastrointestinal stromal tumor" and other synonyms

# Search trials with automatic synonym expansion
trial_searcher(conditions=["GIST"], expand_synonyms=True)
# Searches for: GIST OR "gastrointestinal stromal tumor" OR "GI stromal tumor"
```
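The expansion shown above amounts to OR-joining the original term with its synonyms. A minimal sketch, assuming the synonyms are already available as a list (for example, from disease_getter); `expand_condition` is hypothetical, and BioMCP's actual query construction may differ. It quotes multi-word phrases and drops case-insensitive duplicates.

```python
def expand_condition(condition: str, synonyms: list[str]) -> str:
    """Hypothetical sketch of synonym expansion for a trial condition."""
    seen: set[str] = set()
    terms: list[str] = []
    for term in [condition, *synonyms]:
        cleaned = term.strip()
        key = cleaned.lower()
        # Skip blanks and case-insensitive repeats of an earlier term.
        if not key or key in seen:
            continue
        seen.add(key)
        # Quote multi-word phrases so they are matched as a unit.
        terms.append(f'"{cleaned}"' if " " in cleaned else cleaned)
    return " OR ".join(terms)
```

Calling `expand_condition("GIST", ["gastrointestinal stromal tumor", "GI stromal tumor", "gist"])` reproduces the query in the comment above; the trailing lowercase "gist" is dropped as a duplicate.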

Integrated Biomedical Research

```
# 1. Always start with thinking
think(thought="Analyzing BRAF V600E in melanoma treatment", thoughtNumber=1)

# 2. Get gene context
gene_getter("BRAF")

# 3. Search for pathogenic variants with OncoKB clinical interpretation (uses free demo server)
variant_searcher(gene="BRAF", hgvsp="V600E", significance="pathogenic", include_oncokb=True)

# 4. Find relevant clinical trials with disease expansion
trial_searcher(conditions=["melanoma"], interventions=["BRAF inhibitor"])
```

Documentation

For comprehensive documentation, visit https://biomcp.org

Developer Guides

HTTP Client Guide - Using the centralized HTTP client
Migration Examples - Migrating from direct HTTP usage
Error Handling Guide - Comprehensive error handling patterns
Integration Testing Guide - Best practices for reliable integration tests
Third-Party Endpoints - Complete list of external APIs used
Testing Guide - Running tests and understanding test categories

Development

Running Tests

```
# Run all tests (including integration tests)
make test

# Run only unit tests (excluding integration tests)
uv run python -m pytest tests -m "not integration"

# Run only integration tests
uv run python -m pytest tests -m "integration"
```

Note: Integration tests make real API calls and may fail due to network issues or rate limiting. In CI/CD, integration tests are run separately and allowed to fail without blocking the build.

BioMCP Examples Repo

Looking to see BioMCP in action?

Check out the companion repository: 👉 biomcp-examples

It contains real prompts, AI-generated research briefs, and evaluation runs across different models. Use it to explore capabilities, compare outputs, or benchmark your own setup.

Have a cool example of your own? We’d love for you to contribute! Just fork the repo and submit a PR with your experiment.

License

This project is licensed under the MIT License.

For Tasks:

analyze clinical trials search biomedical literature fetch article details search genetic variants retrieve disease information

For Jobs:

biomedical researcher clinical data analyst biomedical informatics specialist healthcare data scientist biomedical ai developer

Alternative AI tools for biomcp

Similar Open Source Tools

evals

Strands Evals SDK is a comprehensive evaluation framework for AI agents and LLM applications. It provides tools for output validation, trajectory analysis, interaction evaluation, and automated experiment generation. Users can assess agent behavior, simulate multi-turn conversations, and generate test suites. The SDK offers custom evaluators, experiment management, and built-in scoring tools for comprehensive evaluation of AI systems.

Stars: 72

OpenGradient-SDK

OpenGradient Python SDK is a tool for decentralized model management and inference services on the OpenGradient platform. It provides programmatic access to distributed AI infrastructure with cryptographic verification capabilities. The SDK supports verifiable LLM inference, multi-provider support, TEE execution, model hub integration, consensus-based verification, and command-line interface. Users can leverage this SDK to build AI applications with execution guarantees through Trusted Execution Environments and blockchain-based settlement, ensuring auditability and tamper-proof AI execution.

Stars: 74

Biomni

Biomni is a general-purpose biomedical AI agent designed to autonomously execute a wide range of research tasks across diverse biomedical subfields. By integrating cutting-edge large language model (LLM) reasoning with retrieval-augmented planning and code-based execution, Biomni helps scientists dramatically enhance research productivity and generate testable hypotheses.

Stars: 2.1k

orra

Orra is a tool for building production-ready multi-agent applications that handle complex real-world interactions. It coordinates tasks across an existing stack, agents, and tools run as services using intelligent reasoning. With features like smart pre-evaluated execution plans, domain grounding, durable execution, and automatic service health monitoring, Orra lets users move fast with tools as services and revert state to handle failures. It provides real-time status tracking and webhook result delivery, making it ideal for developers looking to move beyond simple crews and agents.

Stars: 155

memento-mcp

Memento MCP is a scalable, high-performance knowledge graph memory system designed for LLMs. It offers semantic retrieval, contextual recall, and temporal awareness to any LLM client supporting the model context protocol. The system is built on core concepts like entities and relations, utilizing Neo4j as its storage backend for unified graph and vector search capabilities. With advanced features such as semantic search, temporal awareness, confidence decay, and rich metadata support, Memento MCP provides a robust solution for managing knowledge graphs efficiently and effectively.

Stars: 217

science-codeevolve

CodeEvolve is an open-source framework that combines large language models with evolutionary algorithms to discover and optimize high-performing code solutions. It democratizes algorithmic discovery by making LLM-driven evolutionary search transparent, reproducible, and accessible. CodeEvolve provides a modular foundation for automated code synthesis guided by quantifiable metrics, addressing meta-optimization tasks where complex optimization problems need to be solved. The framework features islands-based genetic algorithm, modular evolutionary operators, quality-diversity optimization, flexible LLM integration, and distributed islands for efficient exploration. Core components include CLI entry point, process runner, evolution engine, program database, exploration schedulers, evaluator, islands coordinator, LLM interface, prompt sampler, and utilities. CodeEvolve demonstrates superior performance on algorithm-discovery benchmarks and is suitable for mathematical discovery, algorithm design, scientific discovery, and software optimization. Reproducibility and determinism are emphasized, with seedable internal algorithmic decisions. Contributions from the community are welcome, focusing on new selection policies, LLM integrations, benchmark problems, documentation, performance optimizations, and bug fixes.

Stars: 55

MassGen

MassGen is a cutting-edge multi-agent system that leverages the power of collaborative AI to solve complex tasks. It assigns a task to multiple AI agents who work in parallel, observe each other's progress, and refine their approaches to converge on the best solution to deliver a comprehensive and high-quality result. The system operates through an architecture designed for seamless multi-agent collaboration, with key features including cross-model/agent synergy, parallel processing, intelligence sharing, consensus building, and live visualization. Users can install the system, configure API settings, and run MassGen for various tasks such as question answering, creative writing, research, development & coding tasks, and web automation & browser tasks. The roadmap includes plans for advanced agent collaboration, expanded model, tool & agent integration, improved performance & scalability, enhanced developer experience, and a web interface.

Stars: 454

Mira

Mira is an agentic AI library designed for automating company research by gathering information from various sources like company websites, LinkedIn profiles, and Google Search. It utilizes a multi-agent architecture to collect and merge data points into a structured profile with confidence scores and clear source attribution. The core library is framework-agnostic and can be integrated into applications, pipelines, or custom workflows. Mira offers features such as real-time progress events, confidence scoring, company criteria matching, and built-in services for data gathering. The tool is suitable for users looking to streamline company research processes and enhance data collection efficiency.

Stars: 63

code_puppy

Code Puppy is an AI-powered code generation agent designed to understand programming tasks, generate high-quality code, and explain its reasoning. It supports multi-language code generation, interactive CLI, and detailed code explanations. The tool requires Python 3.9+ and API keys for various models like GPT, Google's Gemini, Cerebras, and Claude. It also integrates with MCP servers for advanced features like code search and documentation lookups. Users can create custom JSON agents for specialized tasks and access a variety of tools for file management, code execution, and reasoning sharing.

Stars: 322

haiku.rag

Haiku RAG is a Retrieval-Augmented Generation (RAG) library that utilizes LanceDB as a local vector database. It supports semantic and full-text search, hybrid search with Reciprocal Rank Fusion, multiple embedding and QA providers, default search result reranking, question answering, file monitoring, and various file formats. It can be used via CLI or Python API, and can serve as tools for AI assistants like Claude Desktop. The library offers features for document management and search, with detailed documentation available.

Stars: 485

CyberStrikeAI

CyberStrikeAI is an AI-native security testing platform built in Go that integrates 100+ security tools, an intelligent orchestration engine, role-based testing with predefined security roles, a skills system with specialized testing skills, and comprehensive lifecycle management capabilities. It enables end-to-end automation from conversational commands to vulnerability discovery, attack-chain analysis, knowledge retrieval, and result visualization, delivering an auditable, traceable, and collaborative testing environment for security teams. The platform features an AI decision engine with OpenAI-compatible models, native MCP implementation with various transports, prebuilt tool recipes, large-result pagination, attack-chain graph, password-protected web UI, knowledge base with vector search, vulnerability management, batch task management, role-based testing, and skills system.

Stars: 782

pilottai

PilottAI is a Python framework for building autonomous multi-agent systems with advanced orchestration capabilities. It provides enterprise-ready features for building scalable AI applications. The framework includes hierarchical agent systems, production-ready features like asynchronous processing and fault tolerance, advanced memory management with semantic storage, and integrations with multiple LLM providers and custom tools. PilottAI offers specialized agents for various tasks such as customer service, document processing, email handling, knowledge acquisition, marketing, research analysis, sales, social media, and web search. The framework also provides documentation, example use cases, and advanced features like memory management, load balancing, and fault tolerance.

Stars: 219

llmxcpg

LLMxCPG is a framework for vulnerability detection using Code Property Graphs (CPG) and Large Language Models (LLM). It involves a two-phase process: Slice Construction where an LLM generates queries for a CPG to extract a code slice, and Vulnerability Detection where another LLM classifies the code slice as vulnerable or safe. The repository includes implementations of baseline models, information on datasets, scripts for running models, prompt templates, query generation examples, and configurations for fine-tuning models.

Stars: 111

copilot-collections

Copilot Collections is an opinionated setup for GitHub Copilot tailored for delivery teams. It provides shared workflows, specialized agents, task prompts, reusable skills, and MCP integrations to streamline the software development process. The focus is on building features while letting Copilot handle the glue. The setup requires a GitHub Copilot Pro license and VS Code version 1.109 or later. It supports a standard workflow of Research, Plan, Implement, and Review, with specialized flows for UI-heavy tasks and end-to-end testing. Agents like Architect, Business Analyst, Software Engineer, UI Reviewer, Code Reviewer, and E2E Engineer assist in different stages of development. Skills like Task Analysis, Architecture Design, Codebase Analysis, Code Review, and E2E Testing provide specialized domain knowledge and workflows. The repository also includes prompts and chat commands for various tasks, along with instructions for installation and configuration in VS Code.

Stars: 102

zotero-mcp

Zotero MCP seamlessly connects your Zotero research library with AI assistants like ChatGPT and Claude via the Model Context Protocol. It offers AI-powered semantic search, access to library content, PDF annotation extraction, and easy updates. Users can search their library, analyze citations, and get summaries, making it ideal for research tasks. The tool supports multiple embedding models, intelligent search results, and flexible access methods for both local and remote collaboration. With advanced features like semantic search and PDF annotation extraction, Zotero MCP enhances research efficiency and organization.

Stars: 513

For similar jobs

Me-LLaMA

Me LLaMA introduces a suite of open-source medical Large Language Models (LLMs), including Me LLaMA 13B/70B and their chat-enhanced versions. Developed through innovative continual pre-training and instruction tuning, these models leverage a vast medical corpus comprising PubMed papers, medical guidelines, and general domain data. Me LLaMA sets new benchmarks on medical reasoning tasks, making it a significant asset for medical NLP applications and research. The models are intended for computational linguistics and medical research, not for clinical decision-making without validation and regulatory approval.

Stars: 126

KG_RAG

KG-RAG (Knowledge Graph-based Retrieval Augmented Generation) is a task-agnostic framework that combines the explicit knowledge of a Knowledge Graph (KG) with the implicit knowledge of a Large Language Model (LLM). KG-RAG extracts "prompt-aware context" from a KG, defined as the minimal context sufficient to respond to the user prompt. This framework empowers a general-purpose LLM by incorporating an optimized domain-specific prompt-aware context from a biomedical KG. KG-RAG is specifically designed for running prompts related to diseases.

Stars: 525

Scientific-LLM-Survey

Scientific Large Language Models (Sci-LLMs) is a repository that collects papers on scientific large language models, focusing on biology and chemistry domains. It includes textual, molecular, protein, and genomic languages, as well as multimodal language. The repository covers various large language models for tasks such as molecule property prediction, interaction prediction, protein sequence representation, protein sequence generation/design, DNA-protein interaction prediction, and RNA prediction. It also provides datasets and benchmarks for evaluating these models. The repository aims to facilitate research and development in the field of scientific language modeling.

Stars: 261

biochatter

Generative AI models have shown tremendous usefulness in increasing accessibility and automation of a wide range of tasks. This repository contains the `biochatter` Python package, a generic backend library for the connection of biomedical applications to conversational AI. It aims to provide a common framework for deploying, testing, and evaluating diverse models and auxiliary technologies in the biomedical domain. BioChatter is part of the BioCypher ecosystem, connecting natively to BioCypher knowledge graphs.

Stars: 135

cellseg_models.pytorch

cellseg-models.pytorch is a Python library built upon PyTorch for 2D cell/nuclei instance segmentation models. It provides multi-task encoder-decoder architectures and post-processing methods for segmenting cell/nuclei instances. The library offers high-level API to define segmentation models, open-source datasets for training, flexibility to modify model components, sliding window inference, multi-GPU inference, benchmarking utilities, regularization techniques, and example notebooks for training and finetuning models with different backbones.

Stars: 69

aicsimageio

AICSImageIO is a Python tool for Image Reading, Metadata Conversion, and Image Writing for Microscopy Images. It supports various file formats like OME-TIFF, TIFF, ND2, DV, CZI, LIF, PNG, GIF, and Bio-Formats. Users can read and write metadata and imaging data, work with different file systems like local paths, HTTP URLs, s3fs, and gcsfs. The tool provides functionalities for full image reading, delayed image reading, mosaic image reading, metadata reading, xarray coordinate plane attachment, cloud IO support, and saving to OME-TIFF. It also offers benchmarking and developer resources.

Stars: 198

ceLLama

ceLLama is a streamlined automation pipeline for cell type annotations using large-language models (LLMs). It operates locally to ensure privacy, provides comprehensive analysis by considering negative genes, offers efficient processing speed, and generates customized reports. Ideal for quick and preliminary cell type checks.

Stars: 141