NoLabs
Open source biolab
Stars: 75
NoLabs is an open-source biolab that provides easy access to state-of-the-art models for bio research. It supports various tasks, including drug discovery, protein analysis, and small molecule design. NoLabs aims to accelerate bio research by making inference models accessible to everyone.
README:
NoLabs is an open source biolab that lets you run experiments with the latest state-of-the-art models for bio research.
The goal of the project is to accelerate bio research by making inference models easy to use for everyone. We currently support a protein biolab (predicting useful protein properties such as solubility, localisation, gene ontology, folding, etc.), a drug discovery biolab (construct ligands and test binding to target proteins), and a small molecules design biolab (design small molecules given a protein target and check drug-likeness and binding affinity).
We are working on expanding these and adding a cell biolab and a genetic biolab, and we would appreciate your support and contributions.
Let's accelerate bio research!
Bio Buddy - drug discovery co-pilot:
BioBuddy is a drug discovery copilot that supports:
- Downloading data from ChEMBL
- Downloading data from RCSB PDB
- Answering questions about the drug discovery process, targets, chemical components, etc.
- Writing review reports based on published papers
For example, you can ask:
- "Can you pull me some latest approved drugs?"
- "Can you download me 1000 rhodopsins?"
- "How does an aspirin molecule look like?"
BioBuddy will carry out these requests and answer other questions.
To enable BioBuddy, run this command when starting NoLabs:
$ ENABLE_BIOBUDDY=true docker compose up nolabs
Also start the BioBuddy microservice:
$ OPENAI_API_KEY=your_openai_api_key TAVILY_API_KEY=your_tavily_api_key docker compose up biobuddy
NoLabs uses GPT-4 for the best performance. You can adjust the model in microservices/biobuddy/biobuddy/services.py
You can ignore OPENAI_API_KEY warnings when running other services using docker compose.
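Instead of prefixing every command with these variables, Docker Compose also reads a `.env` file placed next to `docker-compose.yml` and substitutes the values into the compose file. A minimal sketch with placeholder values (not real keys):

```shell
# .env -- placed next to docker-compose.yml (placeholder values)
ENABLE_BIOBUDDY=true
OPENAI_API_KEY=your_openai_api_key
TAVILY_API_KEY=your_tavily_api_key
```

With this file in place, a plain `docker compose up nolabs biobuddy` picks the values up automatically.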
Drug discovery lab:
- Drug-target interaction prediction and high-throughput virtual screening (HTVS), based on:
  - Automatic pocket prediction via P2Rank
  - Automatic MSA generation via HH-suite3
Protein lab:
- Prediction of subcellular localisation via fine-tuned ritakurban/ESM_protein_localization model (to be updated with a better model)
- Prediction of folded structure via facebook/esmfold_v1
- Gene ontology prediction for the 200 most popular gene ontologies
- Protein solubility prediction
Protein design Lab:
- Protein generation via RFDiffusion
Conformations Lab:
Small molecules design lab:
- Small molecule design against a protein target, with a drug-likeness scoring component, via REINVENT4
Specify the search space (the location where the designed molecule should bind relative to the protein target), then run reinforcement learning to generate new molecules in the specified binding region.
WARNING: The reinforcement learning process might take a long time (with 128 molecules per epoch and 50 epochs it could take a day)
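To put the warning in numbers, here is a quick back-of-the-envelope estimate; the per-molecule time is a hypothetical figure that depends heavily on your hardware, not a measured benchmark:

```python
# Rough cost estimate for a REINVENT4 reinforcement learning run.
molecules_per_epoch = 128   # from the warning above
epochs = 50
seconds_per_molecule = 13   # hypothetical: varies with GPU and protein size

total_molecules = molecules_per_epoch * epochs
total_hours = total_molecules * seconds_per_molecule / 3600

print(total_molecules)        # 6400 molecules scored in total
print(round(total_hours, 1))  # ~23.1 hours, i.e. about a day
```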
# Clone this project
$ git clone https://github.com/BasedLabs/nolabs
$ cd nolabs
Generate a new token for the Docker registry at https://github.com/settings/tokens/new and select the 'read:packages' scope.
$ docker login ghcr.io -u username -p ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
If you want to run a single feature (recommended):
$ docker compose up nolabs
$ docker compose up diffdock
$ docker compose up p2rank
...
OR if you want to run everything on one machine:
$ docker compose up
Server will be available on http://localhost:9000
Running nolabs for development:

1. Create a Python environment with Python 3.11
   - First, ensure you have Python 3.11 installed. If not, download it from python.org or use a version manager like pyenv.
   - Create a new virtual environment:
     $ python3.11 -m venv nolabs-env
2. Activate the virtual environment and install Poetry
   - Activate the virtual environment:
     $ source nolabs-env/bin/activate
   - Install Poetry, a tool for dependency management and packaging in Python, along with Uvicorn:
     $ pip install poetry uvicorn
3. Install dependencies using Poetry
   $ poetry install
4. Start a Uvicorn server
   - Set your environment variable and start the Uvicorn server with the following command:
     $ NOLABS_ENVIRONMENT=dev poetry run uvicorn nolabs.api:app --host=127.0.0.1 --port=8000
   - This command runs the nolabs API server on localhost at port 8000.
5. Set up the frontend
   - In a separate terminal, ensure you have npm installed. If not, install Node.js and npm from nodejs.org.
   - Install the necessary Node.js packages:
     $ npm install
   - After installing the packages, start the frontend development server:
     $ npm run dev

Server will be available on http://localhost:9000
We provide individual Docker containers backed by FastAPI for each feature, available in the /microservices folder. You can use them individually as APIs.
For example, to run the esmfold service, you can use Docker Compose:
$ docker compose up esmfold
Once the service is up, you can make a POST request to perform a task, such as predicting a protein's folded structure. Here's a simple Python example:
import requests

# Define the API endpoint of the esmfold microservice
url = 'http://127.0.0.1:5736/run-folding'

# Specify the protein sequence in the request body
data = {
    'protein_sequence': 'YOUR_PROTEIN_SEQUENCE_HERE'
}

# Make the POST request and fail loudly on HTTP errors
response = requests.post(url, json=data)
response.raise_for_status()

# Extract the PDB content from the response
pdb_content = response.json().get('pdb_content', '')
print(pdb_content)
This Python script makes a POST request to the esmfold microservice with a protein sequence and prints the predicted PDB content.
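To inspect the predicted structure, the returned PDB string can be written to a file and opened in any molecular viewer; the file name here is arbitrary, and the content is a stand-in for a real service response:

```python
# Stand-in for the pdb_content string returned by the esmfold service
pdb_content = "HEADER    PREDICTED STRUCTURE\nEND\n"

# Write it out so it can be opened in a viewer such as PyMOL or ChimeraX
with open("prediction.pdb", "w") as fh:
    fh.write(pdb_content)
```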
Since we provide individual Docker containers backed by FastAPI for each feature, available in the /microservices
folder, you can run them on separate machines. This setup is particularly useful if you're developing on a computer
without GPU support but have access to a VM with a GPU for tasks like folding, docking, etc.
For instance, to run the diffdock service, use Docker Compose on the VM or computer equipped with a GPU.
On your server/VM/computer with a GPU, run:
$ docker compose up diffdock
Once the service is up, check that you can access it from your computer by navigating to http://<gpu_machine_ip>:5737/docs
If everything is correct, you should see the FastAPI page with diffdock's API surface.
Next, update the nolabs/infrastructure/settings.ini file on your primary machine to include the IP address of the service (replace 127.0.0.1 with your GPU machine's IP):
...
p2rank = http://127.0.0.1:5731
esmfold = http://127.0.0.1:5736
esmfold_light = http://127.0.0.1:5733
msa_light = http://127.0.0.1:5734
umol = http://127.0.0.1:5735
diffdock = http://127.0.0.1:5737 -> http://74.82.28.227:5737
...
And now you are ready to use this service hosted on a separate machine!
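To verify connectivity programmatically rather than in the browser, one option is to query the service's OpenAPI schema, which FastAPI apps serve at /openapi.json by default. The host and port below mirror the diffdock example above; the helper names are our own:

```python
import json
from urllib.request import urlopen
from urllib.error import URLError

def service_url(host: str, port: int) -> str:
    """Build a microservice base URL in the same form used in settings.ini."""
    return f"http://{host}:{port}"

def is_reachable(base_url: str, timeout: float = 5.0) -> bool:
    """FastAPI apps expose /openapi.json by default; a valid JSON response means the API is up."""
    try:
        with urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
            json.load(resp)  # make sure we actually got a schema back
            return True
    except (URLError, ValueError):
        return False

print(service_url("74.82.28.227", 5737))  # http://74.82.28.227:5737
```

Running `is_reachable(service_url("74.82.28.227", 5737))` from the primary machine confirms the GPU host is wired up before you edit settings.ini.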
Model: RFdiffusion
RFdiffusion is an open source method for structure generation, with or without conditional information (a motif, target etc).
docker compose up protein_design
Swagger UI will be available on http://localhost:5789/docs
or install as a python package
Model: ESMFold - Evolutionary Scale Modeling
docker compose up esmfold
Swagger UI will be available on http://localhost:5736/docs
or install as a python package
Model: ESMAtlas
docker compose up esmfold_light
Swagger UI will be available on http://localhost:5733/docs
or install as a python package
Model: Hugging Face
docker compose up gene_ontology
Swagger UI will be available on http://localhost:5788/docs
or install as a python package
Model: Hugging Face
docker compose up localisation
Swagger UI will be available on http://localhost:5787/docs
or install as a python package
Model: p2rank
docker compose up p2rank
Swagger UI will be available on http://localhost:5731/docs
or install as a python package
Model: Hugging Face
docker compose up solubility
Swagger UI will be available on http://localhost:5786/docs
Model: UMol
docker compose up umol
Swagger UI will be available on http://localhost:5735/docs
Model: RoseTTAFold
docker compose up rosettafold
Swagger UI will be available on http://localhost:5738/docs
WARNING: To use RoseTTAFold you must change the '.' volumes to point to the specified folders.
Model: REINVENT4
Misc: DockStream, QED, AutoDock Vina
docker compose up reinvent
Swagger UI will be available on http://localhost:5790/docs
WARNING: Do not change the number of gunicorn workers (1); changing it will lead to microservice issues.
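For scripting against several microservices at once, the default local ports listed in the sections above can be collected in one place; the dictionary and helper below are our own convenience, with values copied from this document:

```python
# Default local ports for the NoLabs microservices (from the sections above)
SERVICE_PORTS = {
    "p2rank": 5731,
    "esmfold_light": 5733,
    "msa_light": 5734,
    "umol": 5735,
    "esmfold": 5736,
    "diffdock": 5737,
    "rosettafold": 5738,
    "solubility": 5786,
    "localisation": 5787,
    "gene_ontology": 5788,
    "protein_design": 5789,
    "reinvent": 5790,
}

def docs_url(service: str, host: str = "127.0.0.1") -> str:
    """Swagger UI location for a service, handy for a quick health check."""
    return f"http://{host}:{SERVICE_PORTS[service]}/docs"

print(docs_url("esmfold"))  # http://127.0.0.1:5736/docs
```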
[Recommended for laptops] If you are using a laptop, use the --test argument (no need for a lot of compute):
- RAM > 16GB
- [Optional] GPU memory >= 16GB (REALLY speeds up the inference)
[Recommended for powerful workstations] Else, if you want to host everything on your machine and have faster inference (also a requirement for folding sequences > 400 amino acids in length):
- RAM > 30GB
- [Optional] GPU memory >= 40GB (REALLY speeds up the inference)
Alternative AI tools for NoLabs
Similar Open Source Tools
gitingest
GitIngest is a tool that allows users to turn any Git repository into a prompt-friendly text ingest for LLMs. It provides easy code context by generating a text digest from a git repository URL or directory. The tool offers smart formatting for optimized output format for LLM prompts and provides statistics about file and directory structure, size of the extract, and token count. GitIngest can be used as a CLI tool on Linux and as a Python package for code integration. The tool is built using Tailwind CSS for frontend, FastAPI for backend framework, tiktoken for token estimation, and apianalytics.dev for simple analytics. Users can self-host GitIngest by building the Docker image and running the container. Contributions to the project are welcome, and the tool aims to be beginner-friendly for first-time contributors with a simple Python and HTML codebase.
patchwork
PatchWork is an open-source framework designed for automating development tasks using large language models. It enables users to automate workflows such as PR reviews, bug fixing, security patching, and more through a self-hosted CLI agent and preferred LLMs. The framework consists of reusable atomic actions called Steps, customizable LLM prompts known as Prompt Templates, and LLM-assisted automations called Patchflows. Users can run Patchflows locally in their CLI/IDE or as part of CI/CD pipelines. PatchWork offers predefined patchflows like AutoFix, PRReview, GenerateREADME, DependencyUpgrade, and ResolveIssue, with the flexibility to create custom patchflows. Prompt templates are used to pass queries to LLMs and can be customized. Contributions to new patchflows, steps, and the core framework are encouraged, with chat assistants available to aid in the process. The roadmap includes expanding the patchflow library, introducing a debugger and validation module, supporting large-scale code embeddings, parallelization, fine-tuned models, and an open-source GUI. PatchWork is licensed under AGPL-3.0 terms, while custom patchflows and steps can be shared using the Apache-2.0 licensed patchwork template repository.
Easy-Translate
Easy-Translate is a script designed for translating large text files with a single command. It supports various models like M2M100, NLLB200, SeamlessM4T, LLaMA, and Bloom. The tool is beginner-friendly and offers seamless and customizable features for advanced users. It allows acceleration on CPU, multi-CPU, GPU, multi-GPU, and TPU, with support for different precisions and decoding strategies. Easy-Translate also provides an evaluation script for translations. Built on HuggingFace's Transformers and Accelerate library, it supports prompt usage and loading huge models efficiently.
open-parse
Open Parse is a Python library for visually discerning document layouts and chunking them effectively. It is designed to fill the gap in open-source libraries for handling complex documents. Unlike text splitting, which converts a file to raw text and slices it up, Open Parse visually analyzes documents for superior LLM input. It also supports basic markdown for parsing headings, bold, and italics, and has high-precision table support, extracting tables into clean Markdown formats with accuracy that surpasses traditional tools. Open Parse is extensible, allowing users to easily implement their own post-processing steps. It is also intuitive, with great editor support and completion everywhere, making it easy to use and learn.
linkedin-api
The Linkedin API for Python allows users to programmatically search profiles, send messages, and find jobs using a regular Linkedin user account. It does not require 'official' API access, just a valid Linkedin account. However, it is important to note that this library is not officially supported by LinkedIn and using it may violate LinkedIn's Terms of Service. Users can authenticate using any Linkedin account credentials and access features like getting profiles, profile contact info, and connections. The library also provides commercial alternatives for extracting data, scraping public profiles, and accessing a full LinkedIn API. It is not endorsed or supported by LinkedIn and is intended for educational purposes and personal use only.
GraphRAG-Local-UI
GraphRAG Local with Interactive UI is an adaptation of Microsoft's GraphRAG, tailored to support local models and featuring a comprehensive interactive user interface. It allows users to leverage local models for LLM and embeddings, visualize knowledge graphs in 2D or 3D, manage files, settings, and queries, and explore indexing outputs. The tool aims to be cost-effective by eliminating dependency on costly cloud-based models and offers flexible querying options for global, local, and direct chat queries.
middleware
Middleware is an open-source engineering management tool that helps engineering leaders measure and analyze team effectiveness using DORA metrics. It integrates with CI/CD tools, automates DORA metric collection and analysis, visualizes key performance indicators, provides customizable reports and dashboards, and integrates with project management platforms. Users can set up Middleware using Docker or manually, generate encryption keys, set up backend and web servers, and access the application to view DORA metrics. The tool calculates DORA metrics using GitHub data, including Deployment Frequency, Lead Time for Changes, Mean Time to Restore, and Change Failure Rate. Middleware aims to provide DORA metrics to users based on their Git data, simplifying the process of tracking software delivery performance and operational efficiency.
air-light
Air-light is a minimalist WordPress starter theme designed to be an ultra minimal starting point for a WordPress project. It is built to be very straightforward, backwards compatible, front-end developer friendly and modular by its structure. Air-light is free of weird "app-like" folder structures or odd syntaxes that nobody else uses. It loves WordPress as it was and as it is.
bedrock-claude-chat
This repository is a sample chatbot using the Anthropic company's LLM Claude, one of the foundational models provided by Amazon Bedrock for generative AI. It allows users to have basic conversations with the chatbot, personalize it with their own instructions and external knowledge, and analyze usage for each user/bot on the administrator dashboard. The chatbot supports various languages, including English, Japanese, Korean, Chinese, French, German, and Spanish. Deployment is straightforward and can be done via the command line or by using AWS CDK. The architecture is built on AWS managed services, eliminating the need for infrastructure management and ensuring scalability, reliability, and security.
Upscaler
Holloway's Upscaler is a consolidation of various compiled open-source AI image/video upscaling products for a CLI-friendly image and video upscaling program. It provides low-cost AI upscaling software that can run locally on a laptop, programmable for albums and videos, reliable for large video files, and works without GUI overheads. The repository supports hardware testing on various systems and provides important notes on GPU compatibility, video types, and image decoding bugs. Dependencies include ffmpeg and ffprobe for video processing. The user manual covers installation, setup pathing, calling for help, upscaling images and videos, and contributing back to the project. Benchmarks are provided for performance evaluation on different hardware setups.
any-parser
AnyParser provides an API to accurately extract unstructured data (e.g., PDFs, images, charts) into a structured format. Users can set up their API key, run synchronous and asynchronous extractions, and perform batch extraction. The tool is useful for extracting text, numbers, and symbols from various sources like PDFs and images. It offers flexibility in processing data and provides immediate results for synchronous extraction while allowing users to fetch results later for asynchronous and batch extraction. AnyParser is designed to simplify data extraction tasks and enhance data processing efficiency.
NekoImageGallery
NekoImageGallery is an online AI image search engine that utilizes the Clip model and Qdrant vector database. It supports keyword search and similar image search. The tool generates 768-dimensional vectors for each image using the Clip model, supports OCR text search using PaddleOCR, and efficiently searches vectors using the Qdrant vector database. Users can deploy the tool locally or via Docker, with options for metadata storage using Qdrant database or local file storage. The tool provides API documentation through FastAPI's built-in Swagger UI and can be used for tasks like image search, text extraction, and vector search.
Protofy
Protofy is a full-stack, batteries-included low-code enabled web/app and IoT system with an API system and real-time messaging. It is based on Protofy (protoflow + visualui + protolib + protodevices) + Expo + Next.js + Tamagui + Solito + Express + Aedes + Redbird + Many other amazing packages. Protofy can be used to fast prototype Apps, webs, IoT systems, automations, or APIs. It is a ultra-extensible CMS with supercharged capabilities, mobile support, and IoT support (esp32 thanks to esphome).
RAVE
RAVE is a variational autoencoder for fast and high-quality neural audio synthesis. It can be used to generate new audio samples from a given dataset, or to modify the style of existing audio samples. RAVE is easy to use and can be trained on a variety of audio datasets. It is also computationally efficient, making it suitable for real-time applications.
For similar jobs
OpenCRISPR
OpenCRISPR is a set of free and open gene editing systems designed by Profluent Bio. The OpenCRISPR-1 protein maintains the prototypical architecture of a Type II Cas9 nuclease but is hundreds of mutations away from SpCas9 or any other known natural CRISPR-associated protein. You can view OpenCRISPR-1 as a drop-in replacement for many protocols that need a cas9-like protein with an NGG PAM and you can even use it with canonical SpCas9 gRNAs. OpenCRISPR-1 can be fused in a deactivated or nickase format for next generation gene editing techniques like base, prime, or epigenome editing.
ersilia
The Ersilia Model Hub is a unified platform of pre-trained AI/ML models dedicated to infectious and neglected disease research. It offers an open-source, low-code solution that provides seamless access to AI/ML models for drug discovery. Models housed in the hub come from two sources: published models from literature (with due third-party acknowledgment) and custom models developed by the Ersilia team or contributors.
ontogpt
OntoGPT is a Python package for extracting structured information from text using large language models, instruction prompts, and ontology-based grounding. It provides a command line interface and a minimal web app for easy usage. The tool has been evaluated on test data and is used in related projects like TALISMAN for gene set analysis. OntoGPT enables users to extract information from text by specifying relevant terms and provides the extracted objects as output.
bia-bob
BIA `bob` is a Jupyter-based assistant for interacting with data using large language models to generate Python code. It can utilize OpenAI's chatGPT, Google's Gemini, Helmholtz' blablador, and Ollama. Users need respective accounts to access these services. Bob can assist in code generation, bug fixing, code documentation, GPU-acceleration, and offers a no-code custom Jupyter Kernel. It provides example notebooks for various tasks like bio-image analysis, model selection, and bug fixing. Installation is recommended via conda/mamba environment. Custom endpoints like blablador and ollama can be used. Google Cloud AI API integration is also supported. The tool is extensible for Python libraries to enhance Bob's functionality.
Scientific-LLM-Survey
Scientific Large Language Models (Sci-LLMs) is a repository that collects papers on scientific large language models, focusing on biology and chemistry domains. It includes textual, molecular, protein, and genomic languages, as well as multimodal language. The repository covers various large language models for tasks such as molecule property prediction, interaction prediction, protein sequence representation, protein sequence generation/design, DNA-protein interaction prediction, and RNA prediction. It also provides datasets and benchmarks for evaluating these models. The repository aims to facilitate research and development in the field of scientific language modeling.
polaris
Polaris establishes a novel, industry‑certified standard to foster the development of impactful methods in AI-based drug discovery. This library is a Python client to interact with the Polaris Hub. It allows you to download Polaris datasets and benchmarks, evaluate a custom method against a Polaris benchmark, and create and upload new datasets and benchmarks.
awesome-AI4MolConformation-MD
The 'awesome-AI4MolConformation-MD' repository focuses on protein conformations and molecular dynamics using generative artificial intelligence and deep learning. It provides resources, reviews, datasets, packages, and tools related to AI-driven molecular dynamics simulations. The repository covers a wide range of topics such as neural networks potentials, force fields, AI engines/frameworks, trajectory analysis, visualization tools, and various AI-based models for protein conformational sampling. It serves as a comprehensive guide for researchers and practitioners interested in leveraging AI for studying molecular structures and dynamics.