NoLabs
Open source biolab
Stars: 75
NoLabs is an open-source biolab that provides easy access to state-of-the-art models for bio research. It supports various tasks, including drug discovery, protein analysis, and small molecule design. NoLabs aims to accelerate bio research by making inference models accessible to everyone.
README:
NoLabs is an open source biolab that lets you run experiments with the latest state-of-the-art models for bio research.
The goal of the project is to accelerate bio research by making inference models easy to use for everyone. We currently support a protein biolab (predicting useful protein properties such as solubility, localisation, gene ontology, folding, etc.), a drug discovery biolab (construct ligands and test binding to target proteins), and a small molecules design biolab (design small molecules given a protein target and check drug-likeness and binding affinity).
We are working on expanding these and adding a cell biolab and a genetic biolab, and we would appreciate your support and contributions.
Let's accelerate bio research!
Bio Buddy - drug discovery co-pilot:
BioBuddy is a drug discovery copilot that supports:
- Downloading data from ChEMBL
- Downloading data from RCSB PDB
- Answering questions about the drug discovery process, targets, chemical components, etc.
- Writing review reports based on published papers
For example, you can ask:
- "Can you pull me some latest approved drugs?"
- "Can you download me 1000 rhodopsins?"
- "How does an aspirin molecule look like?"
BioBuddy will carry out these requests and answer other questions.
To enable BioBuddy, run this command when starting NoLabs:
$ ENABLE_BIOBUDDY=true docker compose up nolabs
Also start the BioBuddy microservice:
$ OPENAI_API_KEY=your_openai_api_key TAVILY_API_KEY=your_tavily_api_key docker compose up biobuddy
NoLabs uses GPT-4 for the best performance. You can adjust the model in microservices/biobuddy/biobuddy/services.py
You can ignore OPENAI_API_KEY warnings when running other services using docker compose.
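Instead of prefixing every command with these variables, Docker Compose also reads a `.env` file placed next to `docker-compose.yml` and substitutes the values into the compose file. A minimal sketch with placeholder values (not real keys):

```shell
# .env -- placed next to docker-compose.yml (placeholder values)
ENABLE_BIOBUDDY=true
OPENAI_API_KEY=your_openai_api_key
TAVILY_API_KEY=your_tavily_api_key
```

With this file in place, a plain `docker compose up nolabs biobuddy` picks the values up automatically.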
Drug discovery lab:
- Drug-target interaction prediction and high-throughput virtual screening (HTVS), based on:
  - Automatic pocket prediction via P2Rank
  - Automatic MSA generation via HH-suite3
Protein lab:
- Prediction of subcellular localisation via fine-tuned ritakurban/ESM_protein_localization model (to be updated with a better model)
- Prediction of folded structure via facebook/esmfold_v1
- Gene ontology prediction for the 200 most popular gene ontologies
- Protein solubility prediction
Protein design Lab:
- Protein generation via RFDiffusion
Conformations Lab:
Small molecules design lab:
- Small molecule design against a protein target, with a drug-likeness scoring component, via REINVENT4
Specify the search space (the location where the designed molecule should bind relative to the protein target), then run reinforcement learning to generate new molecules in the specified binding region.
WARNING: The reinforcement learning process might take a long time (with 128 molecules per epoch and 50 epochs it could take a day)
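To put the warning in numbers, here is a quick back-of-the-envelope estimate; the per-molecule time is a hypothetical figure that depends heavily on your hardware, not a measured benchmark:

```python
# Rough cost estimate for a REINVENT4 reinforcement learning run.
molecules_per_epoch = 128   # from the warning above
epochs = 50
seconds_per_molecule = 13   # hypothetical: varies with GPU and protein size

total_molecules = molecules_per_epoch * epochs
total_hours = total_molecules * seconds_per_molecule / 3600

print(total_molecules)        # 6400 molecules scored in total
print(round(total_hours, 1))  # ~23.1 hours, i.e. about a day
```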
# Clone this project
$ git clone https://github.com/BasedLabs/nolabs
$ cd nolabs
Generate a new token for the Docker registry at https://github.com/settings/tokens/new and select the 'read:packages' scope.
$ docker login ghcr.io -u username -p ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
If you want to run a single feature (recommended):
$ docker compose up nolabs
$ docker compose up diffdock
$ docker compose up p2rank
...
OR if you want to run everything on one machine:
$ docker compose up
Server will be available on http://localhost:9000
Running nolabs for development:

1. Create a Python environment with Python 3.11
   - First, ensure you have Python 3.11 installed. If not, download it from python.org or use a version manager like pyenv.
   - Create a new virtual environment:
     $ python3.11 -m venv nolabs-env
2. Activate the virtual environment and install Poetry
   - Activate the virtual environment:
     $ source nolabs-env/bin/activate
   - Install Poetry, a tool for dependency management and packaging in Python, along with Uvicorn:
     $ pip install poetry uvicorn
3. Install dependencies using Poetry
   $ poetry install
4. Start a Uvicorn server
   - Set your environment variable and start the Uvicorn server with the following command:
     $ NOLABS_ENVIRONMENT=dev poetry run uvicorn nolabs.api:app --host=127.0.0.1 --port=8000
   - This command runs the nolabs API server on localhost at port 8000.
5. Set up the frontend
   - In a separate terminal, ensure you have npm installed. If not, install Node.js and npm from nodejs.org.
   - Install the necessary Node.js packages:
     $ npm install
   - After installing the packages, start the frontend development server:
     $ npm run dev

Server will be available on http://localhost:9000
We provide individual Docker containers backed by FastAPI for each feature, available in the /microservices folder. You can use them individually as APIs.
For example, to run the esmfold service, you can use Docker Compose:
$ docker compose up esmfold
Once the service is up, you can make a POST request to perform a task, such as predicting a protein's folded structure. Here's a simple Python example:
import requests

# Define the API endpoint of the esmfold microservice
url = 'http://127.0.0.1:5736/run-folding'

# Specify the protein sequence in the request body
data = {
    'protein_sequence': 'YOUR_PROTEIN_SEQUENCE_HERE'
}

# Make the POST request and fail loudly on HTTP errors
response = requests.post(url, json=data)
response.raise_for_status()

# Extract the PDB content from the response
pdb_content = response.json().get('pdb_content', '')
print(pdb_content)
This Python script makes a POST request to the esmfold microservice with a protein sequence and prints the predicted PDB content.
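To inspect the predicted structure, the returned PDB string can be written to a file and opened in any molecular viewer; the file name here is arbitrary, and the content is a stand-in for a real service response:

```python
# Stand-in for the pdb_content string returned by the esmfold service
pdb_content = "HEADER    PREDICTED STRUCTURE\nEND\n"

# Write it out so it can be opened in a viewer such as PyMOL or ChimeraX
with open("prediction.pdb", "w") as fh:
    fh.write(pdb_content)
```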
Since we provide individual Docker containers backed by FastAPI for each feature, available in the /microservices
folder, you can run them on separate machines. This setup is particularly useful if you're developing on a computer
without GPU support but have access to a VM with a GPU for tasks like folding, docking, etc.
For instance, to run the diffdock service, use Docker Compose on the VM or computer equipped with a GPU.
On your server/VM/computer with a GPU, run:
$ docker compose up diffdock
Once the service is up, check that you can access it from your computer by navigating to http://<gpu_machine_ip>:5737/docs
If everything is correct, you should see the FastAPI page with diffdock's API surface.
Next, update the nolabs/infrastructure/settings.ini file on your primary machine to include the IP address of the service (replace 127.0.0.1 with your GPU machine's IP):
...
p2rank = http://127.0.0.1:5731
esmfold = http://127.0.0.1:5736
esmfold_light = http://127.0.0.1:5733
msa_light = http://127.0.0.1:5734
umol = http://127.0.0.1:5735
diffdock = http://127.0.0.1:5737 -> http://74.82.28.227:5737
...
And now you are ready to use this service hosted on a separate machine!
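To verify connectivity programmatically rather than in the browser, one option is to query the service's OpenAPI schema, which FastAPI apps serve at /openapi.json by default. The host and port below mirror the diffdock example above; the helper names are our own:

```python
import json
from urllib.request import urlopen
from urllib.error import URLError

def service_url(host: str, port: int) -> str:
    """Build a microservice base URL in the same form used in settings.ini."""
    return f"http://{host}:{port}"

def is_reachable(base_url: str, timeout: float = 5.0) -> bool:
    """FastAPI apps expose /openapi.json by default; a valid JSON response means the API is up."""
    try:
        with urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
            json.load(resp)  # make sure we actually got a schema back
            return True
    except (URLError, ValueError):
        return False

print(service_url("74.82.28.227", 5737))  # http://74.82.28.227:5737
```

Running `is_reachable(service_url("74.82.28.227", 5737))` from the primary machine confirms the GPU host is wired up before you edit settings.ini.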
Model: RFdiffusion
RFdiffusion is an open source method for structure generation, with or without conditional information (a motif, target etc).
docker compose up protein_design
Swagger UI will be available on http://localhost:5789/docs
or install as a python package
Model: ESMFold - Evolutionary Scale Modeling
docker compose up esmfold
Swagger UI will be available on http://localhost:5736/docs
or install as a python package
Model: ESMAtlas
docker compose up esmfold_light
Swagger UI will be available on http://localhost:5733/docs
or install as a python package
Model: Hugging Face
docker compose up gene_ontology
Swagger UI will be available on http://localhost:5788/docs
or install as a python package
Model: Hugging Face
docker compose up localisation
Swagger UI will be available on http://localhost:5787/docs
or install as a python package
Model: p2rank
docker compose up p2rank
Swagger UI will be available on http://localhost:5731/docs
or install as a python package
Model: Hugging Face
docker compose up solubility
Swagger UI will be available on http://localhost:5786/docs
Model: UMol
docker compose up umol
Swagger UI will be available on http://localhost:5735/docs
Model: RoseTTAFold
docker compose up rosettafold
Swagger UI will be available on http://localhost:5738/docs
WARNING: To use RoseTTAFold you must change the '.' volumes to point to the specified folders.
Model: REINVENT4
Misc: DockStream, QED, AutoDock Vina
docker compose up reinvent
Swagger UI will be available on http://localhost:5790/docs
WARNING: Do not change the number of gunicorn workers (1); changing it will lead to microservice issues.
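For scripting against several microservices at once, the default local ports listed in the sections above can be collected in one place; the dictionary and helper below are our own convenience, with values copied from this document:

```python
# Default local ports for the NoLabs microservices (from the sections above)
SERVICE_PORTS = {
    "p2rank": 5731,
    "esmfold_light": 5733,
    "msa_light": 5734,
    "umol": 5735,
    "esmfold": 5736,
    "diffdock": 5737,
    "rosettafold": 5738,
    "solubility": 5786,
    "localisation": 5787,
    "gene_ontology": 5788,
    "protein_design": 5789,
    "reinvent": 5790,
}

def docs_url(service: str, host: str = "127.0.0.1") -> str:
    """Swagger UI location for a service, handy for a quick health check."""
    return f"http://{host}:{SERVICE_PORTS[service]}/docs"

print(docs_url("esmfold"))  # http://127.0.0.1:5736/docs
```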
[Recommended for laptops] If you are using a laptop, use the --test argument (no need for a lot of compute):
- RAM > 16GB
- [Optional] GPU memory >= 16GB (REALLY speeds up the inference)
[Recommended for powerful workstations] Else, if you want to host everything on your machine and have faster inference (also a requirement for folding sequences > 400 amino acids in length):
- RAM > 30GB
- [Optional] GPU memory >= 40GB (REALLY speeds up the inference)
Alternative AI tools for NoLabs
Similar Open Source Tools
gitingest
GitIngest is a tool that allows users to turn any Git repository into a prompt-friendly text ingest for LLMs. It provides easy code context by generating a text digest from a git repository URL or directory. The tool offers smart formatting for optimized output format for LLM prompts and provides statistics about file and directory structure, size of the extract, and token count. GitIngest can be used as a CLI tool on Linux and as a Python package for code integration. The tool is built using Tailwind CSS for frontend, FastAPI for backend framework, tiktoken for token estimation, and apianalytics.dev for simple analytics. Users can self-host GitIngest by building the Docker image and running the container. Contributions to the project are welcome, and the tool aims to be beginner-friendly for first-time contributors with a simple Python and HTML codebase.
patchwork
PatchWork is an open-source framework designed for automating development tasks using large language models. It enables users to automate workflows such as PR reviews, bug fixing, security patching, and more through a self-hosted CLI agent and preferred LLMs. The framework consists of reusable atomic actions called Steps, customizable LLM prompts known as Prompt Templates, and LLM-assisted automations called Patchflows. Users can run Patchflows locally in their CLI/IDE or as part of CI/CD pipelines. PatchWork offers predefined patchflows like AutoFix, PRReview, GenerateREADME, DependencyUpgrade, and ResolveIssue, with the flexibility to create custom patchflows. Prompt templates are used to pass queries to LLMs and can be customized. Contributions to new patchflows, steps, and the core framework are encouraged, with chat assistants available to aid in the process. The roadmap includes expanding the patchflow library, introducing a debugger and validation module, supporting large-scale code embeddings, parallelization, fine-tuned models, and an open-source GUI. PatchWork is licensed under AGPL-3.0 terms, while custom patchflows and steps can be shared using the Apache-2.0 licensed patchwork template repository.
Easy-Translate
Easy-Translate is a script designed for translating large text files with a single command. It supports various models like M2M100, NLLB200, SeamlessM4T, LLaMA, and Bloom. The tool is beginner-friendly and offers seamless and customizable features for advanced users. It allows acceleration on CPU, multi-CPU, GPU, multi-GPU, and TPU, with support for different precisions and decoding strategies. Easy-Translate also provides an evaluation script for translations. Built on HuggingFace's Transformers and Accelerate library, it supports prompt usage and loading huge models efficiently.
open-parse
Open Parse is a Python library for visually discerning document layouts and chunking them effectively. It is designed to fill the gap in open-source libraries for handling complex documents. Unlike text splitting, which converts a file to raw text and slices it up, Open Parse visually analyzes documents for superior LLM input. It also supports basic markdown for parsing headings, bold, and italics, and has high-precision table support, extracting tables into clean Markdown formats with accuracy that surpasses traditional tools. Open Parse is extensible, allowing users to easily implement their own post-processing steps. It is also intuitive, with great editor support and completion everywhere, making it easy to use and learn.
linkedin-api
The Linkedin API for Python allows users to programmatically search profiles, send messages, and find jobs using a regular Linkedin user account. It does not require 'official' API access, just a valid Linkedin account. However, it is important to note that this library is not officially supported by LinkedIn and using it may violate LinkedIn's Terms of Service. Users can authenticate using any Linkedin account credentials and access features like getting profiles, profile contact info, and connections. The library also provides commercial alternatives for extracting data, scraping public profiles, and accessing a full LinkedIn API. It is not endorsed or supported by LinkedIn and is intended for educational purposes and personal use only.
GraphRAG-Local-UI
GraphRAG Local with Interactive UI is an adaptation of Microsoft's GraphRAG, tailored to support local models and featuring a comprehensive interactive user interface. It allows users to leverage local models for LLM and embeddings, visualize knowledge graphs in 2D or 3D, manage files, settings, and queries, and explore indexing outputs. The tool aims to be cost-effective by eliminating dependency on costly cloud-based models and offers flexible querying options for global, local, and direct chat queries.
middleware
Middleware is an open-source engineering management tool that helps engineering leaders measure and analyze team effectiveness using DORA metrics. It integrates with CI/CD tools, automates DORA metric collection and analysis, visualizes key performance indicators, provides customizable reports and dashboards, and integrates with project management platforms. Users can set up Middleware using Docker or manually, generate encryption keys, set up backend and web servers, and access the application to view DORA metrics. The tool calculates DORA metrics using GitHub data, including Deployment Frequency, Lead Time for Changes, Mean Time to Restore, and Change Failure Rate. Middleware aims to provide DORA metrics to users based on their Git data, simplifying the process of tracking software delivery performance and operational efficiency.
air-light
Air-light is a minimalist WordPress starter theme designed to be an ultra minimal starting point for a WordPress project. It is built to be very straightforward, backwards compatible, front-end developer friendly and modular by its structure. Air-light is free of weird "app-like" folder structures or odd syntaxes that nobody else uses. It loves WordPress as it was and as it is.
bedrock-claude-chat
This repository is a sample chatbot using the Anthropic company's LLM Claude, one of the foundational models provided by Amazon Bedrock for generative AI. It allows users to have basic conversations with the chatbot, personalize it with their own instructions and external knowledge, and analyze usage for each user/bot on the administrator dashboard. The chatbot supports various languages, including English, Japanese, Korean, Chinese, French, German, and Spanish. Deployment is straightforward and can be done via the command line or by using AWS CDK. The architecture is built on AWS managed services, eliminating the need for infrastructure management and ensuring scalability, reliability, and security.
Upscaler
Holloway's Upscaler is a consolidation of various compiled open-source AI image/video upscaling products for a CLI-friendly image and video upscaling program. It provides low-cost AI upscaling software that can run locally on a laptop, programmable for albums and videos, reliable for large video files, and works without GUI overheads. The repository supports hardware testing on various systems and provides important notes on GPU compatibility, video types, and image decoding bugs. Dependencies include ffmpeg and ffprobe for video processing. The user manual covers installation, setup pathing, calling for help, upscaling images and videos, and contributing back to the project. Benchmarks are provided for performance evaluation on different hardware setups.
any-parser
AnyParser provides an API to accurately extract unstructured data (e.g., PDFs, images, charts) into a structured format. Users can set up their API key, run synchronous and asynchronous extractions, and perform batch extraction. The tool is useful for extracting text, numbers, and symbols from various sources like PDFs and images. It offers flexibility in processing data and provides immediate results for synchronous extraction while allowing users to fetch results later for asynchronous and batch extraction. AnyParser is designed to simplify data extraction tasks and enhance data processing efficiency.
NekoImageGallery
NekoImageGallery is an online AI image search engine that utilizes the Clip model and Qdrant vector database. It supports keyword search and similar image search. The tool generates 768-dimensional vectors for each image using the Clip model, supports OCR text search using PaddleOCR, and efficiently searches vectors using the Qdrant vector database. Users can deploy the tool locally or via Docker, with options for metadata storage using Qdrant database or local file storage. The tool provides API documentation through FastAPI's built-in Swagger UI and can be used for tasks like image search, text extraction, and vector search.
Protofy
Protofy is a full-stack, batteries-included low-code enabled web/app and IoT system with an API system and real-time messaging. It is based on Protofy (protoflow + visualui + protolib + protodevices) + Expo + Next.js + Tamagui + Solito + Express + Aedes + Redbird + Many other amazing packages. Protofy can be used to fast prototype Apps, webs, IoT systems, automations, or APIs. It is a ultra-extensible CMS with supercharged capabilities, mobile support, and IoT support (esp32 thanks to esphome).
RAVE
RAVE is a variational autoencoder for fast and high-quality neural audio synthesis. It can be used to generate new audio samples from a given dataset, or to modify the style of existing audio samples. RAVE is easy to use and can be trained on a variety of audio datasets. It is also computationally efficient, making it suitable for real-time applications.
For similar jobs
OpenCRISPR
OpenCRISPR is a set of free and open gene editing systems designed by Profluent Bio. The OpenCRISPR-1 protein maintains the prototypical architecture of a Type II Cas9 nuclease but is hundreds of mutations away from SpCas9 or any other known natural CRISPR-associated protein. You can view OpenCRISPR-1 as a drop-in replacement for many protocols that need a cas9-like protein with an NGG PAM and you can even use it with canonical SpCas9 gRNAs. OpenCRISPR-1 can be fused in a deactivated or nickase format for next generation gene editing techniques like base, prime, or epigenome editing.
ersilia
The Ersilia Model Hub is a unified platform of pre-trained AI/ML models dedicated to infectious and neglected disease research. It offers an open-source, low-code solution that provides seamless access to AI/ML models for drug discovery. Models housed in the hub come from two sources: published models from literature (with due third-party acknowledgment) and custom models developed by the Ersilia team or contributors.
ontogpt
OntoGPT is a Python package for extracting structured information from text using large language models, instruction prompts, and ontology-based grounding. It provides a command line interface and a minimal web app for easy usage. The tool has been evaluated on test data and is used in related projects like TALISMAN for gene set analysis. OntoGPT enables users to extract information from text by specifying relevant terms and provides the extracted objects as output.
bia-bob
BIA `bob` is a Jupyter-based assistant for interacting with data using large language models to generate Python code. It can utilize OpenAI's chatGPT, Google's Gemini, Helmholtz' blablador, and Ollama. Users need respective accounts to access these services. Bob can assist in code generation, bug fixing, code documentation, GPU-acceleration, and offers a no-code custom Jupyter Kernel. It provides example notebooks for various tasks like bio-image analysis, model selection, and bug fixing. Installation is recommended via conda/mamba environment. Custom endpoints like blablador and ollama can be used. Google Cloud AI API integration is also supported. The tool is extensible for Python libraries to enhance Bob's functionality.
Scientific-LLM-Survey
Scientific Large Language Models (Sci-LLMs) is a repository that collects papers on scientific large language models, focusing on biology and chemistry domains. It includes textual, molecular, protein, and genomic languages, as well as multimodal language. The repository covers various large language models for tasks such as molecule property prediction, interaction prediction, protein sequence representation, protein sequence generation/design, DNA-protein interaction prediction, and RNA prediction. It also provides datasets and benchmarks for evaluating these models. The repository aims to facilitate research and development in the field of scientific language modeling.
polaris
Polaris establishes a novel, industry‑certified standard to foster the development of impactful methods in AI-based drug discovery. This library is a Python client to interact with the Polaris Hub. It allows you to download Polaris datasets and benchmarks, evaluate a custom method against a Polaris benchmark, and create and upload new datasets and benchmarks.
awesome-AI4MolConformation-MD
The 'awesome-AI4MolConformation-MD' repository focuses on protein conformations and molecular dynamics using generative artificial intelligence and deep learning. It provides resources, reviews, datasets, packages, and tools related to AI-driven molecular dynamics simulations. The repository covers a wide range of topics such as neural networks potentials, force fields, AI engines/frameworks, trajectory analysis, visualization tools, and various AI-based models for protein conformational sampling. It serves as a comprehensive guide for researchers and practitioners interested in leveraging AI for studying molecular structures and dynamics.