qlib

Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and reinforcement learning (RL).

Stars: 18001


Qlib is an open-source, AI-oriented quantitative investment platform that supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and reinforcement learning. It covers the entire chain of quantitative investment, from alpha seeking to order execution. The platform empowers researchers to explore ideas and implement productions using AI technologies in quantitative investment. Qlib collaboratively solves key challenges in quantitative investment by releasing state-of-the-art research works in various paradigms. It provides a full ML pipeline for data processing, model training, and back-testing, enabling users to perform tasks such as forecasting market patterns, adapting to market dynamics, and modeling continuous investment decisions.

README:

📰 What's NEW! 💖

Recently released features

Introducing RD-Agent: LLM-Based Autonomous Evolving Agents for Industrial Data-Driven R&D

We are excited to announce the release of RD-Agent📢, a powerful tool that supports automated factor mining and model optimization in quant investment R&D.

RD-Agent is now available on GitHub, and we welcome your star🌟!

To learn more, please visit our ♾️Demo page. Here, you will find demo videos in both English and Chinese to help you better understand the scenario and usage of RD-Agent.

We have prepared several demo videos for you:

Scenario	Demo video (English)	Demo video (中文)
Quant Factor Mining	Link	Link
Quant Factor Mining from reports	Link	Link
Quant Model Optimization	Link	Link

Feature	Status
BPQP for End-to-end learning	📈 Coming soon! (Under review)
🔥LLM-driven Auto Quant Factory🔥	🚀 Released in ♾️RD-Agent on Aug 8, 2024
KRNN and Sandwich models	📈 Released on May 26, 2023
Release Qlib v0.9.0	Released on Dec 9, 2022
RL Learning Framework	🔨 📈 Released on Nov 10, 2022. #1332, #1322, #1316, #1299, #1263, #1244, #1169, #1125, #1076
HIST and IGMTF models	📈 Released on Apr 10, 2022
Qlib notebook tutorial	📖 Released on Apr 7, 2022
Ibovespa index data	🍚 Released on Apr 6, 2022
Point-in-Time database	🔨 Released on Mar 10, 2022
Arctic Provider Backend & Orderbook data example	🔨 Released on Jan 17, 2022
Meta-Learning-based framework & DDG-DA	📈 🔨 Released on Jan 10, 2022
Planning-based portfolio optimization	🔨 Released on Dec 28, 2021
Release Qlib v0.8.0	Released on Dec 8, 2021
ADD model	📈 Released on Nov 22, 2021
ADARNN model	📈 Released on Nov 14, 2021
TCN model	📈 Released on Nov 4, 2021
Nested Decision Framework	🔨 Released on Oct 1, 2021. Example and Doc
Temporal Routing Adaptor (TRA)	📈 Released on July 30, 2021
Transformer & Localformer	📈 Released on July 22, 2021
Release Qlib v0.7.0	Released on July 12, 2021
TCTS Model	📈 Released on July 1, 2021
Online serving and automatic model rolling	🔨 Released on May 17, 2021
DoubleEnsemble Model	📈 Released on Mar 2, 2021
High-frequency data processing example	🔨 Released on Feb 5, 2021
High-frequency trading example	📈 Part of code released on Jan 28, 2021
High-frequency data(1min)	🍚 Released on Jan 27, 2021
Tabnet Model	📈 Released on Jan 22, 2021

Features released before 2021 are not listed here.

Qlib is an open-source, AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and reinforcement learning.

An increasing number of SOTA Quant research works/papers in diverse paradigms are being released in Qlib to collaboratively solve key challenges in quantitative investment. For example, 1) using supervised learning to mine the market's complex non-linear patterns from rich and heterogeneous financial data, 2) modeling the dynamic nature of the financial market using adaptive concept drift technology, and 3) using reinforcement learning to model continuous investment decisions and assist investors in optimizing their trading strategies.

It contains the full ML pipeline of data processing, model training, and back-testing, and covers the entire chain of quantitative investment: alpha seeking, risk modeling, portfolio optimization, and order execution. For more details, please refer to our paper "Qlib: An AI-oriented Quantitative Investment Platform".

Frameworks, Tutorial, Data & DevOps
- Plans
- Framework of Qlib
- Quick Start
  - Installation
  - Data Preparation
  - Auto Quant Research Workflow
  - Building Customized Quant Research Workflow by Code
- Quant Dataset Zoo
- Learning Framework
- More About Qlib
- Offline Mode and Online Mode
  - Performance of Qlib Data Server
- Related Reports
- Contact Us
- Contributing

Main Challenges & Solutions in Quant Research
- Forecasting: Finding Valuable Signals/Patterns
  - Quant Model (Paper) Zoo
    - Run a Single Model
    - Run Multiple Models
- Adapting to Market Dynamics
- Reinforcement Learning: modeling continuous decisions

Plans

New features under development (ordered by estimated release time). Your feedback about these features is very important.

Framework of Qlib

The high-level framework of Qlib can be found above (users can find the detailed framework of Qlib's design when getting into the nitty-gritty). The components are designed as loosely coupled modules, and each component can be used stand-alone.

Qlib provides a strong infrastructure to support Quant research. Data is always an important part. A strong learning framework is designed to support diverse learning paradigms (e.g. reinforcement learning, supervised learning) and patterns at different levels (e.g. market dynamics modeling). By modeling the market, trading strategies generate trade decisions that are then executed. Multiple trading strategies and executors at different levels or granularities can be nested so that they are optimized and run together. Finally, a comprehensive analysis is provided and the model can be served online at low cost.

Quick Start

This quick start guide tries to demonstrate that:

- It's very easy to build a complete Quant research workflow and try your ideas with Qlib.
- Even with public data and simple models, machine learning technologies work very well in practical Quant investment.

Here is a quick demo that shows how to install Qlib and run LightGBM with qrun. Please make sure you have already prepared the data following the instructions.

Installation

This table demonstrates the supported Python version of Qlib:

	install with pip	install from source	plot
Python 3.8	✔️	✔️	✔️
Python 3.9	✔️	✔️	✔️
Python 3.10	✔️	✔️	✔️
Python 3.11	✔️	✔️	✔️
Python 3.12	✔️	✔️	✔️

Note:

Conda is suggested for managing your Python environment. In some cases, using Python outside of a conda environment may result in missing header files, causing the installation failure of certain packages.
Please note that installing cython on Python 3.6 raises errors when installing Qlib from source. If you use Python 3.6, it is recommended to upgrade to Python 3.8 or higher, or to use conda's Python to install Qlib from source.
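As a quick sanity check before installing, the support table above can be expressed as a small guard. This is only a minimal sketch: the helper name `is_supported_python` is hypothetical, and the bounds simply mirror the table (Python 3.8 through 3.12).

```python
import sys

# Bounds copied from the support table above; adjust them if the table changes.
MIN_SUPPORTED = (3, 8)
MAX_SUPPORTED = (3, 12)


def is_supported_python(version=None):
    """Return True if the (major, minor) version is inside the supported range."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_SUPPORTED <= (major, minor) <= MAX_SUPPORTED


if __name__ == "__main__":
    print("supported interpreter:", is_supported_python())
```

Calling `is_supported_python((3, 6))` returns `False`, which matches the note about Python 3.6 above.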

Install with pip

Users can install Qlib with pip using the following command.

  pip install pyqlib

Note: pip will install the latest stable qlib. However, the main branch of qlib is in active development. If you want to test the latest scripts or functions in the main branch, please install qlib with the methods below.

Install from source

Users can also install the latest dev version of Qlib from source with the following steps:

Before installing Qlib from source, users need to install some dependencies:
```
pip install numpy
pip install --upgrade cython
```

Clone the repository and install Qlib as follows.

git clone https://github.com/microsoft/qlib.git && cd qlib
pip install .  # `pip install -e .[dev]` is recommended for development. check details in docs/developer/code_standard_and_dev_guide.rst

Tips: If you fail to install Qlib or run the examples in your environment, comparing your steps and the CI workflow may help you find the problem.

Tips for Mac: If you are using Mac with M1, you might encounter issues in building the wheel for LightGBM, which is due to missing dependencies from OpenMP. To solve the problem, install openmp first with brew install libomp and then run pip install . to build it successfully.

Data Preparation

❗ Due to a stricter data security policy, the official dataset is temporarily disabled. You can try this data source contributed by the community. Here is an example of downloading the latest data.

wget https://github.com/chenditc/investment_data/releases/latest/download/qlib_bin.tar.gz
mkdir -p ~/.qlib/qlib_data/cn_data
tar -zxvf qlib_bin.tar.gz -C ~/.qlib/qlib_data/cn_data --strip-components=2
rm -f qlib_bin.tar.gz
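For readers who prefer to unpack the archive from Python, the sketch below mirrors the `tar ... --strip-components=2` step using only the standard library. The helper names `strip_components` and `extract_qlib_data` are hypothetical, and the sketch assumes the archive has already been downloaded to the current directory.

```python
import tarfile
from pathlib import Path, PurePosixPath


def strip_components(member_name, count=2):
    """Drop the first `count` path components, similar to tar --strip-components."""
    parts = PurePosixPath(member_name).parts[count:]
    return str(PurePosixPath(*parts)) if parts else None


def extract_qlib_data(archive="qlib_bin.tar.gz",
                      target="~/.qlib/qlib_data/cn_data"):
    """Extract `archive` into `target`, stripping two leading directories."""
    target_dir = Path(target).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            new_name = strip_components(member.name)
            if new_name is None:
                continue  # this member is one of the stripped directories
            member.name = new_name
            tar.extract(member, target_dir)
```

`strip_components` is kept as a pure function so the path handling can be checked without touching the file system.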

The official dataset below will be available again in the near future.

Load and prepare data by running the following code:

Get with module

# get 1d data
python -m qlib.run.get_data qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn

# get 1min data
python -m qlib.run.get_data qlib_data --target_dir ~/.qlib/qlib_data/cn_data_1min --region cn --interval 1min

Get from source

# get 1d data
python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn

# get 1min data
python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data_1min --region cn --interval 1min

This dataset is created from public data collected by crawler scripts, which have been released in the same repository. Users can create the same dataset with them. See the description of the dataset for details.

Please pay ATTENTION that the data is collected from Yahoo Finance, and the data might not be perfect. We recommend that users prepare their own data if they have a high-quality dataset. For more information, users can refer to the related document.
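Once the data is in place, it can be read back from Python. The following is a minimal sketch, assuming `pyqlib` is installed and the daily data sits in the default directory used above. The `csi300` instrument pool, the date range, and the helper names are assumptions chosen for illustration.

```python
from pathlib import Path

# Default location used by the download commands above.
PROVIDER_URI = "~/.qlib/qlib_data/cn_data"
# Qlib expression fields; "$" marks a raw column stored in the data files.
FIELDS = ["$close", "$volume", "Ref($close, 1)"]


def data_is_prepared(provider_uri=PROVIDER_URI):
    """Cheap check that the data directory exists before initializing Qlib."""
    return Path(provider_uri).expanduser().is_dir()


def load_features(start_time="2020-01-01", end_time="2020-12-31"):
    """Load a small feature frame; requires pyqlib and prepared data."""
    import qlib  # imported lazily so the helper above works without pyqlib
    from qlib.data import D

    qlib.init(provider_uri=PROVIDER_URI, region="cn")
    instruments = D.instruments("csi300")  # assumed to exist in the dataset
    return D.features(instruments, FIELDS,
                      start_time=start_time, end_time=end_time, freq="day")
```

If `data_is_prepared()` returns `True`, `load_features().head()` should show a frame indexed by instrument and date.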

Automatic update of daily frequency data (from Yahoo Finance)

This step is optional if users only want to try their models and strategies on historical data.

It is recommended that users update the data manually once (--trading_date 2021-05-25) and then set it to update automatically.

NOTE: Users can't incrementally update data based on the offline data provided by Qlib (some fields are removed to reduce the data size). Users should use the yahoo collector to download Yahoo data from scratch and then incrementally update it.

For more information, please refer to: yahoo collector

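Automatic update of data to the "qlib" directory each trading day (Linux)
- use crontab: crontab -e
- set up timed tasks:
```
* * * * 1-5 python <script path> update_data_to_bin --qlib_data_1d_dir <user data dir>
```
  - script path: scripts/data_collector/yahoo/collector.py
  - note: the schedule `* * * * 1-5` fires every minute from Monday to Friday; to run the update once per trading day, replace the first two fields with a fixed minute and hour (for example, `0 18` for 18:00 local time).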

Manual update of data

python scripts/data_collector/yahoo/collector.py update_data_to_bin --qlib_data_1d_dir <user data dir> --trading_date <start date> --end_date <end date>

trading_date: start of trading day
end_date: end of trading day(not included)
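Because `end_date` is exclusive, updating a single trading day means passing the following calendar day as the end. The sketch below builds the manual update command accordingly. The helper name `build_update_command` is hypothetical, and the command is only assembled here, not executed.

```python
import os
import shlex
from datetime import date, timedelta

COLLECTOR = "scripts/data_collector/yahoo/collector.py"


def build_update_command(data_dir, trading_date, last_day=None):
    """Build the manual update command; end_date is exclusive, so add one day."""
    last_day = last_day or trading_date
    end_date = last_day + timedelta(days=1)
    return [
        "python", COLLECTOR, "update_data_to_bin",
        "--qlib_data_1d_dir", os.path.expanduser(data_dir),
        "--trading_date", trading_date.isoformat(),
        "--end_date", end_date.isoformat(),
    ]


cmd = build_update_command("~/.qlib/qlib_data/cn_data", date(2021, 5, 25))
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.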

Checking the health of the data

We provide a script to check the health of the data. Run the following command to check whether the data is healthy.
```
python scripts/check_data_health.py check_data --qlib_dir ~/.qlib/qlib_data/cn_data
```

You can also pass parameters to adjust the thresholds used by the checks, for example:

python scripts/check_data_health.py check_data --qlib_dir ~/.qlib/qlib_data/cn_data --missing_data_num 30055 --large_step_threshold_volume 94485 --large_step_threshold_price 20

If you want more information about check_data_health, please refer to the documentation.
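To give a feel for what such checks look at, here is a small illustrative sketch. It is not the script's implementation: it assumes that a "large step" means a relative day-over-day change above a threshold and that missing values are represented as `None`. The function names and sample prices are hypothetical.

```python
def find_large_steps(values, threshold):
    """Return indices where |v[i] / v[i-1] - 1| exceeds `threshold`."""
    flagged = []
    for i in range(1, len(values)):
        prev, curr = values[i - 1], values[i]
        if prev is None or curr is None or prev == 0:
            continue  # skip gaps and avoid division by zero
        if abs(curr / prev - 1.0) > threshold:
            flagged.append(i)
    return flagged


def count_missing(values):
    """Count entries that are missing (represented here as None)."""
    return sum(v is None for v in values)


# Hypothetical close prices with one gap and one suspicious jump.
closes = [10.0, 10.2, None, 10.1, 25.0, 24.8]
print("large steps at:", find_large_steps(closes, threshold=0.5))
print("missing values:", count_missing(closes))
```

Raising the threshold makes the check more tolerant, which is the kind of adjustment the command-line parameters above allow.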

Docker images

Pulling a docker image from a docker hub repository
```
docker pull pyqlib/qlib_image_stable:stable
```

Start a new Docker container

docker run -it --name <container name> -v <Mounted local directory>:/app pyqlib/qlib_image_stable:stable

At this point you are in the docker environment and can run the qlib scripts. An example:

>>> python scripts/get_data.py qlib_data --name qlib_data_simple --target_dir ~/.qlib/qlib_data/cn_data --interval 1d --region cn
>>> python qlib/workflow/cli.py examples/benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml

Exit the container
```
>>> exit
```
Restart the container
```
docker start -i -a <container name>
```
Stop the container
```
docker stop <container name>
```
Delete the container
```
docker rm <container name>
```
For more information, please refer to the documentation.

Auto Quant Research Workflow

Qlib provides a tool named qrun to run the whole workflow automatically (including building the dataset, training models, backtesting, and evaluation). You can start an auto quant research workflow and get a graphical report analysis with the following steps:

Quant Research Workflow: Run qrun with the LightGBM workflow config (workflow_config_lightgbm_Alpha158.yaml) as follows.

  cd examples  # Avoid running the program from a directory that contains `qlib`
  qrun benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml

To run qrun in debug mode, use the following command:

python -m pdb qlib/workflow/cli.py examples/benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml

The result of qrun is as follows. Please refer to the docs for more explanation of the result.

'The following are analysis results of the excess return without cost.'
                       risk
mean               0.000708
std                0.005626
annualized_return  0.178316
information_ratio  1.996555
max_drawdown      -0.081806
'The following are analysis results of the excess return with cost.'
                       risk
mean               0.000512
std                0.005626
annualized_return  0.128982
information_ratio  1.444287
max_drawdown      -0.091078

Here are detailed documents for qrun and workflow.
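The rows in the output above are related by simple formulas. The sketch below recomputes the annualized return and the information ratio from the daily mean and standard deviation, assuming 252 trading days per year and arithmetic (non-compounded) accumulation. Both assumptions are inferred from the printed numbers rather than taken from the documentation, and small differences from the printed values come from the rounded inputs.

```python
import math

# Assumption: 252 periods per year, inferred from the sample output above.
TRADING_DAYS = 252


def annualized_return(daily_mean, periods=TRADING_DAYS):
    """Arithmetic annualization of a mean daily excess return."""
    return daily_mean * periods


def information_ratio(daily_mean, daily_std, periods=TRADING_DAYS):
    """Mean over standard deviation, scaled to a yearly horizon."""
    return daily_mean / daily_std * math.sqrt(periods)


for label, mean, std in [("without cost", 0.000708, 0.005626),
                         ("with cost", 0.000512, 0.005626)]:
    print(label,
          round(annualized_return(mean), 4),
          round(information_ratio(mean, std), 3))
```

Comparing the two rows shows how transaction costs lower the mean daily return while leaving the volatility unchanged, so both derived metrics drop.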

Graphical Reports Analysis: First, run python -m pip install .[analysis] to install the required dependencies. Then run examples/workflow_by_code.ipynb with jupyter notebook to get graphical reports.
- Forecasting signal (model prediction) analysis
  - Cumulative Return of groups
  - Return distribution
  - Information Coefficient (IC)
  - Auto Correlation of forecasting signal (model prediction)
- Portfolio analysis
  - Backtest return
- Explanation of above results
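The Information Coefficient listed above is commonly computed as the correlation between the model's predictions and the realized returns over the same cross-section of stocks. The following is a minimal standard-library sketch of that idea, including a rank-based variant. The function names and sample numbers are hypothetical, and this is not the code used by Qlib's report module.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally sized sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def information_coefficient(predictions, returns, use_rank=False):
    """IC for one cross-section; use_rank=True gives the rank IC."""
    if use_rank:
        predictions, returns = ranks(predictions), ranks(returns)
    return pearson(predictions, returns)


# Hypothetical scores and next-day returns for five stocks on one day.
scores = [0.8, 0.1, 0.5, -0.2, 0.3]
next_returns = [0.021, -0.004, 0.010, -0.015, 0.002]
print(information_coefficient(scores, next_returns))
print(information_coefficient(scores, next_returns, use_rank=True))
```

In practice the IC is computed per trading day and then averaged or plotted over time, which is what the graphical report summarizes.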

Building Customized Quant Research Workflow by Code

The automatic workflow may not suit the research workflow of all Quant researchers. To support a flexible Quant research workflow, Qlib also provides a modularized interface to allow researchers to build their own workflow by code. Here is a demo for customized Quant research workflow by code.
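As a rough illustration of what a workflow by code can look like, the sketch below describes a LightGBM model and an Alpha158 dataset as plain config dictionaries and trains the model inside a recorder. It assumes `pyqlib` is installed and the China daily data is prepared in the default directory. The instrument pool, date ranges, hyperparameters, and experiment name are illustrative assumptions, not the values of the official demo.

```python
MARKET = "csi300"  # assumption: instrument pool available in the dataset

HANDLER_KWARGS = {
    "start_time": "2008-01-01",
    "end_time": "2020-08-01",
    "fit_start_time": "2008-01-01",
    "fit_end_time": "2014-12-31",
    "instruments": MARKET,
}

TASK = {
    "model": {
        "class": "LGBModel",
        "module_path": "qlib.contrib.model.gbdt",
        "kwargs": {"loss": "mse", "learning_rate": 0.05,
                   "max_depth": 8, "num_leaves": 210},
    },
    "dataset": {
        "class": "DatasetH",
        "module_path": "qlib.data.dataset",
        "kwargs": {
            "handler": {
                "class": "Alpha158",
                "module_path": "qlib.contrib.data.handler",
                "kwargs": HANDLER_KWARGS,
            },
            "segments": {
                "train": ("2008-01-01", "2014-12-31"),
                "valid": ("2015-01-01", "2016-12-31"),
                "test": ("2017-01-01", "2020-08-01"),
            },
        },
    },
}


def run_workflow(provider_uri="~/.qlib/qlib_data/cn_data"):
    """Train the model and return test-set predictions (needs pyqlib and data)."""
    import qlib  # imported lazily so the config can be inspected without pyqlib
    from qlib.utils import init_instance_by_config
    from qlib.workflow import R

    qlib.init(provider_uri=provider_uri, region="cn")
    model = init_instance_by_config(TASK["model"])
    dataset = init_instance_by_config(TASK["dataset"])
    with R.start(experiment_name="workflow_by_code_sketch"):
        model.fit(dataset)
        return model.predict(dataset)
```

Keeping the model and dataset as config dictionaries means the same structure can be moved into a YAML file for qrun later.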

Main Challenges & Solutions in Quant Research

Quant investment is a unique scenario with many key challenges to be solved. Currently, Qlib provides solutions for several of them.

Forecasting: Finding Valuable Signals/Patterns

Accurate forecasting of the stock price trend is a very important part of constructing profitable portfolios. However, the huge amount of data in various formats in the financial market makes it challenging to build forecasting models.

An increasing number of SOTA Quant research works/papers, which focus on building forecasting models to mine valuable signals/patterns in complex financial data, are released in Qlib.

Quant Model (Paper) Zoo

Here is a list of models built on Qlib.

PRs for new Quant models are highly welcome.

The performance of each model on the Alpha158 and Alpha360 datasets can be found here.

Run a single model

All the models listed above are runnable with Qlib. Users can find the config files we provide and some details about the model through the benchmarks folder. More information can be retrieved at the model files listed above.

Qlib provides three different ways to run a single model. Users can pick the one that fits their case best:

- Users can use the tool qrun mentioned above to run a model's workflow based on a config file.
- Users can create a workflow_by_code python script based on the one listed in the examples folder.
- Users can use the script run_all_model.py listed in the examples folder to run a model. Here is an example of the specific shell command to be used: python run_all_model.py run --models=lightgbm, where the --models argument can take any number of models listed above (the available models can be found in benchmarks). For more use cases, please refer to the file's docstrings.
  - NOTE: Each baseline has different environment dependencies. Please make sure that your Python version aligns with the requirements (e.g. TFT only supports Python 3.6~3.7 due to the limitation of tensorflow==1.15.0).

Run multiple models

Qlib also provides a script run_all_model.py which can run multiple models for several iterations. (Note: the script only supports Linux for now. Other operating systems will be supported in the future. It also does not yet support running the same model multiple times in parallel; this will be fixed in future development.)

The script will create a unique virtual environment for each model, and delete the environments after training. Thus, only experiment results such as IC and backtest results will be generated and stored.

Here is an example of running all the models for 10 iterations:

python run_all_model.py run 10

It also provides the API to run specific models at once. For more use cases, please refer to the file's docstrings.

Adapting to Market Dynamics

Due to the non-stationary nature of the financial market, the data distribution may change across periods, which makes the performance of models built on training data decay on future test data. Adapting the forecasting models/strategies to market dynamics is therefore very important to their performance.
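One simple way to cope with such distribution shift is to retrain periodically on a window that rolls forward in time. The sketch below only generates the rolling train/test splits. It is an illustration of the general idea under that assumption, not one of the solutions listed below, and the function name `rolling_segments` is hypothetical.

```python
def rolling_segments(periods, train_size, test_size=1, step=1):
    """Yield (train, test) windows that roll forward through `periods`."""
    start = 0
    while start + train_size + test_size <= len(periods):
        train = periods[start:start + train_size]
        test = periods[start + train_size:start + train_size + test_size]
        yield train, test
        start += step


years = list(range(2008, 2021))
for train, test in rolling_segments(years, train_size=5):
    # Each model is fit on five years and evaluated on the following year.
    print(f"train {train[0]}-{train[-1]} -> test {test[0]}")
```

Each test window lies strictly after its training window, so no future data leaks into training.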

Here is a list of solutions built on Qlib.

Reinforcement Learning: modeling continuous decisions

Qlib now supports reinforcement learning, a feature designed to model continuous investment decisions. This functionality assists investors in optimizing their trading strategies by learning from interactions with the environment to maximize some notion of cumulative reward.
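One common "notion of cumulative reward" in reinforcement learning is the discounted return, where later rewards are weighted by a factor gamma raised to the number of steps ahead. The following is a minimal sketch of that computation. The function name and the sample rewards are hypothetical, and the reward definitions used by the actual solutions may differ.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    total = 0.0
    for reward in reversed(rewards):
        # Folding from the end applies one extra factor of gamma per step.
        total = reward + gamma * total
    return total


# Hypothetical per-step rewards of a trading episode.
episode_rewards = [0.2, -0.1, 0.4, 0.3]
print(discounted_return(episode_rewards, gamma=0.9))
```

With `gamma=1.0` the function reduces to a plain sum, and smaller values make the agent favor near-term rewards.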

Here is a list of solutions built on Qlib categorized by scenarios.

RL for order execution

Here is the introduction of this scenario. All the methods below are compared here.

Quant Dataset Zoo

Datasets play a very important role in Quant. Here is a list of the datasets built on Qlib:

Dataset	US Market	China Market
Alpha360	√	√
Alpha158	√	√

Here is a tutorial on building a dataset with Qlib. PRs that add new Quant datasets are highly welcome.

Learning Framework

Qlib is highly customizable and many of its components are learnable. The learnable components are instances of Forecast Model and Trading Agent. They are learned based on the Learning Framework layer and then applied to multiple scenarios in the Workflow layer. The learning framework leverages the Workflow layer as well (e.g. sharing Information Extractor, creating environments based on Execution Env).

Based on learning paradigms, they can be categorized into reinforcement learning and supervised learning.

For supervised learning, the detailed docs can be found here.
For reinforcement learning, the detailed docs can be found here. Qlib's RL learning framework leverages Execution Env in the Workflow layer to create environments. It's worth noting that NestedExecutor is supported as well. This empowers users to optimize different levels of strategies/models/agents together (e.g. optimizing an order execution strategy for a specific portfolio management strategy).

More About Qlib

If you want to have a quick glance at the most frequently used components of qlib, you can try notebooks here.

The detailed documents are organized in docs. Sphinx and the readthedocs theme are required to build the documentation in HTML format.

cd docs/
conda install sphinx sphinx_rtd_theme -y
# Otherwise, you can install them with pip
# pip install sphinx sphinx_rtd_theme
make html

You can also view the latest document online directly.

Qlib is in active and continuing development. Our plan is in the roadmap, which is managed as a GitHub project.

Offline Mode and Online Mode

The data server of Qlib can be deployed in either Offline mode or Online mode. The default mode is Offline mode.

Under Offline mode, the data will be deployed locally.

Under Online mode, the data will be deployed as a shared data service. The data and their cache will be shared by all the clients. The data retrieval performance is expected to be improved due to a higher rate of cache hits. It will consume less disk space, too. The documents of the online mode can be found in Qlib-Server. The online mode can be deployed automatically with Azure CLI based scripts. The source code of online data server can be found in Qlib-Server repository.

Performance of Qlib Data Server

The performance of data processing is important to data-driven methods like AI technologies. As an AI-oriented platform, Qlib provides a solution for data storage and data processing. To demonstrate the performance of Qlib data server, we compare it with several other data storage solutions.

We evaluate the performance of several storage solutions by finishing the same task, which creates a dataset (14 features/factors) from the basic OHLCV daily data of a stock market (800 stocks each day from 2007 to 2020). The task involves data queries and processing.

	HDF5	MySQL	MongoDB	InfluxDB	Qlib -E -D	Qlib +E -D	Qlib +E +D
Total (1CPU) (seconds)	184.4±3.7	365.3±7.5	253.6±6.7	368.2±3.6	147.0±8.8	47.6±1.0	7.4±0.3
Total (64CPU) (seconds)					8.8±0.6	4.2±0.2

+(-)E indicates with (out) ExpressionCache
+(-)D indicates with (out) DatasetCache

Most general-purpose databases take too much time to load data. After looking into the underlying implementation, we find that data go through too many layers of interfaces and unnecessary format transformations in general-purpose database solutions. Such overheads greatly slow down the data loading process. Qlib data are stored in a compact format, which is efficient to be combined into arrays for scientific computation.
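To make the comparison easier to read, the single-CPU timings from the table can be turned into speedup factors. The sketch below uses only the mean values from the table and ignores the reported deviations. The dictionary and function names are hypothetical.

```python
# Mean single-CPU timings in seconds, copied from the table above.
SECONDS_1CPU = {
    "HDF5": 184.4,
    "MySQL": 365.3,
    "MongoDB": 253.6,
    "InfluxDB": 368.2,
    "Qlib -E -D": 147.0,
    "Qlib +E -D": 47.6,
    "Qlib +E +D": 7.4,
}


def speedup_over(baseline, target="Qlib +E +D", timings=SECONDS_1CPU):
    """How many times faster `target` finishes the task than `baseline`."""
    return timings[baseline] / timings[target]


for name in SECONDS_1CPU:
    print(f"{name:<12} {speedup_over(name):6.1f}x slower than Qlib +E +D")
```

The same helper also shows the effect of each cache in isolation, for example `speedup_over("Qlib -E -D", "Qlib +E -D")` for the ExpressionCache alone.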

Related Reports

Contact Us

If you have any issues, please create an issue here or send messages in Gitter.
If you want to make contributions to Qlib, please create pull requests.
For other reasons, you are welcome to contact us by email ([email protected]).
- We are recruiting new members (both FTEs and interns); your resumes are welcome!

Join IM discussion groups:

Gitter

Contributing

We appreciate all contributions and thank all the contributors!

Before we released Qlib as an open-source project on GitHub in Sep 2020, Qlib was an internal project in our group. Unfortunately, the internal commit history was not kept. Many members of our group have also contributed a lot to Qlib, including Ruihua Wang, Yinda Zhang, Haisu Yu, Shuyu Wang, Bochen Pang, and Dong Zhou. Special thanks to Dong Zhou for his initial version of Qlib.

Guidance

This project welcomes contributions and suggestions.
Here are some code standards and development guidance for submitting a pull request.

Making contributions is not hard. Solving an issue (maybe just answering a question raised in the issues list or Gitter), fixing or reporting a bug, improving the documents, and even fixing a typo are all important contributions to Qlib.

For example, if you want to contribute to Qlib's document/code, you can follow the steps in the figure below.

If you don't know how to start to contribute, you can refer to the following examples.

Type	Examples
Solving issues	Answer a question; issuing or fixing a bug
Docs	Improve docs quality ; Fix a typo
Feature	Implement a requested feature like this; Refactor interfaces
Dataset	Add a dataset
Models	Implement a new model, some instructions to contribute models

Good first issues are labelled to indicate that they are easy to start your contributions.

You can find some imperfect implementations in Qlib by running rg 'TODO|FIXME' qlib

If you would like to become one of Qlib's maintainers to contribute more (e.g. help merge PRs, triage issues), please contact us by email ([email protected]). We are glad to help upgrade your permissions.

Licence

Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the right to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

For Tasks:

Click tags to check more tools for each tasks

explore market patterns adapt to market changes optimize trading strategies back-test models forecast stock trends

For Jobs:

quantitative analyst data scientist financial engineer algorithmic trader investment researcher

Alternative AI tools for qlib

Similar Open Source Tools

qlib

github

: 18.0k

AgentLab

AgentLab is an open, easy-to-use, and extensible framework designed to accelerate web agent research. It provides features for developing and evaluating agents on various benchmarks supported by BrowserGym. The framework allows for large-scale parallel agent experiments using ray, building blocks for creating agents over BrowserGym, and a unified LLM API for OpenRouter, OpenAI, Azure, or self-hosted using TGI. AgentLab also offers reproducibility features, a unified LeaderBoard, and supports multiple benchmarks like WebArena, WorkArena, WebLinx, VisualWebArena, AssistantBench, GAIA, Mind2Web-live, and MiniWoB.

github

: 239

weblinx

WebLINX is a Python library and dataset for real-world website navigation with multi-turn dialogue. The repository provides code for training models reported in the WebLINX paper, along with a comprehensive API to work with the dataset. It includes modules for data processing, model evaluation, and utility functions. The modeling directory contains code for processing, training, and evaluating models such as DMR, LLaMA, MindAct, Pix2Act, and Flan-T5. Users can install specific dependencies for HTML processing, video processing, model evaluation, and library development. The evaluation module provides metrics and functions for evaluating models, with ongoing work to improve documentation and functionality.

github

: 112

ragas

Ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. RAG denotes a class of LLM applications that use external data to augment the LLM’s context. There are existing tools and frameworks that help you build these pipelines but evaluating it and quantifying your pipeline performance can be hard. This is where Ragas (RAG Assessment) comes in. Ragas provides you with the tools based on the latest research for evaluating LLM-generated text to give you insights about your RAG pipeline. Ragas can be integrated with your CI/CD to provide continuous checks to ensure performance.

github

: 8.7k

AgentBench

AgentBench is a benchmark designed to evaluate Large Language Models (LLMs) as autonomous agents in various environments. It includes 8 distinct environments such as Operating System, Database, Knowledge Graph, Digital Card Game, and Lateral Thinking Puzzles. The tool provides a comprehensive evaluation of LLMs' ability to operate as agents by offering Dev and Test sets for each environment. Users can quickly start using the tool by following the provided steps, configuring the agent, starting task servers, and assigning tasks. AgentBench aims to bridge the gap between LLMs' proficiency as agents and their practical usability.

github

: 2.1k

maxtext

MaxText is a high performance, highly scalable, open-source Large Language Model (LLM) written in pure Python/Jax targeting Google Cloud TPUs and GPUs for training and inference. It aims to be a launching off point for ambitious LLM projects in research and production, supporting TPUs and GPUs, models like Llama2, Mistral, and Gemma. MaxText provides specific instructions for getting started, runtime performance results, comparison to alternatives, and features like stack trace collection, ahead of time compilation for TPUs and GPUs, and automatic upload of logs to Vertex Tensorboard.

github

: 1.7k

katib

Katib is a Kubernetes-native project for automated machine learning (AutoML). Katib supports Hyperparameter Tuning, Early Stopping and Neural Architecture Search. Katib is the project which is agnostic to machine learning (ML) frameworks. It can tune hyperparameters of applications written in any language of the users’ choice and natively supports many ML frameworks, such as TensorFlow, Apache MXNet, PyTorch, XGBoost, and others. Katib can perform training jobs using any Kubernetes Custom Resources with out of the box support for Kubeflow Training Operator, Argo Workflows, Tekton Pipelines and many more.

github

: 1.5k

training-operator

Kubeflow Training Operator is a Kubernetes-native project for fine-tuning and scalable distributed training of machine learning (ML) models created with various ML frameworks such as PyTorch, Tensorflow, XGBoost, MPI, Paddle and others. Training Operator allows you to use Kubernetes workloads to effectively train your large models via Kubernetes Custom Resources APIs or using Training Operator Python SDK. > Note: Before v1.2 release, Kubeflow Training Operator only supports TFJob on Kubernetes. * For a complete reference of the custom resource definitions, please refer to the API Definition. * TensorFlow API Definition * PyTorch API Definition * Apache MXNet API Definition * XGBoost API Definition * MPI API Definition * PaddlePaddle API Definition * For details of all-in-one operator design, please refer to the All-in-one Kubeflow Training Operator * For details on its observability, please refer to the monitoring design doc.

github

: 1.7k

langkit

LangKit is an open-source text metrics toolkit for monitoring language models. It offers methods for extracting signals from input/output text, compatible with whylogs. Features include text quality, relevance, security, sentiment, toxicity analysis. Installation via PyPI. Modules contain UDFs for whylogs. Benchmarks show throughput on AWS instances. FAQs available.

github

: 823

pgai

pgai simplifies the process of building search and Retrieval Augmented Generation (RAG) AI applications with PostgreSQL. It brings embedding and generation AI models closer to the database, allowing users to create embeddings, retrieve LLM chat completions, reason over data for classification, summarization, and data enrichment directly from within PostgreSQL in a SQL query. The tool requires an OpenAI API key and a PostgreSQL client to enable AI functionality in the database. Users can install pgai from source, run it in a pre-built Docker container, or enable it in a Timescale Cloud service. The tool provides functions to handle API keys using psql or Python, and offers various AI functionalities like tokenizing, detokenizing, embedding, chat completion, and content moderation.

github

: 4.6k

open-assistant-api

Open Assistant API is an open-source, self-hosted AI intelligent assistant API compatible with the official OpenAI interface. It supports integration with more commercial and private models, R2R RAG engine, internet search, custom functions, built-in tools, code interpreter, multimodal support, LLM support, and message streaming output. Users can deploy the service locally and expand existing features. The API provides user isolation based on tokens for SaaS deployment requirements and allows integration of various tools to enhance its capability to connect with the external world.

GitHub stars: 269

all-rag-techniques

This repository provides a hands-on approach to Retrieval-Augmented Generation (RAG) techniques, simplifying advanced concepts into understandable implementations using Python libraries like openai, numpy, and matplotlib. It offers a collection of Jupyter Notebooks with concise explanations, step-by-step implementations, code examples, evaluations, and visualizations for various RAG techniques. The goal is to make RAG more accessible and demystify its workings for educational purposes.

GitHub stars: 504

katrain

KaTrain is a tool for analyzing games and playing Go with AI feedback from KataGo. Users can review their games to find costly moves, play against the AI with immediate feedback, play against weakened AI versions, and generate focused SGF reviews. The README covers previews, tutorials, installation instructions, and configuration options for KataGo. While playing, users can explore variations and request in-depth analysis. KaTrain also supports distributed training, which lets users contribute to KataGo's strength and to the training of bigger models. The tool offers theme customization, an FAQ section, and support and contribution channels through GitHub issues and a Discord community.

GitHub stars: 1.6k

maxtext

MaxText is a high-performance, highly scalable, open-source LLM written in pure Python/JAX and targeting Google Cloud TPUs and GPUs for training and inference. MaxText achieves high model FLOPs utilization (MFU) and scales from a single host to very large clusters while staying simple and "optimization-free" thanks to the power of JAX and the XLA compiler. MaxText aims to be a launching-off point for ambitious LLM projects in both research and production. We encourage users to start by experimenting with MaxText out of the box and then fork and modify MaxText to meet their needs.

GitHub stars: 1.5k

LLM-Pruner

LLM-Pruner is a tool for structural pruning of large language models, allowing task-agnostic compression while retaining multi-task solving ability. It supports automatic structural pruning of various LLMs with minimal human effort. The tool is efficient, requiring only 3 minutes for pruning and 3 hours for post-training. Supported LLMs include Llama-3.1, Llama-3, Llama-2, LLaMA, BLOOM, Vicuna, and Baichuan. Updates include support for grouped-query attention (GQA) and for BLOOM, as well as fine-tuning results achieving high accuracy. The tool provides step-by-step instructions for pruning, post-training, and evaluation, along with a Gradio interface for text generation. Limitations include compressed models that sometimes generate repetitive or nonsensical tokens and the manual operations required for certain models.

GitHub stars: 828

exllamav2

ExLlamaV2 is an inference library designed for running local LLMs on modern consumer GPUs. The library supports paged attention via Flash Attention 2.5.7+ and offers a new dynamic generator with features like dynamic batching, smart prompt caching, and K/V cache deduplication. It can be integrated with TabbyAPI to provide an OpenAI-style web API for local or remote inference, with extended features like HF model downloading and support for HF Jinja2 chat templates. ExLlamaV2 aims to optimize performance across different GPU models; speeds vary by GPU, and further optimizations may follow. It also supports a standalone web UI called ExUI for single-user interaction with chat and notebook modes, as well as text-generation-webui and lollms-webui through specific loaders and bindings.

GitHub stars: 4.0k

For similar tasks

qlib

GitHub stars: 18.0k

FinMem-LLM-StockTrading

This repository contains the Python source code for FINMEM, a Performance-Enhanced Large Language Model Trading Agent with Layered Memory and Character Design. It introduces FinMem, a novel LLM-based agent framework devised for financial decision-making, encompassing three core modules: Profiling, Memory with layered processing, and Decision-making. FinMem's memory module aligns closely with the cognitive structure of human traders, offering robust interpretability and real-time tuning. The framework enables the agent to self-evolve its professional knowledge, react agilely to new investment cues, and continuously refine trading decisions in the volatile financial environment. It presents a cutting-edge LLM agent framework for automated trading, boosting cumulative investment returns.

GitHub stars: 220

solana-trading-bot

Solana AI Trade Bot is an advanced trading tool specifically designed for meme token trading on the Solana blockchain. It leverages AI technology powered by GPT-4.0 to automate trades, identify low-risk/high-potential tokens, and assist in token creation and management. The bot offers cross-platform compatibility and a range of configurable settings for buying, selling, and filtering tokens. Users can benefit from real-time AI support and enhance their trading experience with features like automatic selling, slippage management, and profit/loss calculations. To optimize performance, it is recommended to connect the bot to a private light node for efficient trading execution.

GitHub stars: 53

For similar jobs

qlib

GitHub stars: 18.0k

jupyter-quant

Jupyter Quant is a dockerized environment tailored for quantitative research, equipped with essential tools like statsmodels, pymc, arch, py_vollib, zipline-reloaded, PyPortfolioOpt, numpy, pandas, scipy, scikit-learn, yellowbrick, shap, optuna, ib_insync, Cython, Numba, bottleneck, numexpr, jedi language server, jupyterlab-lsp, black, isort, and more. It does not include conda/mamba and relies on pip for package installation. The image is optimized for size, includes common command line utilities, supports apt cache, and allows for the installation of additional packages. It is designed for ephemeral containers, with data persisted through volumes for data, configuration, and notebooks. Common tasks include setting up the server, managing configurations, setting passwords, listing installed packages, passing parameters to jupyter-lab, running commands in the container, building wheels outside the container, installing dotfiles and SSH keys, and creating SSH tunnels.

GitHub stars: 58

FinRobot

FinRobot is an open-source AI agent platform designed for financial applications using large language models. It transcends the scope of FinGPT, offering a comprehensive solution that integrates a diverse array of AI technologies. The platform's versatility and adaptability cater to the multifaceted needs of the financial industry. FinRobot's ecosystem is organized into four layers: the Financial AI Agents Layer, the Financial LLMs Algorithms Layer, the LLMOps and DataOps Layers, and the Multi-source LLM Foundation Models Layer. The platform's agent workflow uses Perception, Brain, and Action modules to capture financial data, process it, and act on the resulting insights. The Smart Scheduler optimizes model diversity and selection for tasks and is managed by components like the Director Agent, Agent Registration, Agent Adaptor, and Task Manager. The repository provides a structured file organization with subfolders for agents, data sources, and functional modules, along with installation instructions and hands-on tutorials.

GitHub stars: 1.0k

hands-on-lab-neo4j-and-vertex-ai

This repository provides a hands-on lab for learning about Neo4j and Google Cloud Vertex AI. It is intended for data scientists and data engineers to deploy Neo4j and Vertex AI in a Google Cloud account, work with real-world datasets, apply generative AI, build a chatbot over a knowledge graph, and use vector search and index functionality for semantic search. The lab focuses on analyzing quarterly filings of asset managers with $100m+ assets under management, exploring relationships using Neo4j Browser and Cypher query language, and discussing potential applications in capital markets such as algorithmic trading and securities master data management.

GitHub stars: 88

jupyter-quant

Jupyter Quant is a dockerized environment tailored for quantitative research, equipped with essential tools like statsmodels, pymc, arch, py_vollib, zipline-reloaded, PyPortfolioOpt, numpy, pandas, scipy, scikit-learn, yellowbrick, shap, optuna, and more. It provides Interactive Brokers connectivity via ib_async and includes major Python packages for statistical and time series analysis. The image is optimized for size and includes the jedi language server, jupyterlab-lsp, and common command line utilities. Users can install new packages with sudo, leverage apt cache, and bring their own dotfiles and SSH keys. The tool is designed for ephemeral containers while keeping data persistent, which gives flexibility for quantitative analysis tasks.

GitHub stars: 165

Qbot

Qbot is an AI-oriented automated quantitative investment platform that supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and reinforcement learning. It provides a full closed-loop process from data acquisition, strategy development, backtesting, simulation trading to live trading. The platform emphasizes AI strategies such as machine learning, reinforcement learning, and deep learning, combined with multi-factor models to enhance returns. Users with some Python knowledge and trading experience can easily utilize the platform to address trading pain points and gaps in the market.

GitHub stars: 7.0k

FinMem-LLM-StockTrading

GitHub stars: 220

LLMs-in-Finance

This repository focuses on the application of Large Language Models (LLMs) in the field of finance. It provides insights and knowledge about how LLMs can be utilized in various scenarios within the finance industry, particularly in generating AI agents. The repository aims to explore the potential of LLMs to enhance financial processes and decision-making through the use of advanced natural language processing techniques.

GitHub stars: 327