llm-reasoners
A library for advanced large language model reasoning
Stars: 1021
LLM Reasoners is a library that enables LLMs to conduct complex reasoning, with advanced reasoning algorithms. It approaches multi-step reasoning as planning and searches for the optimal reasoning chain, which achieves the best balance of exploration vs exploitation with the idea of "World Model" and "Reward". Given any reasoning problem, simply define the reward function and an optional world model (explained below), and let LLM reasoners take care of the rest, including Reasoning Algorithms, Visualization, LLM calling, and more!
README:
[Home] [Paper (COLM2024)] [Blog]
- Jul. 10, 2024: Our paper on LLM Reasoners has been accepted to COLM 2024!
- Jun. 24, 2024: PromptAgent is in LLM Reasoners! Let it help you write down a super detailed prompt for your task (here).
- May. 14, 2024: Check out Eurus, a suite of LLMs optimized for reasoning. With LLM Reasoners, Eurus-RM can easily boost Llama-8B from 0.49 to 0.73 📈 on GSM8k (code).
- May. 2, 2024: We have integrated our first reasoning method for scientific reasoning, StructChem! Check it out here.
- Apr. 22, 2024: We integrated Llama-3, with additional useful APIs (e.g., customizing EOS tokens, calculating likelihood).
- Apr. 8, 2024: Our new paper introducing LLM Reasoners is available!
- Mar. 29, 2024: Grace Decoding has been incorporated!
- Oct. 25, 2023: A video tutorial on the visualizer of LLM Reasoners is available.
- Oct. 23, 2023: Reasoning-via-Planning is accepted to EMNLP 2023! Check our paper with updated results and discussion!
- Aug. 21, 2023: A batch of quantized Llama-2 models has arrived! BitsandBytes with the huggingface API and GPT-Q with exllama are available. Now you can try llama-2-70B with 2 x 24G GPUs.
- Aug. 10, 2023: Llama-2 is supported! You can run examples with Llama-2 now.
- Cutting-Edge Reasoning Algorithms: We offer the most up-to-date search algorithms for reasoning with LLMs, such as:
  - Reasoning-via-Planning, MCTS (Hao et al., 2023)
  - StructChem (Ouyang et al., 2023)
  - Chain-of-thoughts (Wei et al., 2022)
  - Least-to-most prompting (Zhou et al., 2022)
  - Tree-of-Thoughts, BFS (Yao et al., 2023)
  - Tree-of-Thoughts, DFS (Yao et al., 2023)
  - Self-Eval Guided Decoding, Beam Search (Xie et al., 2023)
  - Grace Decoding (Khalifa et al., 2023)
  - Eurus (Yuan et al., 2024)
  - PromptAgent (Wang et al., 2023)
- Intuitive Visualization and Interpretation: Our library provides a visualization tool to help users comprehend the reasoning process. Even for complex reasoning algorithms like Monte-Carlo Tree Search, users can easily diagnose and understand the process with one line of Python code. See an example in the tutorial notebook.
- Compatibility with popular LLM libraries: Our framework is compatible with popular LLM frameworks, e.g., Hugging Face transformers and the OpenAI/Google/Anthropic APIs. Specifically, we have integrated LLaMA-1/2/3 with the option of using fairscale (1, 2, 3), LLaMA.cpp, Exllama, or huggingface for different needs, e.g., fastest inference speed or minimal hardware requirements.
- LLM Reasoners is applied to analyze the reasoning abilities of LLMs and the performance of multiple reasoning algorithms. See the comprehensive experiment results in the AutoRace Leaderboard, and more analysis in the blog and paper.
- Our implementations of Tree-of-Thoughts, Guided Decoding, and GRACE Decoding have been tested and reproduce the performance of the official implementations. For reference, we list the results reported in the original papers or reproduced from the official repositories (†). Some results are on the subset of the first 100 examples (*).

| Method | Base LLM | GSM8k |
|---|---|---|
| Guided Decoding† | CodeX (PAL) | 0.80 |
| Guided Decoding | CodeX (PAL) | 0.83* |

| Method | Base LLM | Game of 24 |
|---|---|---|
| Tree-of-Thoughts† | GPT-3.5-turbo | 0.22 |
| Tree-of-Thoughts | GPT-3.5-turbo | 0.22 |

| Method | Base LLM | GSM8k |
|---|---|---|
| GRACE Decoding† | Flan-T5-Large (Fine-tuned) | 0.34 |
| GRACE Decoding | Flan-T5-Large (Fine-tuned) | 0.33* |
Consider the following problem:
Let's start with a naive method for LLM reasoning: prompted with a few examples of step-by-step problem solving, an LLM can generate a chain of thoughts (or a sequence of actions) to solve a new problem. For the problem above, the prompt given to the LLM and the expected output (in bold) are shown below:

> I am playing with a set of blocks where I need to arrange the blocks into stacks.
>
> (Example problems and solutions * 4)
>
> [STATEMENT]
>
> As initial conditions I have that, the red block is clear, the blue block is clear, the orange block is clear, the hand is empty, the red block is on the yellow block, the yellow block is on the table, the blue block is on the table and the orange block is on the table.
>
> My goal is to have that the orange block is on top of the blue block and the yellow block on top of the orange block.
>
> [PLAN]
>
> **pick up the orange block**
>
> **stack the orange block on top of the blue block**
>
> **unstack the red block from on top of the yellow block**
>
> **put the red block on the table**
>
> **pick up the yellow block**
>
> **stack the yellow block on top of the orange block**
Regarding each reasoning step as an action, we have $a_1=$"pick up the orange block", $a_2=$"stack the orange block on top of the blue block", and so on. At each time step, the next action is sampled from the LLM conditioned on the previous actions. This simple method is often referred to as Chain-of-thoughts reasoning. Unfortunately, it doesn't always work for complex reasoning problems. On the Blocksworld dataset, from which the problem above is taken, even the strongest GPT-4 model reaches a success rate of only ~30%.
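The autoregressive sampling loop described above can be sketched in a few lines of Python. This is only an illustration and not the library's API: `toy_propose` is a hypothetical stand-in for an LLM call that returns candidate next actions with sampling weights, and here it simply replays the plan from the prompt.

```python
import random

# Plan copied from the prompt above; a real LLM would generate it step by step.
PLAN = [
    "pick up the orange block",
    "stack the orange block on top of the blue block",
    "unstack the red block from on top of the yellow block",
    "put the red block on the table",
    "pick up the yellow block",
    "stack the yellow block on top of the orange block",
]


def toy_propose(previous: list[str]) -> tuple[list[str], list[float]]:
    """Hypothetical LLM stand-in: candidate next actions and their weights."""
    if len(previous) < len(PLAN):
        return [PLAN[len(previous)]], [1.0]
    return [], []  # nothing left to propose: the chain is finished


def chain_of_thought(propose, max_steps: int = 10, seed: int = 0) -> list[str]:
    """Sample a_1, a_2, ... one at a time, each conditioned on the previous actions."""
    rng = random.Random(seed)
    actions: list[str] = []
    for _ in range(max_steps):
        candidates, weights = propose(actions)
        if not candidates:
            break
        # There is no look-ahead or backtracking: a sampled action is final.
        actions.append(rng.choices(candidates, weights=weights, k=1)[0])
    return actions


result = chain_of_thought(toy_propose)
```

Because each action is committed as soon as it is sampled, one bad step cannot be undone, which is the weakness that search-based methods try to address.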
LLM Reasoners formulates reasoning as planning (RAP). Unlike Chain-of-thoughts reasoning, which autoregressively samples the next action, our goal is to efficiently search the reasoning space for the optimal reasoning chain. To achieve this, two components need to be defined: a world model and a reward function.
- World model defines the state transition, formally $P(s_{i+1} | s_i, a_i)$. A default world model regards the partial solution as the state and simply appends a new action/thought to the state as the state transition (the same formulation as Tree-of-Thoughts). However, you have the option to design a better world model that predicts and keeps track of a more meaningful state (e.g., environment status or intermediate variable values; see RAP for more examples), thus enhancing the reasoning. For the example shown above, we can naturally define the state as the condition of the blocks (e.g., the red block is on the yellow block...), and the world model predicts the condition of the blocks after every potential action.
- Reward function provides a criterion to evaluate a reasoning step. Ideally, a reasoning chain with a higher accumulated reward should be more likely to be correct. For the example shown above, we can reward actions based on the increase in the number of accomplished subgoals they lead to. In addition, the likelihood of the LLM generating the action can be used as a reward, to give the search a good prior.
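As a rough illustration of these two components, the sketch below implements the default "append the action" transition and a reward that mixes an LLM log-likelihood with the fraction of satisfied subgoals. All function names are hypothetical and the weighting parameter `alpha` is an assumption, chosen to mirror the weighted combination used in the Blocksworld walkthrough later in this README; none of this is the library's own interface.

```python
# A state in the default world model is just the partial solution so far.
State = tuple[str, ...]


def default_step(state: State, action: str) -> State:
    """Default transition: append the new action/thought to the partial solution."""
    return state + (action,)


def subgoal_fraction(facts: set[str], goals: set[str]) -> float:
    """Fraction of goal facts that already hold in the current set of facts."""
    if not goals:
        return 1.0
    return len(goals & facts) / len(goals)


def combined_reward(log_likelihood: float, goal_score: float, alpha: float = 0.5) -> float:
    """Weighted mix of the LLM prior (log-likelihood) and goal satisfaction."""
    return alpha * log_likelihood + (1 - alpha) * goal_score


state = default_step(("pick up the orange block",),
                     "stack the orange block on top of the blue block")
score = subgoal_fraction(
    facts={"orange on blue", "red on yellow"},
    goals={"orange on blue", "yellow on orange"},
)
reward = combined_reward(log_likelihood=-1.0, goal_score=score, alpha=0.5)
```

Here one of the two subgoals is satisfied, so `score` is 0.5, and the combined reward is 0.5 * (-1.0) + 0.5 * 0.5 = -0.25.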
After we have the world model and reward function, it's time to apply an algorithm to search for the optimal reasoning trace. Here, we show the process of Monte-Carlo Tree Search:

[Figure: animated illustration of Monte-Carlo Tree Search]
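For readers who want to see the four phases of Monte-Carlo Tree Search (selection, expansion, simulation, backpropagation) in code, here is a minimal, self-contained sketch on a toy problem. It is not the library's `MCTS` class: the two-letter action space, the `TARGET` chain, and the 0/1 reward are all assumptions made for illustration, and the world model is the default one that appends an action to the state.

```python
import math
import random

ACTIONS = ("a", "b")        # toy action space (assumption)
TARGET = ("a", "b", "a")    # hypothetical "correct" reasoning chain
DEPTH = len(TARGET)


def is_terminal(state):
    return len(state) == DEPTH


def reward(state):
    return 1.0 if state == TARGET else 0.0


class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.untried = [] if is_terminal(state) else list(ACTIONS)
        self.visits = 0
        self.total = 0.0

    def uct_child(self, c):
        # UCT score: mean value (exploitation) plus a visit-count bonus (exploration)
        return max(
            self.children,
            key=lambda ch: ch.total / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def mcts(iterations=200, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(())
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes using UCT
        while not node.untried and node.children:
            node = node.uct_child(c)
        # 2. Expansion: add one child for an untried action
        if node.untried:
            action = node.untried.pop(0)
            child = Node(node.state + (action,), parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: random roll-out until a terminal state
        state = node.state
        while not is_terminal(state):
            state = state + (rng.choice(ACTIONS),)
        value = reward(state)
        # 4. Backpropagation: update statistics along the path to the root
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    # Read out the chain with the highest mean value at each level
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.total / ch.visits)
    return node.state


best_chain = mcts()
```

On this toy problem the search recovers the target chain, since every branch that does not lead to it has a mean value of zero. Real reasoning problems replace the toy reward and transition with an LLM-backed reward function and world model.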
The three key components of a reasoning algorithm (reward function, world model, and search algorithm), shown in the formulation (top), correspond to three classes in the library: `SearchConfig`, `WorldModel`, and `SearchAlgorithm`, respectively. In addition, there are LLM APIs to power the other modules, as well as Benchmark and Visualization modules to evaluate or debug the reasoning algorithm (middle). To implement a reasoning algorithm for a certain domain (a `Reasoner` object), a user may inherit from the `SearchConfig` and `WorldModel` classes and import a pre-implemented `SearchAlgorithm`. We also show a concrete example of solving Blocksworld with RAP using LLM Reasoners (bottom).
Let's go through the code of reasoning over Blocksworld problems. Note that the code is simplified for demonstration (check here for a runnable notebook).
The first step is to define the world model: you will set up an initial state given a question in init_state, judge whether a state is terminal in is_terminal, and most importantly, define the world dynamics with step:
```python
import copy

import utils  # example-specific Blocksworld helper functions
from reasoners import WorldModel, LanguageModel

BWState = str
BWAction = str


class BlocksWorldModel(WorldModel[BWState, BWAction]):
    def __init__(self,
                 base_model: LanguageModel,
                 prompt: dict) -> None:
        super().__init__()
        self.base_model = base_model
        self.prompt = prompt

    def init_state(self) -> BWState:
        # extract the statement from a given problem
        # e.g., "the red block is clear, the blue block is clear..."
        return BWState(utils.extract_init_state(self.example))

    def step(self, state: BWState, action: BWAction) -> tuple[BWState, dict]:
        # call the LLM to predict the state transition
        state = copy.deepcopy(state)
        # load the prompt for the LLM to predict the next state
        # e.g. "... I have that <state>, if I <action>, then ..."
        world_update_prompt = self.prompt["update"].replace("<state>", state).replace("<action>", action)
        world_output = self.base_model.generate([world_update_prompt],
                                                eos_token_id="\n", hide_input=True, temperature=0).text[0].strip()
        new_state = utils.process_new_state(world_output)
        # at this point, we have the new state after the action
        # the following part speeds up the reward calculation:
        # we want the portion of satisfied subgoals as a part of the reward,
        # and since the new state is already predicted, we can check it here
        goal_reached = utils.goal_check(utils.extract_goals(self.example), new_state)
        # return the new state and an additional dictionary (to be passed to the reward function)
        return new_state, {"goal_reached": goal_reached}

    def is_terminal(self, state: BWState) -> bool:
        # define the terminal condition that stops the search
        # e.g., all the subgoals are met
        return utils.goal_check(utils.extract_goals(self.example), state) == 1
```

Then, it's time to consider how to search for the optimal reasoning chain. This involves `get_actions`, which returns the action space for a given state, and, most importantly, `reward`, which guides the reasoning. For Monte-Carlo Tree Search, we can additionally define a `fast_reward` to speed up the roll-out stage.
```python
import utils  # example-specific Blocksworld helper functions
from world_model import BWState, BWAction
from reasoners import SearchConfig, LanguageModel


class BWConfig(SearchConfig):
    def __init__(self,
                 base_model: LanguageModel,
                 prompt: dict,
                 reward_alpha=0.5,
                 goal_reward_default=0.,
                 goal_reached_reward=100) -> None:
        super().__init__()
        self.base_model = base_model
        self.example = None
        self.prompt = prompt
        # some parameters to calculate the fast reward or reward (explained below)
        self.reward_alpha = reward_alpha
        self.goal_reward_default = goal_reward_default
        self.goal_reached_reward = goal_reached_reward

    def get_actions(self, state: BWState) -> list[BWAction]:
        # use a rule-based function to extract all legal actions
        return utils.generate_all_actions(state)

    def fast_reward(self, state: BWState, action: BWAction) -> tuple[float, dict]:
        # build an in-context learning prompt (similar to the one used in Chain-of-thoughts reasoning)
        inputs = self.prompt["icl"].replace("<init_state>", state)\
            .replace("<goals>", utils.extract_goals(self.example))
        # concatenate a candidate action after the prompt, and test its log-likelihood
        intuition = self.base_model.get_loglikelihood(inputs, [inputs + action])[0]
        # the reward is a combination of intuition and goal satisfaction
        # in fast_reward, we skip the calculation of goal satisfaction and use a default value
        fast_reward = intuition * self.reward_alpha + self.goal_reward_default * (1 - self.reward_alpha)
        # cache some information for the reward calculation later (will be passed to the `reward` function)
        details = {'intuition': intuition}
        return fast_reward, details

    def reward(self, state: BWState, action: BWAction,
               intuition: float = None,
               goal_reached: float = None) -> tuple[float, dict]:
        # note that `intuition` (cached in `fast_reward`) and `goal_reached` (cached in `step`)
        # are automatically passed as parameters to this reward function
        if goal_reached == 1:
            # if the goal state is reached, we will assign a large reward
            goal_reward = self.goal_reached_reward
        else:
            # otherwise assign the reward based on the portion of satisfied subgoals
            goal_reward = goal_reached
        # the reward is a combination of intuition and goal satisfaction
        reward = intuition * self.reward_alpha + goal_reward * (1 - self.reward_alpha)
        # return the reward and an additional dictionary (to be saved in the log for visualization later)
        return reward, {'intuition': intuition, 'goal_reached': goal_reached}
```

Now, we are ready to apply a reasoning algorithm to solve the problem:
```python
import json
import os
import pickle

from reasoners import Reasoner
from reasoners.algorithm import MCTS
from reasoners.lm import LLaMAModel
from world_model import BlocksWorldModel
from search_config import BWConfig

# llama_ckpts, llama_size, prompt_path, dataset, log_dir, and resume are
# placeholders that must be defined for your own setup
llama_model = LLaMAModel(llama_ckpts, llama_size, max_batch_size=1)
with open(prompt_path) as f:
    prompt = json.load(f)
world_model = BlocksWorldModel(base_model=llama_model, prompt=prompt)
config = BWConfig(base_model=llama_model, prompt=prompt)
# save the history of every iteration for visualization
search_algo = MCTS(output_trace_in_each_iter=True)
reasoner = Reasoner(world_model=world_model, search_config=config, search_algo=search_algo)
for i, example in enumerate(dataset):
    algo_output = reasoner(example)
    # save the MCTS results as pickle files
    with open(os.path.join(log_dir, 'algo_output', f'{resume + i + 1}.pkl'), 'wb') as f:
        pickle.dump(algo_output, f)
```

Finally, we can easily visualize the reasoning process:
```python
import pickle

from reasoners.algorithm.mcts import MCTSNode
from reasoners.visualization import visualize
from reasoners.visualization.tree_snapshot import NodeData, EdgeData

with open("logs/bw_MCTS/xxx/algo_output/1.pkl", 'rb') as f:
    mcts_result = pickle.load(f)


# by default, a state is presented along with the node, and the reward with the
# dictionary saved in `SearchConfig.reward` is presented along with the edge.
# we can also define helper functions to customize what we want to see in the visualizer.
def blocksworld_node_data_factory(n: MCTSNode) -> NodeData:
    # BWState is a plain string in this simplified walkthrough
    return NodeData({"block state": n.state if n.state else None,
                     "satisfied": n.fast_reward_details if n.fast_reward_details else "Not expanded"})


def blocksworld_edge_data_factory(n: MCTSNode) -> EdgeData:
    return EdgeData({"reward": n.reward, "intuition": n.fast_reward_details["intuition"]})


visualize(mcts_result, node_data_factory=blocksworld_node_data_factory,
          edge_data_factory=blocksworld_edge_data_factory)
```

Then a URL to the visualized results will pop up. The figure is interactive and looks like the examples shown on our demo website.
Make sure to use Python 3.10 or later.
```bash
conda create -n reasoners python=3.10
conda activate reasoners
```

Clone the repository and install the package:

```bash
git clone https://github.com/Ber666/llm-reasoners --recursive
cd llm-reasoners
pip install -e .
```

Adding `--recursive` will help you clone exllama automatically. Note that some other optional modules may need other dependencies. Please refer to the error message for details.
This project is an extension of the following paper:
@inproceedings{hao2023reasoning,
title={Reasoning with Language Model is Planning with World Model},
author={Hao, Shibo and Gu, Yi and Ma, Haodi and Hong, Joshua and Wang, Zhen and Wang, Daisy and Hu, Zhiting},
booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
pages={8154--8173},
year={2023}
}
@article{hao2024llm,
title={LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models},
author={Hao, Shibo and Gu, Yi and Luo, Haotian and Liu, Tianyang and Shao, Xiyan and Wang, Xinyuan and Xie, Shuhua and Ma, Haodi and Samavedhi, Adithya and Gao, Qiyue and others},
journal={arXiv preprint arXiv:2404.05221},
year={2024}
}


