
Awesome-Neuro-Symbolic-Learning-with-LLM
✨✨ Latest Advances on Neuro-Symbolic Learning in the Era of Large Language Models
Stars: 53

The Awesome-Neuro-Symbolic-Learning-with-LLM repository is a curated collection of papers and resources focused on improving the reasoning and planning capabilities of Large Language Models (LLMs) and Multi-Modal Large Language Models (MLLMs) through neuro-symbolic learning. It covers a wide range of topics such as neuro-symbolic visual reasoning, program synthesis, logical reasoning, mathematical reasoning, code generation, geometric reasoning, classical planning, game AI planning, robotic planning, AI agent planning, and more. The repository provides a comprehensive overview of tutorials, workshops, talks, surveys, papers, datasets, and benchmarks related to neuro-symbolic learning with LLMs and MLLMs.
README:
✨✨ Curated collection of papers and resources on the latest advances in improving the reasoning and planning abilities of LLMs/MLLMs with neuro-symbolic learning
🗂️ Table of Contents
- Neuro-Symbolic Visual Reasoning and Program Synthesis (Tutorial, CVPR 2020)
- Neuro-Symbolic Methods for Language and Vision (Tutorial, AAAI 2022)
- AI Planning: Theory and Practice (Tutorial, AAAI 2022)
- Advances in Neuro Symbolic Reasoning and Learning (Tutorial, AAAI 2023)
- Neuro-Symbolic Approaches: Large Language Models + Tool Use (Tutorial, ACL 2023)
- Neuro-Symbolic Generative Models (Workshop, ICLR 2023)
- Neuro-Symbolic Learning and Reasoning in the Era of Large Language Models (Workshop, AAAI 2024)
- Neuro-Symbolic Concepts for Robotic Manipulation (Talk given by Jiayuan Mao) [Video]
- Building General-Purpose Robots with Compositional Action Abstractions (Talk given by Jiayuan Mao)
- Summer School on Neurosymbolic Programming
- MIT 6.S191: Neuro-Symbolic AI (Talk given by David Cox) [Video]
- NeuroSymbolic Programming [Slides]
- LLM Reasoning: Key Ideas and Limitations (Talk given by Denny Zhou)
- Inference-Time Techniques for LLM Reasoning (Talk given by Xinyun Chen)
- Neurosymbolic Reasoning for Large Language Models (Neuro-Symbolic AI Summer School at UCLA, 2024)
- Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
- LLM Post-Training: A Deep Dive into Reasoning Large Language Models
- A Survey on Post-training of Large Language Models
- Reasoning Language Models: A Blueprint
- Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
- Logical Reasoning in Large Language Models: A Survey
- From System 1 to System 2: A Survey of Reasoning Large Language Models
- A Survey on LLM Inference-Time Self-Improvement
- Empowering LLMs with Logical Reasoning: A Comprehensive Survey
- Advancing Reasoning in Large Language Models: Promising Methods and Approaches
- A Survey on Deep Learning for Theorem Proving
- A Survey of Mathematical Reasoning in the Era of Multi-Modal Large Language Model: Benchmark, Method & Challenges
- Multi-Modal Chain-of-Thought Reasoning: A Comprehensive Survey
- Exploring the Reasoning Abilities of Multi-Modal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
- A Survey on Large Language Models for Automated Planning
- A Survey of Optimization-based Task and Motion Planning: From Classical To Learning Approaches
- A Survey on Large Language Model based Autonomous Agents
- Understanding the planning of LLM agents: A survey
- Introduction to AI Planning
- A Survey on Neural-symbolic Learning Systems
- Towards Cognitive AI Systems: a Survey and Prospective on Neuro-Symbolic AI
- Bridging the Gap: Representation Spaces in Neuro-Symbolic AI
- Neuro-Symbolic AI: The 3rd Wave
- Neuro-Symbolic AI and its Taxonomy: A Survey
- The third AI summer: AAAI Robert S. Engelmore Memorial Lecture
- NeuroSymbolic AI - Why, What, and How
- From Statistical Relational to Neuro-Symbolic Artificial Intelligence: a Survey
- Neuro-Symbolic Artificial Intelligence: Current Trends
- Neuro-Symbolic Reinforcement Learning and Planning: A Survey
- A Review on Neuro-symbolic AI Improvements to Natural Language Processing
- Survey on Applications of NeuroSymbolic Artificial Intelligence
- Overview of Neuro-Symbolic Integration Frameworks
Title | Venue | Date | Domain | Code |
---|---|---|---|---|
AMR-DA: Data Augmentation by Abstract Meaning Representation | ACL | 2022 | Logic Reasoning | Github |
Abstract Meaning Representation-Based Logic-Driven Data Augmentation for Logical Reasoning | ACL | 2024 | Logic Reasoning | Github |
Neuro-Symbolic Data Generation for Math Reasoning | NeurIPS | 2024 | Math Reasoning | - |
LawGPT: Knowledge-Guided Data Generation and Its Application to Legal LLM | SCI-FM Workshop @ ICLR | 2025 | Legal Reasoning | Github |
AlphaIntegrator: Transformer Action Search for Symbolic Integration Proofs | Arxiv | 2024 | Theorem Proving | - |
Title | Venue | Date | Domain | Code |
---|---|---|---|---|
PAL: Program-aided Language Models | ICML | 2023 | Reasoning | Github |
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks | TMLR | 2023 | Math Reasoning | Github |
Binding Language Models in Symbolic Languages | ICLR | 2023 | Reasoning | Github |
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator | ICML | 2024 | Reasoning | Github |
CODE4STRUCT: Code Generation for Few-Shot Event Structure Prediction | ACL | 2023 | Reasoning | Github |
MathPrompter: Mathematical Reasoning using Large Language Models | ACL | 2023 | Math Reasoning | Github |
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning | ACL | 2024 | Reasoning | Github |
Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments | Arxiv | 2025 | Reasoning | - |
Code as Policies: Language Model Programs for Embodied Control | Arxiv | 2023 | Robotics | Github |
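The program-aided methods in this table (e.g., PAL, Program of Thoughts) share one core pattern: the model writes a short program for the computational part of the problem, and an interpreter, not the model, produces the final answer. The sketch below is a minimal illustration of that pattern under stated assumptions, not any paper's actual implementation; `fake_llm` is a hypothetical stand-in for a real model call that would return generated code.

```python
# Minimal sketch of program-aided reasoning: the "LLM" emits Python,
# and the Python interpreter does the arithmetic.

def fake_llm(question: str) -> str:
    # Hypothetical stub: a real system would prompt an LLM to emit
    # a `solution()` function for the given word problem.
    return (
        "def solution():\n"
        "    eggs_per_day = 16\n"
        "    eaten = 3\n"
        "    baked = 4\n"
        "    price = 2\n"
        "    return (eggs_per_day - eaten - baked) * price\n"
    )

def program_aided_answer(question: str):
    code = fake_llm(question)
    namespace = {}
    exec(code, namespace)           # offload computation to the interpreter
    return namespace["solution"]()  # the interpreter's result is the answer

print(program_aided_answer("How much does Janet make per day?"))  # 18
```

The key design choice, common to the listed methods, is that the model is never trusted to do the arithmetic itself; it only has to translate the problem into executable form.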
Title | Venue | Date | Domain | Code |
---|---|---|---|---|
Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning" | ICML | 2020 | Visual Reasoning | Github |
JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents | Arxiv | 2022 | Robotics | - |
What's Left? Concept Grounding with Logic-Enhanced Foundation Models | NeurIPS | 2023 | Visual Reasoning | Github |
Take A Step Back: Rethinking the Two Stages in Visual Reasoning | ECCV | 2024 | Visual Reasoning | Github |
DiLA: Enhancing LLM Tool Learning with Differential Logic Layer | Arxiv | 2024 | Reasoning | - |
Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks | ICLR | 2024 | Reasoning | Github |
Empowering Language Models with Knowledge Graph Reasoning for Question Answering | EMNLP | 2022 | Reasoning | - |
Neuro-symbolic Training for Spatial Reasoning over Natural Language | Arxiv | 2025 | Spatial Reasoning | Github |
NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization | Arxiv | 2024 | Visual Reasoning | Github |
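Several entries above follow a two-stage recipe: a neural module maps raw input to symbolic facts, and a deterministic symbolic executor runs a program over those facts to answer the query. The toy sketch below illustrates that split; it is an invented example, not any listed system's code. `perceive` is a hypothetical stub standing in for a trained perception network, and the program format is made up for illustration.

```python
# Toy two-stage neuro-symbolic pipeline: neural perception (stubbed)
# produces symbolic scene facts; a symbolic executor answers queries.

def perceive(image):
    # Stand-in for a neural perception module; in a real system this
    # would be predicted from the image, not hard-coded.
    return [("cube", "red", "left"), ("sphere", "blue", "right")]

def execute(program, facts):
    # A tiny functional program: filter by attribute, then count.
    result = facts
    for op, arg in program:
        if op == "filter_color":
            result = [f for f in result if f[1] == arg]
        elif op == "filter_shape":
            result = [f for f in result if f[0] == arg]
        elif op == "count":
            result = len(result)
    return result

facts = perceive(None)
# "How many red objects are there?" compiled to a symbolic program:
print(execute([("filter_color", "red"), ("count", None)], facts))  # 1
```

Because the executor is deterministic and inspectable, the reasoning step can be verified independently of the neural perception, which is the main appeal of this decomposition.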
Title | Venue | Date | Domain | Code |
---|---|---|---|---|
CoTran: An LLM-based Code Translator using Reinforcement Learning with Feedback from Compiler and Symbolic Execution | ECAI | 2024 | Code Generation | - |
Position: LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks | ICML | 2024 | Planning | - |
RLSF: Reinforcement Learning via Symbolic Feedback | Arxiv | 2025 | Reasoning | Github |
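Entries such as CoTran and RLSF train models with feedback from symbolic tools rather than from human preferences alone: a compiler or checker scores the model's output, and that score serves as a reinforcement signal. The sketch below is a minimal, hedged illustration of the idea, not either paper's method; here Python's built-in `compile` plays the role of the symbolic checker, whereas real systems use full compilers, test suites, or symbolic execution engines.

```python
# Sketch of symbolic feedback: a symbolic tool (here Python's own
# compiler) converts a generated sample into a scalar reward.

def symbolic_reward(generated_code: str) -> float:
    try:
        compile(generated_code, "<candidate>", "exec")
        return 1.0   # sample parses: positive feedback for the policy
    except SyntaxError:
        return 0.0   # symbolic checker rejects the sample

print(symbolic_reward("x = 1 + 1"))   # 1.0
print(symbolic_reward("x = = 1"))     # 0.0
```

The appeal over learned reward models is that the signal is exact and cheap to query, though it only covers properties the symbolic tool can check.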
- Math Reasoning: GSM8K, MATH, AIME, OlympiadBench, MiniF2F, GSM Symbolic, MWPBench, AMC, AddSub, MathQA, FIMO, TRIGO, U-MATH, Mario, MultiArith, CHAMP, ARB, LeanDojo, LISA, PISA, TheoremQA, FrontierMath, Functional, TABMWP, SCIBENCH, MultiHiertt, ChartQA
- Logical Reasoning: LogicGame, LogiQA, LogiQA-v2.0, PrOntoQA, ProofWriter, BigBench, FOLIO, AbductionRules, ARC Challenge, WANLI, CLUTRR, Adversarial NLI, Adversarial ARCT
- Visual Reasoning: Visual Sudoku, CLEVR Dataset, GQA Dataset, VQA & VQA v2.0, Flickr30k entities, DAQUAR, Visual Genome, Visual7W, COCO-QA, TDIUC, SHAPES, VQA-Rephrasings, VQA P2, VQA-HAT, VQA-X, VQA-E, TallyQA, ST-VQA, Text-VQA, FVQA, OK-VQA
- Game AI Planning: Atari 100k, Procgen, Gym Retro, Malmö, Obstacle Tower, Torcs, DeepMind Lab, Hard Eight, DeepMind Control, VizDoom, Pommerman, Multiagent emergence, Google Research Football, Neural MMOs, StarCraft II, PySC2, Fever Basketball
- Robotic Planning: Mini-Behavior, CLIPort Dataset, ALFworld, VirtualHome, RocoBench, Behavior, SMART-LLM, PPNL, Robotouille
- AI Agent Planning: WebArena, OSWorld, API-Bank, TravelPlanner, ChinaTravel, TaskBench, WebShop, AgentBench, AgentGym, AgentBoard, GAIA, MINT
- RSbench: A Neuro-Symbolic Benchmark Suite for Concept Quality and Reasoning Shortcuts
Alternative AI tools for Awesome-Neuro-Symbolic-Learning-with-LLM
Similar Open Source Tools


Awesome-Model-Merging-Methods-Theories-Applications
A comprehensive repository focusing on 'Model Merging in LLMs, MLLMs, and Beyond', providing an exhaustive overview of model merging methods, theories, applications, and future research directions. The repository covers various advanced methods, applications in foundation models, different machine learning subfields, and tasks like pre-merging methods, architecture transformation, weight alignment, basic merging methods, and more.

Awesome-LLM-Large-Language-Models-Notes
Awesome-LLM-Large-Language-Models-Notes is a repository that provides a comprehensive collection of information on various Large Language Models (LLMs) classified by year, size, and name. It includes details on known LLM models, their papers, implementations, and specific characteristics. The repository also covers LLM models classified by architecture, must-read papers, blog articles, tutorials, and implementations from scratch. It serves as a valuable resource for individuals interested in understanding and working with LLMs in the field of Natural Language Processing (NLP).

awesome-mobile-llm
Awesome Mobile LLMs is a curated list of Large Language Models (LLMs) and related studies focused on mobile and embedded hardware. The repository includes information on various LLM models, deployment frameworks, benchmarking efforts, applications, multimodal LLMs, surveys on efficient LLMs, training LLMs on device, mobile-related use-cases, industry announcements, and related repositories. It aims to be a valuable resource for researchers, engineers, and practitioners interested in mobile LLMs.

Awesome-LLM-Constrained-Decoding
Awesome-LLM-Constrained-Decoding is a curated list of papers, code, and resources related to constrained decoding of Large Language Models (LLMs). The repository aims to facilitate reliable, controllable, and efficient generation with LLMs by providing a comprehensive collection of materials in this domain.

Awesome-LLM-Safety
Welcome to our Awesome-llm-safety repository! We've curated a collection of the latest, most comprehensive, and most valuable resources on large language model safety (llm-safety). But we don't stop there; included are also relevant talks, tutorials, conferences, news, and articles. Our repository is constantly updated to ensure you have the most current information at your fingertips.

ZhiLight
ZhiLight is a highly optimized large language model (LLM) inference engine developed by Zhihu and ModelBest Inc. It accelerates the inference of models like Llama and its variants, especially on PCIe-based GPUs. ZhiLight offers significant performance advantages compared to mainstream open-source inference engines. It supports various features such as custom defined tensor and unified global memory management, optimized fused kernels, support for dynamic batch, flash attention prefill, prefix cache, and different quantization techniques like INT8, SmoothQuant, FP8, AWQ, and GPTQ. ZhiLight is compatible with OpenAI interface and provides high performance on mainstream NVIDIA GPUs with different model sizes and precisions.

Hands-On-LangChain-for-LLM-Applications-Development
Practical LangChain tutorials for developing LLM applications, including prompt templates, output parsing, chatbots memory, chains, evaluating applications, building agents using LangChain & OpenAI API, retrieval augmented generation with LangChain, documents loading, splitting, vector database & text embeddings, information retrieval, answering questions from documents, chat with files, and introduction to Open AI function calling.

TrustLLM
TrustLLM is a comprehensive study of trustworthiness in LLMs, covering principles for different dimensions of trustworthiness, an established benchmark, an evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions. Specifically, it first proposes a set of principles for trustworthy LLMs spanning eight dimensions. Based on these principles, it establishes a benchmark across six dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. It then presents a study evaluating 16 mainstream LLMs on over 30 datasets. The document explains how to use the trustllm Python package to quickly assess the trustworthiness of your LLM. For more details about TrustLLM, please refer to the project website.

YuLan-Mini
YuLan-Mini is a lightweight language model with 2.4 billion parameters that achieves performance comparable to industry-leading models despite being pre-trained on only 1.08T tokens. It excels in mathematics and code domains. The repository provides pre-training resources, including data pipeline, optimization methods, and annealing approaches. Users can pre-train their own language models, perform learning rate annealing, fine-tune the model, research training dynamics, and synthesize data. The team behind YuLan-Mini is AI Box at Renmin University of China. The code is released under the MIT License with future updates on model weights usage policies. Users are advised on potential safety concerns and ethical use of the model.

PredictorLLM
PredictorLLM is an advanced trading agent framework that utilizes large language models to automate trading in financial markets. It includes a profiling module to establish agent characteristics, a layered memory module for retaining and prioritizing financial data, and a decision-making module to convert insights into trading strategies. The framework mimics professional traders' behavior, surpassing human limitations in data processing and continuously evolving to adapt to market conditions for superior investment outcomes.

data-prep-kit
Data Prep Kit accelerates unstructured data preparation for LLM app developers. It allows developers to cleanse, transform, and enrich unstructured data for pre-training, fine-tuning, instruct-tuning LLMs, or building RAG applications. The kit provides modules for Python, Ray, and Spark runtimes, supporting Natural Language and Code data modalities. It offers a framework for custom transforms and uses Kubeflow Pipelines for workflow automation. Users can install the kit via PyPi and access a variety of transforms for data processing pipelines.

CogVLM2
CogVLM2 is a new generation of open source models that offer significant improvements in benchmarks such as TextVQA and DocVQA. It supports 8K content length, image resolution up to 1344 * 1344, and both Chinese and English languages. The project provides basic calling methods, fine-tuning examples, and OpenAI API format calling examples to help developers quickly get started with the model.

Prompt-Engineering-Holy-Grail
The Prompt Engineering Holy Grail repository is a curated resource for prompt engineering enthusiasts, providing essential resources, tools, templates, and best practices to support learning and working in prompt engineering. It covers a wide range of topics related to prompt engineering, from beginner fundamentals to advanced techniques, and includes sections on learning resources, online courses, books, prompt generation tools, prompt management platforms, prompt testing and experimentation, prompt crafting libraries, prompt libraries and datasets, prompt engineering communities, freelance and job opportunities, contributing guidelines, code of conduct, support for the project, and contact information.

LlamaV-o1
LlamaV-o1 is a Large Multimodal Model designed for spontaneous reasoning tasks. It outperforms various existing models on multimodal reasoning benchmarks. The project includes a Step-by-Step Visual Reasoning Benchmark, a novel evaluation metric, and a combined Multi-Step Curriculum Learning and Beam Search Approach. The model achieves superior performance in complex multi-step visual reasoning tasks in terms of accuracy and efficiency.
For similar tasks

byteir
The ByteIR Project is a ByteDance model compilation solution. ByteIR includes compiler, runtime, and frontends, and provides an end-to-end model compilation solution. Although all ByteIR components (compiler/runtime/frontends) are together to provide an end-to-end solution, and all under the same umbrella of this repository, each component technically can perform independently. The name, ByteIR, comes from a legacy purpose internally. The ByteIR project is NOT an IR spec definition project. Instead, in most scenarios, ByteIR directly uses several upstream MLIR dialects and Google Mhlo. Most of ByteIR compiler passes are compatible with the selected upstream MLIR dialects and Google Mhlo.

ScandEval
ScandEval is a framework for evaluating pretrained language models on mono- or multilingual language tasks. It provides a unified interface for benchmarking models on a variety of tasks, including sentiment analysis, question answering, and machine translation. ScandEval is designed to be easy to use and extensible, making it a valuable tool for researchers and practitioners alike.

opencompass
OpenCompass is a one-stop platform for large model evaluation, aiming to provide a fair, open, and reproducible benchmark. Its main features include:
- Comprehensive support for models and datasets: pre-support for 20+ HuggingFace and API models, and an evaluation scheme of 70+ datasets with about 400,000 questions, comprehensively evaluating model capabilities in five dimensions.
- Efficient distributed evaluation: one command to implement task division and distributed evaluation, completing the full evaluation of billion-scale models in just a few hours.
- Diversified evaluation paradigms: support for zero-shot, few-shot, and chain-of-thought evaluations, combined with standard or dialogue-type prompt templates, to easily elicit the maximum performance of various models.
- Modular design with high extensibility: new models, datasets, custom task-division strategies, and even new cluster management systems can all be easily added.
- Experiment management and reporting mechanism: config files fully record each experiment, with support for real-time reporting of results.

openvino.genai
The GenAI repository contains pipelines that implement image and text generation tasks. The implementation uses OpenVINO capabilities to optimize the pipelines. Each sample covers a family of models and suggests certain modifications to adapt the code to specific needs. It includes the following pipelines:
1. Benchmarking script for large language models
2. Text generation C++ samples that support most popular models like LLaMA 2
3. Stable Diffusion (with LoRA) C++ image generation pipeline
4. Latent Consistency Model (with LoRA) C++ image generation pipeline

GPT4Point
GPT4Point is a unified framework for point-language understanding and generation. It aligns 3D point clouds with language, providing a comprehensive solution for tasks such as 3D captioning and controlled 3D generation. The project includes an automated point-language dataset annotation engine, a novel object-level point cloud benchmark, and a 3D multi-modality model. Users can train and evaluate models using the provided code and datasets, with a focus on improving models' understanding capabilities and facilitating the generation of 3D objects.

octopus-v4
The Octopus-v4 project aims to build the world's largest graph of language models, integrating specialized models and training Octopus models to connect nodes efficiently. The project focuses on identifying, training, and connecting specialized models. The repository includes scripts for running the Octopus v4 model, methods for managing the graph, training code for specialized models, and inference code. Environment setup instructions are provided for Linux with NVIDIA GPU. The Octopus v4 model helps users find suitable models for tasks and reformats queries for effective processing. The project leverages Language Large Models for various domains and provides benchmark results. Users are encouraged to train and add specialized models following recommended procedures.

Awesome-LLM-RAG
This repository, Awesome-LLM-RAG, aims to record advanced papers on Retrieval Augmented Generation (RAG) in Large Language Models (LLMs). It serves as a resource hub for researchers interested in promoting their work related to LLM RAG by updating paper information through pull requests. The repository covers various topics such as workshops, tutorials, papers, surveys, benchmarks, retrieval-enhanced LLMs, RAG instruction tuning, RAG in-context learning, RAG embeddings, RAG simulators, RAG search, RAG long-text and memory, RAG evaluation, RAG optimization, and RAG applications.

stm32ai-modelzoo
The STM32 AI model zoo is a collection of reference machine learning models optimized to run on STM32 microcontrollers. It provides a large collection of application-oriented models ready for re-training, scripts for easy retraining from user datasets, pre-trained models on reference datasets, and application code examples generated from user AI models. The project offers training scripts for transfer learning or training custom models from scratch. It includes performances on reference STM32 MCU and MPU for float and quantized models. The project is organized by application, providing step-by-step guides for training and deploying models.
For similar jobs

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

VisionCraft
The VisionCraft API is a free API providing access to over 100 different AI models, from image generation to sound.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:
- Self-contained, with no need for a DBMS or cloud service.
- OpenAPI interface, easy to integrate with existing infrastructure (e.g., a cloud IDE).
- Supports consumer-grade GPUs.

spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.