EDA-GPT

Automated data analysis leveraging LLMs

Stars: 160


EDA GPT is an open-source data analysis companion that offers a comprehensive solution for structured and unstructured data analysis. It streamlines the data analysis process, empowering users to explore, visualize, and gain insights from their data. EDA GPT supports analyzing structured data in various formats like CSV, XLSX, and SQLite, generating graphs, and conducting in-depth analysis of unstructured data such as PDFs and images. It provides a user-friendly interface, powerful features, and capabilities like comparing performance with other tools, analyzing large language models, multimodal search, data cleaning, and editing. The tool is optimized for maximal parallel processing, searching the internet and documents, and creating analysis reports from structured and unstructured data.

README:

EDA GPT: Your Open-Source Data Analysis Companion

Welcome to EDA GPT, your comprehensive solution for all your data analysis needs. Whether you're analyzing structured data in CSV, XLSX, or SQLite formats, generating insightful graphs, or conducting in-depth analysis of unstructured data such as PDFs and images, EDA GPT is here to assist you every step of the way.

Introduction

EDA GPT streamlines the data analysis process, allowing users to effortlessly explore, visualize, and gain insights from their data. With a user-friendly interface and powerful features, EDA GPT empowers users to make data-driven decisions with confidence.

DEMO VIDEO : https://genny.lovo.ai/share/d6b58f0d-fc46-4aa7-a65e-fa0f9a684f01

Getting Started

To get started with EDA GPT, simply navigate to the app and follow the on-screen instructions. Upload your data, specify your analysis preferences, and let EDA GPT handle the rest. With its intuitive interface and powerful features, EDA GPT makes data analysis accessible to users of all skill levels.

How to Use the App

Structured Data Analysis:
- Analyze structured data by uploading files or connecting to databases such as PostgreSQL. Supported file formats are CSV, XLSX, and SQLite.
- Provide additional context about your data and elaborate on desired outcomes for more accurate analysis.
Graph Generation:
- Generate various types of graphs effortlessly by specifying clear instructions.
- Access the generated code for fine-tuning and customization.
Analysis Questions:
- After the initial EDA, ask analysis questions on top of the generated report.
- Gain insights through Plotly graphs and visualization reports.

Comparison of Performance:

Compare the performance of EDA GPT and PandasAI based on accuracy, speed, and handling of complex queries.

xychart-beta
 title "Performance of EDA GPT (blue) and PandasAI (green)"
 x-axis ["Accuracy", "Speed", "Complex Queries"]
 y-axis "Score (out of 100)" 0 --> 100
 bar EDA_GPT [90, 92, 90]
 bar PandasAI [85, 90, 70]

LLMs (Large Language Models):
- Choose from a variety of LLMs based on dataset characteristics. HuggingFace, OpenAI, Groq, and Gemini models are supported. Claude3 and GPT4 are available for paid members.
- Consider factors such as dataset size and analysis complexity when selecting an LLM. Models with a large context length tend to work better for larger datasets.
Unstructured Data Analysis:
- Analyze unstructured PDF data efficiently. Table structure and images are inferred from unstructured data for better analysis.
- Provide detailed descriptions to enhance LLM decision-making.
- Has internet access and follows the Thought/Action/Observation principle for solving complex tasks.
Multimodal Search:
- Search answers from diverse sources including Wikipedia, Arxiv, DuckDuckGo, and web scrapers.
- Analyze images with integrated Large vision models.
Data Cleaning and Editing:
- Clean and edit your data using various methods provided by EDA GPT.
- Benefit from automated data cleaning processes, saving time and effort.
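
The Thought/Action/Observation principle mentioned under Unstructured Data Analysis can be sketched as a small loop. This is only an illustration of the general pattern, not EDA GPT's code: `run_agent`, `scripted_llm`, and the `lookup` tool are hypothetical names, and the "model" is a scripted stand-in rather than a real LLM call.

```python
def run_agent(question, llm_step, tools, max_steps=5):
    """Alternate Thought -> Action -> Observation until a final answer appears."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = llm_step(transcript)  # hypothetical model call returning a dict
        transcript.append(f"Thought: {step['thought']}")
        if "final_answer" in step:
            return step["final_answer"], transcript
        # Run the requested tool and feed its result back as an observation.
        observation = tools[step["action"]](step["action_input"])
        transcript.append(f"Action: {step['action']}[{step['action_input']}]")
        transcript.append(f"Observation: {observation}")
    raise RuntimeError("no final answer within max_steps")


def scripted_llm(transcript):
    """Stand-in for a real LLM: look something up once, then answer."""
    observations = [line for line in transcript if line.startswith("Observation: ")]
    if not observations:
        return {
            "thought": "I should look this up.",
            "action": "lookup",
            "action_input": "capital of France",
        }
    return {
        "thought": "The observation answers the question.",
        "final_answer": observations[-1][len("Observation: "):],
    }


# Hypothetical tool: a tiny in-memory lookup instead of a real web search.
FACTS = {"capital of France": "Paris"}
TOOLS = {"lookup": lambda query: FACTS.get(query, "unknown")}

ANSWER, TRANSCRIPT = run_agent("What is the capital of France?", scripted_llm, TOOLS)
print(ANSWER)
print("\n".join(TRANSCRIPT))
```

In a real agent the scripted function would be replaced by a model call and the tool table by search, scraping, document, and vision tools; the loop structure stays the same.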

Key Features:

Capable of analyzing large volumes of structured and unstructured data.
Unstructured data such as audio files, PDFs, and images can be analyzed. YouTube videos can also be analyzed to summarize their content.
A special class called Lang Group Chain is designed to handle complex queries. It is currently unstable, but the architecture is useful and can be improved. It breaks a primary question down into sub-questions represented as nodes (Lang Nodes), each of which may depend on other nodes. Special data structures called LangGroups store these Lang Nodes. The nodes are sorted in topological order and grouped by indegree, and each group is passed to the LLM together with the context from previous groups to reach the answer iteratively. This architecture is useful for questions such as: Find M//3 + 2, where M is the age difference between Donald Trump and Joe Biden plus the number of years Pluto takes to complete one revolution. Answering this requires a sequence of well-defined steps, much as a human would work through it. The trade-off is a higher number of LLM calls.
Advanced RAG techniques such as multi-query retrieval and context filtering are used to get better results. Tables, if present, are extracted while creating embeddings.
The Structured EDA GPT section provides interactive visualizations, pygwalker integration, and a context-rich analysis report.
You can talk to EDA GPT and ask it to generate visuals, run complex queries on a dataframe, derive insights, explore relationships between features, and more, all in natural language.
A wide range of LLMs is supported. With privacy in mind, Ollama models can be used for offline analysis.
Autoclean is implemented to clean data based on various parameters and methods such as linear regression.
Classification models are used instead of LLMs wherever explicit classification is needed, which gives faster inference.
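
The grouping step described for Lang Group Chain can be illustrated with a layered topological sort (Kahn's algorithm). This is a minimal sketch under assumptions: it reads "grouped by indegree" as "nodes whose remaining indegree reaches zero in the same round", and the function name and sub-question labels are hypothetical, not taken from the EDA GPT source.

```python
from collections import defaultdict


def group_by_dependency_level(dependencies):
    """Layered topological sort (Kahn's algorithm).

    `dependencies` maps each sub-question to the sub-questions it needs first.
    Returns a list of groups; a node only depends on nodes in earlier groups.
    """
    indegree = {node: len(set(deps)) for node, deps in dependencies.items()}
    dependents = defaultdict(list)
    for node, deps in dependencies.items():
        for dep in set(deps):
            if dep not in indegree:
                raise KeyError(f"unknown dependency: {dep!r}")
            dependents[dep].append(node)

    groups = []
    ready = sorted(node for node, degree in indegree.items() if degree == 0)
    while ready:
        groups.append(ready)
        next_ready = []
        for node in ready:
            for child in dependents[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    if sum(len(group) for group in groups) != len(indegree):
        raise ValueError("dependency cycle detected")
    return groups


# Hypothetical decomposition of the example question (M//3 + 2).
SUBQUESTIONS = {
    "age_trump": [],
    "age_biden": [],
    "pluto_orbit_years": [],
    "age_difference": ["age_trump", "age_biden"],
    "M": ["age_difference", "pluto_orbit_years"],
    "answer": ["M"],
}

GROUPS = group_by_dependency_level(SUBQUESTIONS)
for level, group in enumerate(GROUPS):
    print(level, group)
```

Each printed group could be sent to the LLM in one call, with the answers from earlier groups supplied as context, which is why this approach costs more LLM calls than a single prompt.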

NOTE: For better results, manually provide the LLM with context-rich information about your data before starting the analysis.

RECOMMENDATIONS: Gemini, OpenAI, Claude3, and Llama 3 models work better than most other models.

System Architecture

Structured Data EDA

graph TB
   
   subgraph STRUCTURED-DATA-ANALYZER

      DATA(UPLOAD STRUCTURED DATA) --> analyze(ANALYZE) -- llm analyzes --> EDA(Initial EDA Report)
      detail[Deals With Relational Data]
   end
   

   subgraph VStore
      vstore[(VectorEmbeddings)]
      includes([FAISS vstore])
   end
   EDA(Initial EDA Report)-->docs(DOCUMENT STORE)

   subgraph CALLING-LLM-LLMCHAIN
      prompttemplate(prompts)-->docschain(create-stuff-docs-chain)
      llm(llm choice)-->docschain(create-stuff-docs-chain)
      vstore[(VectorEmbeddings)] -- returns embeddings --> retriever(embeddings as-retriever) -->retrieverchain(retriever-chain--->retrieves vstore embeddings)

      docschain(create-stuff-docs-chain)-->retrieverchain(retriever-chain--->retrieves vstore embeddings) --> Chain(chain-->chain.invoke) --> result(LLM ANSWER)
   end

   subgraph VSTORE-INTERNALS
      coderag([coding examples for rag])-->docs(DOCUMENT STORE)
       docs(DOCUMENT STORE)--preprocess-->preprocessing([splitting,chunking,infer tables, structure in text data])
       preprocessing--embeddings-->embed&save(save to vstore)--save-->vstore[(VectorEmbeddings)]
   end

   
   
   
   subgraph EDAGPT-CHAT_INTERFACE
      subgraph CHAT
         chatinterface(Talk to EDA GPT) -- user-asks-question --> Q&A[Q&A Interface runs] --> function(pandasaichattool)
         function(pandasaichattool) -- create-stuff-docs-chain-creates-request --> vstore[(VectorEmbeddings)]
      end
   end
    subgraph CODE CORRECTOR
      error&query[Combine Error And Query]--into prompt-->correctorllm(SMARTLLMCHAIN)-->method[Chain OF Thoughts]
      method[Chain OF Thoughts]-->corrected(LLM CORRECTION)


   end
   
   


   subgraph OUTPUT_CLASSIFIER
       result(LLM ANSWER)--->Clf(Classification Model)
       models(Models: Random Forest, Naive Bayes)
       Clf(Classification Model)--label:sentence-->sentence(display result)
       Clf(Classification Model)--label:code-->code(code parser)-->codeformatter(CODE-FORMATTER)
       corrected(LLM CORRECTION)-->code(code parser)
   end

   
   subgraph CODE PARSER
   codeformatter(CODE-FORMATTER)--formats code-->exe(Executor)--no error-->output(returns code + output)-->display(display code result)
 exe(Executor)--error-->error(if Error)-->error&query[Combine Error And Query]

   end
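
The executor and code-corrector loop in the diagram above (run the code, and on error combine the error with the query and ask an LLM for a correction) can be sketched as follows. This is an illustrative sketch, not the project's implementation: `run_with_correction` and `fake_corrector` are hypothetical names, and the corrector is a fixed stand-in for the LLM call.

```python
import contextlib
import io
import traceback


def run_with_correction(code, query, correct_code, max_attempts=3):
    """Run `code`; on an exception, ask `correct_code` for a fixed version."""
    for attempt in range(1, max_attempts + 1):
        stdout = io.StringIO()
        try:
            with contextlib.redirect_stdout(stdout):
                exec(code, {})  # illustration only: sandbox untrusted code
            return {"code": code, "output": stdout.getvalue(), "attempts": attempt}
        except Exception:
            if attempt == max_attempts:
                raise
            # Combine the error and the original query into one prompt.
            error = traceback.format_exc(limit=1)
            prompt = f"Query: {query}\n\nCode:\n{code}\n\nError:\n{error}"
            code = correct_code(prompt)


def fake_corrector(prompt):
    """Stand-in for the LLM correction call: returns a fixed, working snippet."""
    return "values = [1, 2, 3]\nprint(sum(values) / len(values))"


# The first version fails with a NameError because `values` is undefined.
BROKEN = "print(sum(values) / len(values))"
RESULT = run_with_correction(BROKEN, "average of the values", fake_corrector)
print(RESULT["attempts"], RESULT["output"])
```

Capping the number of attempts keeps a persistently failing snippet from triggering an unbounded series of LLM calls.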

Unstructured Data EDA

graph TB
   
   subgraph UNSTRUCTURED-DATA-ANALYZER

      pdf(UPLOAD PDF) --> checkpdf(pdf content check)
      image(UPLOAD IMAGE) --> checkimg(image content check)
      checkpdf & checkimg -- |if Valid content| --> embeddings(make-vector embeddings)
      detail[Deals With Unstructured Data]
   end
   

   subgraph VectorStore
      vstore[(VectorEmbeddings)]
      includes([FAISS vstore])
   end

   subgraph CALLING-LLM-LLMCHAIN
      prompttemplate(prompts)-->docschain(create-stuff-docs-chain)
      llm(llm choice)-->docschain(create-stuff-docs-chain)
      chat_history(chat history)-->docschain(create-stuff-docs-chain)
      vstore[(VectorEmbeddings)] -- returns embeddings --> retriever(embeddings as-retriever) -->retrieverchain(retriever-chain--->retrieves vstore embeddings)
      multiquery([MultiQuery Retriever--> generates diverse questions for retrieval])-->retrieverchain
      docschain(create-stuff-docs-chain)-->retrieverchain(retriever-chain--->retrieves vstore embeddings) --> Chain(chain-->chain.invoke) --> result(LLM ANSWER)
   end

   subgraph VSTORE-INTERNALS

      embeddings(make-vector embeddings)--|check for structured data|-->infer-structure([INFER TABLE STRUCTURE if present])--save_too-->docs(DOCUMENT STORE)
       docs(DOCUMENT STORE)--preprocess-->preprocessing([splitting,chunking,infer tables, structure in text data])
       preprocessing--embeddings-->embed&save(save to vstore)--save-->vstore[(VectorEmbeddings)]
   end
   
   subgraph EDAGPT-CHAT_INTERFACE
      subgraph CHAT
         chatinterface(Talk to DATA) -- user-asks-question --> Q&A[Q&A Interface runs] --> clf(Classification Model)--|user-question|-->models

         subgraph MultiClassModels
         models(Models: Random Forest, Naive Bayes)--class-->analysis[Analysis]
         models(Models: Random Forest, Naive Bayes)--class-->vision[Vision]
         models(Models: Random Forest, Naive Bayes)--class-->search[Search]
         end
         
      end

      subgraph Analysis
      analysis[Analysis]-->datanalyst([ANSWERS QUESTION FROM DOCS])
      datanalyst--requests-->vstore-->docschain(create-stuff-docs-chain)
      end
      subgraph Vision
      vision[Vision]-->multimodal-LLM(MultiModal-LLM)-->result
      end
      subgraph SearchAgent
      search[Search]-->multimodalsearch[Multimodal-Search Agent]-->agents
      end
   end
   subgraph Agents
   agents-->funcs{Capabilities}

   subgraph features
   funcs-->internet([Search Internet])-->services([Duckduckgo, Tavily, Google])
   funcs-->scrape([scraper])
   funcs-->findocs([Utilize Docs])-->datanalyst
   funcs-->visioncapabilities([Utilize Vision])-->vision
   end

   subgraph Combine
   internet & scrape & findocs & visioncapabilities --> combine([Combine Results])
   combine([Combine Results])-->working[Utilizes various Permutation And Combination Of Tools based on Thought/Action/Observation]-->result
   end

   end

Why is FAISS used as the vector database for the structured section?

FAISS uses an inverted-file-based indexing strategy to index the embeddings, which is suitable for datasets ranging from 10 MB to around 2 GB. For datasets with higher memory demands, graph-based indexing, hybrid indexing, or disk-based indexing can be used. For most day-to-day purposes, FAISS is a good choice.
The Chroma database is used for comparatively larger files with a larger text corpus (for example, a 130-page PDF). It uses the Hierarchical Navigable Small World (HNSW) algorithm for indexing, which works well for k-nearest-neighbor similarity search.
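
To make the inverted-file idea concrete, here is a toy pure-Python sketch. It is not FAISS's implementation and uses no FAISS API: the class name `ToyIVFIndex` is hypothetical, and the centroids are fixed by hand, whereas a real inverted-file index typically learns them by clustering the data. Vectors are bucketed by nearest centroid, and a query scans only the `nprobe` closest buckets instead of every vector.

```python
import math


def nearest(point, candidates):
    """Index of the candidate closest to `point` (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda i: math.dist(point, candidates[i]))


class ToyIVFIndex:
    def __init__(self, centroids):
        self.centroids = centroids
        # One inverted list (bucket) per centroid.
        self.cells = {i: [] for i in range(len(centroids))}

    def add(self, doc_id, vector):
        self.cells[nearest(vector, self.centroids)].append((doc_id, vector))

    def search(self, query, k=1, nprobe=1):
        # Rank buckets by centroid distance and scan only the closest `nprobe`.
        order = sorted(
            range(len(self.centroids)),
            key=lambda i: math.dist(query, self.centroids[i]),
        )
        candidates = [item for i in order[:nprobe] for item in self.cells[i]]
        candidates.sort(key=lambda item: math.dist(query, item[1]))
        return [doc_id for doc_id, _ in candidates[:k]]


INDEX = ToyIVFIndex(centroids=[(0.0, 0.0), (10.0, 10.0)])
INDEX.add("a", (1.0, 0.0))
INDEX.add("b", (0.0, 2.0))
INDEX.add("c", (9.0, 10.0))
INDEX.add("d", (11.0, 11.0))

NEAR_FAR_CORNER = INDEX.search((8.0, 9.0), k=1)
NEAR_ORIGIN = INDEX.search((0.5, 0.5), k=2)
print(NEAR_FAR_CORNER, NEAR_ORIGIN)
```

Scanning fewer buckets is faster but can miss neighbors that fall just across a bucket boundary, which is the usual speed and recall trade-off of this kind of index.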

Optimizations in the application?

EDA GPT is optimized for maximal parallel processing. It embeds a large list of documents and adds them to Chroma in parallel.
It is heavily optimized for searching the internet and documents, and for creating analysis reports from structured and unstructured data.
Advanced retrieval techniques such as multi-query retrieval and ensemble retrieval, combined with similarity search with a high threshold, are used to retrieve useful documents.
A large language model with a large context window, such as Gemini 1.5 Pro, works best for large volumes of data. Since LLMs have a context limit, feeding an enormous amount of data in one go is not recommended. We recommend dividing a huge PDF into smaller PDFs where possible and processing independent data in one session. For example, a 1000-page PDF with over 5 * 10^6 words should be divided for efficiency.
Data is cached at every point for faster inference.
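
The parallel embedding step described above can be sketched with a thread pool. This is a minimal illustration under assumptions, not the application's code: `fake_embed` is a hypothetical stand-in for a real embedding call, and the batch size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fake_embed(batch):
    """Stand-in for a real embedding call, which is usually network-bound."""
    return [[float(len(text)), float(text.count(" "))] for text in batch]


def embed_in_parallel(texts, embed_batch, batch_size=2, workers=4):
    batches = list(batched(texts, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so vectors line up with `texts`.
        results = pool.map(embed_batch, batches)
        return [vector for batch_vectors in results for vector in batch_vectors]


DOCS = ["alpha", "beta gamma", "delta", "epsilon zeta eta", "theta"]
VECTORS = embed_in_parallel(DOCS, fake_embed)
print(VECTORS)
```

Threads suit this workload when the embedding call spends most of its time waiting on I/O; for CPU-bound local models a process pool or the model's own batching may be a better fit.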

Example Of Structured Data Analysis with EDA GPT:

link to notebook: https://colab.research.google.com/drive/1vqMTPWeSlF7iYG06PFkrYw9lxcnrrmaE?usp=sharing#scrollTo=9dzFcTeY53eG

For an in-depth understanding of the application, check out the Low Level Design documentation (Markdown) and the High Level Design PDF.

How to start the app

To use this app, follow these steps:

Clone the repository:

git clone https://github.com/shaunthecomputerscientist/EDA-GPT.git
cd EDA-GPT

Make a virtual environment and install dependencies:
```
   pip install -r requirements.txt
```
Set up secrets.toml inside the .streamlit folder:

Refer to each service's documentation to create the API keys it requires.
Start the app:
```
   streamlit run Home.py
```

Docker Support:

Prerequisites

Before you begin, ensure you have the following installed on your local system:

Docker (Make sure Docker Desktop is running if you're on Windows or macOS)

How to Use the App

Step 1: Pull the Docker Image

To get started, pull the Docker image from Docker Hub. Open your terminal and run:

docker pull mrpoldockeroperator123/eda-gpt:v2

Step 2: Run the Docker Container

docker run -d -p 8501:8501 mrpoldockeroperator123/eda-gpt:v2

This command will:

- Run the container in detached mode (-d).
- Map port 8501 on your local machine to port 8501 on the container.

Step 3: Access the application

After the container is running, you can access the EDA-GPT application by navigating to http://localhost:8501 in your web browser.

Step 4: Stop the Container

Stop the container when you are done:

docker ps

This command will list all running containers. Find the CONTAINER ID of the EDA-GPT container and stop it using:

docker stop <CONTAINER_ID>

Step 5: Remove the Container and Image

If you no longer need the container, you can remove it with:

docker rm <CONTAINER_ID>

If you want to free up space, you can also remove the Docker image from your local system:

The current version is v2:

docker rmi mrpoldockeroperator123/eda-gpt:v2

Troubleshooting

If you encounter issues while running the container, consider the following steps:

- Check Docker Installation: Ensure Docker is installed and running correctly.
- Port Availability: Make sure port 8501 is not being used by another application.
- Logs: Check container logs to diagnose issues by running:

docker logs <CONTAINER_ID>

What is <CONTAINER_ID>?

When you run the command:

docker ps
#you get 
CONTAINER ID   IMAGE                               COMMAND                  CREATED        STATUS        PORTS                    NAMES
e9f8c9b5b86c   mrpoldockeroperator123/eda-gpt:v2   "streamlit run home.py"  10 minutes ago Up 10 minutes 0.0.0.0:8501->8501/tcp   charming_mendel

In this case, the CONTAINER ID is e9f8c9b5b86c.

mrpoldockeroperator123/eda-gpt:v2 is the name of the Docker image.
0.0.0.0:8501->8501/tcp indicates that port 8501 on the host is forwarded to port 8501 in the container.
charming_mendel is the name automatically assigned to the container by Docker (you can also specify a name using the --name flag when you run the container).

Feedback and Support

We value your feedback and are constantly working to improve EDA GPT. If you encounter any issues or have suggestions for improvement, please don't hesitate to reach out to our support team. Developer contact: [email protected]

For Tasks:


analyze structured data generate graphs ask analysis questions compare performance clean and edit data

For Jobs:

data analyst data scientist business analyst research analyst machine learning engineer

Alternative AI tools for EDA-GPT

Similar Open Source Tools

EDA-GPT

github

: 160

Controllable-RAG-Agent

This repository contains a sophisticated deterministic graph-based solution for answering complex questions using a controllable autonomous agent. The solution is designed to ensure that answers are solely based on the provided data, avoiding hallucinations. It involves various steps such as PDF loading, text preprocessing, summarization, database creation, encoding, and utilizing large language models. The algorithm follows a detailed workflow involving planning, retrieval, answering, replanning, content distillation, and performance evaluation. Heuristics and techniques implemented focus on content encoding, anonymizing questions, task breakdown, content distillation, chain of thought answering, verification, and model performance evaluation.

github

: 951

zenml

ZenML is an extensible, open-source MLOps framework for creating portable, production-ready machine learning pipelines. By decoupling infrastructure from code, ZenML enables developers across your organization to collaborate more effectively as they develop to production.

github

: 4.5k

ciso-assistant-community

CISO Assistant is a tool that helps organizations manage their cybersecurity posture and compliance. It provides a centralized platform for managing security controls, threats, and risks. CISO Assistant also includes a library of pre-built frameworks and tools to help organizations quickly and easily implement best practices.

github

: 2.8k

gpt-researcher

GPT Researcher is an autonomous agent designed for comprehensive online research on a variety of tasks. It can produce detailed, factual, and unbiased research reports with customization options. The tool addresses issues of speed, determinism, and reliability by leveraging parallelized agent work. The main idea involves running 'planner' and 'execution' agents to generate research questions, seek related information, and create research reports. GPT Researcher optimizes costs and completes tasks in around 3 minutes. Features include generating long research reports, aggregating web sources, an easy-to-use web interface, scraping web sources, and exporting reports to various formats.

github

: 20.7k

ai-data-analysis-MulitAgent

AI-Driven Research Assistant is an advanced AI-powered system utilizing specialized agents for data analysis, visualization, and report generation. It integrates LangChain, OpenAI's GPT models, and LangGraph for complex research processes. Key features include hypothesis generation, data processing, web search, code generation, and report writing. The system's unique Note Taker agent maintains project state, reducing overhead and improving context retention. System requirements include Python 3.10+ and Jupyter Notebook environment. Installation involves cloning the repository, setting up a Conda virtual environment, installing dependencies, and configuring environment variables. Usage instructions include setting data, running Jupyter Notebook, customizing research tasks, and viewing results. Main components include agents for hypothesis generation, process supervision, visualization, code writing, search, report writing, quality review, and note-taking. Workflow involves hypothesis generation, processing, quality review, and revision. Customization is possible by modifying agent creation and workflow definition. Current issues include OpenAI errors, NoteTaker efficiency, runtime optimization, and refiner improvement. Contributions via pull requests are welcome under the MIT License.

github

: 575

magpie

This is the official repository for 'Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing'. Magpie is a tool designed to synthesize high-quality instruction data at scale by extracting it directly from an aligned Large Language Models (LLMs). It aims to democratize AI by generating large-scale alignment data and enhancing the transparency of model alignment processes. Magpie has been tested on various model families and can be used to fine-tune models for improved performance on alignment benchmarks such as AlpacaEval, ArenaHard, and WildBench.

github

: 168

qdrant

Qdrant is a vector similarity search engine and vector database. It is written in Rust, which makes it fast and reliable even under high load. Qdrant can be used for a variety of applications, including:
- Semantic search
- Image search
- Product recommendations
- Chatbots
- Anomaly detection

Qdrant offers a variety of features, including:
- Payload storage and filtering
- Hybrid search with sparse vectors
- Vector quantization and on-disk storage
- Distributed deployment
- Highlighted features such as query planning, payload indexes, SIMD hardware acceleration, async I/O, and write-ahead logging

Qdrant is available as a fully managed cloud service or as open-source software that can be deployed on-premises.

github

: 22.9k

fuse-med-ml

FuseMedML is a Python framework designed to accelerate machine learning-based discovery in the medical field by promoting code reuse. It provides a flexible design concept where data is stored in a nested dictionary, allowing easy handling of multi-modality information. The framework includes components for creating custom models, loss functions, metrics, and data processing operators. Additionally, FuseMedML offers 'batteries included' key components such as fuse.data for data processing, fuse.eval for model evaluation, and fuse.dl for reusable deep learning components. It supports PyTorch and PyTorch Lightning libraries and encourages the creation of domain extensions for specific medical domains.

github

: 138

MME-RealWorld

MME-RealWorld is a benchmark designed to address real-world applications with practical relevance, featuring 13,366 high-resolution images and 29,429 annotations across 43 tasks. It aims to provide substantial recognition challenges and overcome common barriers in existing Multimodal Large Language Model benchmarks, such as small data scale, restricted data quality, and insufficient task difficulty. The dataset offers advantages in data scale, data quality, task difficulty, and real-world utility compared to existing benchmarks. It also includes a Chinese version with additional images and QA pairs focused on Chinese scenarios.

github

: 94

HuixiangDou2

HuixiangDou2 is a robustly optimized GraphRAG approach that integrates multiple open-source projects to improve performance in graph-based augmented generation. It conducts comparative experiments and achieves a significant score increase, leading to a GraphRAG implementation with recognized performance. The repository provides code improvements, dense retrieval for querying entities and relationships, real domain knowledge testing, and impact analysis on accuracy.

github

: 78

llmblueprint

LLM Blueprint is an official implementation of a paper that enables text-to-image generation with complex and detailed prompts. It leverages Large Language Models (LLMs) to extract critical components from text prompts, including bounding box coordinates for foreground objects, detailed textual descriptions for individual objects, and a succinct background context. The tool operates in two phases: Global Scene Generation creates an initial scene using object layouts and background context, and an Iterative Refinement Scheme refines box-level content to align with textual descriptions, ensuring consistency and improving recall compared to baseline diffusion models.

github

: 53

crewAI

CrewAI is a cutting-edge framework designed to orchestrate role-playing autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks. It enables AI agents to assume roles, share goals, and operate in a cohesive unit, much like a well-oiled crew. Whether you're building a smart assistant platform, an automated customer service ensemble, or a multi-agent research team, CrewAI provides the backbone for sophisticated multi-agent interactions. With features like role-based agent design, autonomous inter-agent delegation, flexible task management, and support for various LLMs, CrewAI offers a dynamic and adaptable solution for both development and production workflows.

github

: 29.5k

ktransformers

KTransformers is a flexible Python-centric framework designed to enhance the user's experience with advanced kernel optimizations and placement/parallelism strategies for Transformers. It provides a Transformers-compatible interface, RESTful APIs compliant with OpenAI and Ollama, and a simplified ChatGPT-like web UI. The framework aims to serve as a platform for experimenting with innovative LLM inference optimizations, focusing on local deployments constrained by limited resources and supporting heterogeneous computing opportunities like GPU/CPU offloading of quantized models.

github

: 13.3k

radicalbit-ai-monitoring

The Radicalbit AI Monitoring Platform provides a comprehensive solution for monitoring Machine Learning and Large Language models in production. It helps proactively identify and address potential performance issues by analyzing data quality, model quality, and model drift. The repository contains files and projects for running the platform, including UI, API, SDK, and Spark components. Installation using Docker compose is provided, allowing deployment with a K3s cluster and interaction with a k9s container. The platform documentation includes a step-by-step guide for installation and creating dashboards. Community engagement is encouraged through a Discord server. The roadmap includes adding functionalities for batch and real-time workloads, covering various model types and tasks.

github

: 71

gradient-cli

Gradient CLI is a tool designed to facilitate the end-to-end MLOps process, allowing individuals and organizations to develop, train, and deploy Deep Learning models efficiently. It supports various ML/DL frameworks and provides features such as 1-click Jupyter Notebooks, scalable model training workflows, and model deployment as API endpoints. The tool can run on different infrastructures like AWS, GCP, on-premise, and Paperspace GPUs, offering automatic versioning, distributed training, hyperparameter search, and more.

github

: 65

For similar tasks

phoenix

Phoenix is a tool that provides MLOps and LLMOps insights at lightning speed with zero-config observability. It offers a notebook-first experience for monitoring models and LLM Applications by providing LLM Traces, LLM Evals, Embedding Analysis, RAG Analysis, and Structured Data Analysis. Users can trace through the execution of LLM Applications, evaluate generative models, explore embedding point-clouds, visualize generative application's search and retrieval process, and statistically analyze structured data. Phoenix is designed to help users troubleshoot problems related to retrieval, tool execution, relevance, toxicity, drift, and performance degradation.

github

: 5.3k

EDA-GPT

github

: 160

repromodel

ReproModel is an open-source toolbox designed to boost AI research efficiency by enabling researchers to reproduce, compare, train, and test AI models faster. It provides standardized models, dataloaders, and processing procedures, allowing researchers to focus on new datasets and model development. With a no-code solution, users can access benchmark and SOTA models and datasets, utilize training visualizations, extract code for publication, and leverage an LLM-powered automated methodology description writer. The toolbox helps researchers modularize development, compare pipeline performance reproducibly, and reduce time for model development, computation, and writing. Future versions aim to facilitate building upon state-of-the-art research by loading previously published study IDs with verified code, experiments, and results stored in the system.

github

: 151

grps_trtllm

The grps-trtllm repository is a C++ implementation of a high-performance OpenAI LLM service, combining GRPS and TensorRT-LLM. It supports functionalities like Chat, Ai-agent, and Multi-modal. The repository offers advantages over triton-trtllm, including a complete LLM service implemented in pure C++, integrated tokenizer supporting huggingface and sentencepiece, custom HTTP functionality for OpenAI interface, support for different LLM prompt styles and result parsing styles, integration with tensorrt backend and opencv library for multi-modal LLM, and stable performance improvement compared to triton-trtllm.

github

: 122

ReasonFlux

ReasonFlux is a revolutionary template-augmented reasoning paradigm that empowers a 32B model to outperform other models in reasoning tasks. The repository provides official resources for the paper 'ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates', including the latest released model ReasonFlux-F1-32B. It includes updates, dataset links, model zoo, getting started guide, training instructions, evaluation details, inference examples, performance comparisons, reasoning examples, preliminary work references, and citation information.

github

: 367

For similar jobs

Azure-Analytics-and-AI-Engagement

The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (containing a demo web application, Power BI reports, Synapse resources, AML Notebooks, etc.) that can be deployed in a customer’s subscription using the CAPE tool within a few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.

github

: 136

skyvern

Skyvern automates browser-based workflows using LLMs and computer vision. It provides a simple API endpoint to fully automate manual workflows, replacing brittle or unreliable automation solutions. Traditional approaches to browser automation required writing custom scripts for websites, often relying on DOM parsing and XPath-based interactions which would break whenever the website layouts changed. Instead of only relying on code-defined XPath interactions, Skyvern adds computer vision and LLMs to the mix to parse items in the viewport in real-time, create a plan for interaction and interact with them. This approach gives us a few advantages:
1. Skyvern can operate on websites it’s never seen before, as it’s able to map visual elements to actions necessary to complete a workflow, without any customized code
2. Skyvern is resistant to website layout changes, as there are no pre-determined XPaths or other selectors our system is looking for while trying to navigate
3. Skyvern leverages LLMs to reason through interactions to ensure we can cover complex situations. Examples include:
   1. If you wanted to get an auto insurance quote from Geico, the answer to a common question “Were you eligible to drive at 18?” could be inferred from the driver receiving their license at age 16
   2. If you were doing competitor analysis, it’s understanding that an Arnold Palmer 22 oz can at 7/11 is almost definitely the same product as a 23 oz can at Gopuff (even though the sizes are slightly different, which could be a rounding error!)

github

: 12.9k

pandas-ai

PandasAI is a Python library that makes it easy to ask questions to your data in natural language. It helps you to explore, clean, and analyze your data using generative AI.

github

: 14.0k

vanna

Vanna is an open-source Python framework for SQL generation and related functionality. It uses Retrieval-Augmented Generation (RAG) to train a model on your data, which can then be used to ask questions and get back SQL queries. Vanna is designed to be portable across different LLMs and vector databases, and it supports any SQL database. It is also secure and private, as your database contents are never sent to the LLM or the vector database.

Stars: 10.8k

databend

Databend is an open-source cloud data warehouse that serves as a cost-effective alternative to Snowflake. With its focus on fast query execution and data ingestion, it's designed for complex analysis of the world's largest datasets.

Stars: 7.7k

Avalonia-Assistant

Avalonia-Assistant is an open-source intelligent desktop assistant built on the Avalonia UI framework, integrating Semantic Kernel with OpenAI or other large language models to provide a user-friendly interactive experience. With Avalonia-Assistant, you can perform various desktop operations through text or voice commands, enhancing your productivity and daily office experience.

Stars: 113

marvin

Marvin is a lightweight AI toolkit for building natural language interfaces that are reliable, scalable, and easy to trust. Each of Marvin's tools is simple and self-documenting, using AI to solve common but complex challenges like entity extraction, classification, and generating synthetic data. Each tool is independent and incrementally adoptable, so you can use them on their own or in combination with any other library. Marvin is also multi-modal, supporting both image and audio generation as well as using images as inputs for extraction and classification.

Marvin is for developers who care more about _using_ AI than _building_ AI, and we are focused on creating an exceptional developer experience. Marvin users should feel empowered to bring tightly-scoped "AI magic" into any traditional software project with just a few extra lines of code. Marvin aims to merge the best practices for building dependable, observable software with the best practices for building with generative AI into a single, easy-to-use library. It's a serious tool, but we hope you have fun with it. Marvin is open-source, free to use, and made with 💙 by the team at Prefect.

Stars: 5.5k

activepieces

Activepieces is an open-source replacement for Zapier, designed to be extensible through a type-safe pieces framework written in TypeScript. It features a user-friendly Workflow Builder with support for Branches, Loops, and Drag and Drop. Activepieces integrates with Google Sheets, OpenAI, Discord, and RSS, along with 80+ other integrations. The list of supported integrations continues to grow rapidly, thanks to valuable contributions from the community.

Activepieces is an open ecosystem; all piece source code is available in the repository, and pieces are versioned and published directly to npmjs.com upon contribution. If you cannot find a specific piece on the pieces roadmap, please submit a request by visiting the following link: Request Piece. Alternatively, if you are a developer, you can quickly build your own piece using our TypeScript framework. For guidance, please refer to the following guide: Contributor's Guide.

Stars: 12.6k
