![ai-data-science-team](/statics/github-mark.png)
ai-data-science-team
An AI Data Science Team
Stars: 618
![screenshot](/screenshots_githubs/business-science-ai-data-science-team.jpg)
The AI Data Science Team of Copilots is an AI-powered data science team that uses agents to help users perform common data science tasks 10X faster. It includes agents specializing in data cleaning, preparation, feature engineering, modeling, and interpretation of business problems. The project is a work in progress with new data science agents to be released soon. Disclaimer: This project is for educational purposes only and not intended to replace a company's data science team. No warranties or guarantees are provided, and the creator assumes no liability for financial loss.
README:
An AI-powered data science team of agents to help you perform common data science tasks 10X faster.
Please ⭐ us on GitHub (it takes 2 seconds and means a lot).
Beta - This Python library is under active development. There may be breaking changes that occur until release of 0.1.0.
The AI Data Science Team of Copilots includes Agents that specialize data cleaning, preparation, feature engineering, modeling (machine learning), and interpretation of various business problems like:
- Churn Modeling
- Employee Attrition
- Lead Scoring
- Insurance Risk
- Credit Card Risk
- And more
- Your AI Data Science Team (An Army Of Agents)
- Want To Become A Full-Stack Generative AI Data Scientist?
Want to have your own customized enterprise-grade AI Data Science Team and domain-specific AI-powered Apps?
Send inquiries here: https://www.business-science.io/contact.html
If you're an aspiring data scientist who wants to learn how to build AI Agents and AI Apps for your company that performs Data Science, Business Intelligence, Churn Modeling, Time Series Forecasting, and more, then I'd love to help you.
Register for my next Generative AI for Data Scientists workshop here.
This project is a work in progress. New data science agents will be released soon.
This is the internals of the SQL Data Analyst Agent that connects to SQL databases to pull data into the data science environment. It creates pipelines to automate data extraction, performs Joins, Aggregations, and other SQL Query operations. And it includes a Data Visualization Agent that creates visualizations to help you understand your data.:
This is a top secret project I'm working on. It's a multi-agent data science app that performs time series forecasting.
- Data Wrangling Agent: Merges, Joins, Preps and Wrangles data into a format that is ready for data analysis.
- Data Visualization Agent: Creates visualizations to help you understand your data. Returns JSON serializable plotly visualizations.
- Data Cleaning Agent: Performs Data Preparation steps including handling missing values, outliers, and data type conversions.
- Feature Engineering Agent: Converts the prepared data into ML-ready data. Adds features to increase predictive accuracy of ML models.
- SQL Database Agent: Connects to SQL databases to pull data into the data science environment. Creates pipelines to automate data extraction. Performs Joins, Aggregations, and other SQL Query operations.
- SQL Data Analyst Agent: Connects to SQL databases to pull data into the data science environment. Creates pipelines to automate data extraction. Performs Joins, Aggregations, and other SQL Query operations. Includes a Data Visualization Agent that creates visualizations to help you understand your data.
- Data Analyst: Analyzes data structure, creates exploratory visualizations, and performs correlation analysis to identify relationships.
- Machine Learning Agent: Builds and logs the machine learning models.
- Interpretability Agent: Performs Interpretable ML to explain why the model returned predictions including which features were the most important to the model.
- Supervisor: Forms task list. Moderates sub-agents. Returns completed assignment.
This project is for educational purposes only.
- It is not intended to replace your company's data science team
- No warranties or guarantees provided
- Creator assumes no liability for financial loss
- Consult an experienced Generative AI Data Scientist for building your own custom AI Data Science Team
- If you want a custom enterprise-grade AI Data Science Team, send inquiries here.
By using this software, you agree to use it solely for learning purposes.
pip install git+https://github.com/business-science/ai-data-science-team.git --upgrade
feature_engineering_agent = FeatureEngineeringAgent(model = llm)
feature_engineering_agent.invoke_agent(
data_raw = df,
user_instructions = "Make sure to scale and center numeric features",
target_variable = "Churn",
max_retries = 3,
)
---FEATURE ENGINEERING AGENT----
* CREATE FEATURE ENGINEER CODE
* EXECUTING AGENT CODE
* EXPLAIN AGENT CODE
feature_engineering_agent.get_data_engineered()
data_cleaning_agent = DataCleaningAgent(model = llm)
response = data_cleaning_agent.invoke_agent(
data_raw = df,
user_instructions = "Don't remove outliers when cleaning the data.",
max_retries = 3,
)
---DATA CLEANING AGENT----
* CREATE DATA CLEANER CODE
* EXECUTING AGENT CODE
* EXPLAIN AGENT CODE
data_cleaning_agent.get_data_cleaned()
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License. See LICENSE file for details.
I teach Generative AI Data Science to help you build AI-powered data science apps. Register for my next Generative AI for Data Scientists workshop here.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for ai-data-science-team
Similar Open Source Tools
![ai-data-science-team Screenshot](/screenshots_githubs/business-science-ai-data-science-team.jpg)
ai-data-science-team
The AI Data Science Team of Copilots is an AI-powered data science team that uses agents to help users perform common data science tasks 10X faster. It includes agents specializing in data cleaning, preparation, feature engineering, modeling, and interpretation of business problems. The project is a work in progress with new data science agents to be released soon. Disclaimer: This project is for educational purposes only and not intended to replace a company's data science team. No warranties or guarantees are provided, and the creator assumes no liability for financial loss.
![rowfill Screenshot](/screenshots_githubs/harishdeivanayagam-rowfill.jpg)
rowfill
Rowfill is an open-source document processing platform designed for knowledge workers. It offers advanced AI capabilities to extract, analyze, and process data from complex documents, images, and PDFs. The platform features advanced OCR and processing functionalities, auto-schema generation, and custom actions for creating tailored workflows. It prioritizes privacy and security by supporting Local LLMs like Llama and Mistral, syncing with company data while maintaining privacy, and being open source with AGPLv3 licensing. Rowfill is a versatile tool that aims to streamline document processing tasks for users in various industries.
![paperless-ai Screenshot](/screenshots_githubs/clusterzx-paperless-ai.jpg)
paperless-ai
Paperless-AI is an automated document analyzer tool designed for Paperless-ngx users. It utilizes the OpenAI API and Ollama (Mistral, llama, phi 3, gemma 2) to automatically scan, analyze, and tag documents. The tool offers features such as automatic document scanning, AI-powered document analysis, automatic title and tag assignment, manual mode for analyzing documents, easy setup through a web interface, document processing dashboard, error handling, and Docker support. Users can configure the tool through a web interface and access a debug interface for monitoring and troubleshooting. Paperless-AI aims to streamline document organization and analysis processes for users with access to Paperless-ngx and AI capabilities.
![postgresml Screenshot](/screenshots_githubs/postgresml-postgresml.jpg)
postgresml
PostgresML is a powerful Postgres extension that seamlessly combines data storage and machine learning inference within your database. It enables running machine learning and AI operations directly within PostgreSQL, leveraging GPU acceleration for faster computations, integrating state-of-the-art large language models, providing built-in functions for text processing, enabling efficient similarity search, offering diverse ML algorithms, ensuring high performance, scalability, and security, supporting a wide range of NLP tasks, and seamlessly integrating with existing PostgreSQL tools and client libraries.
![CursorLens Screenshot](/screenshots_githubs/HamedMP-CursorLens.jpg)
CursorLens
Cursor Lens is an open-source tool that acts as a proxy between Cursor and various AI providers, logging interactions and providing detailed analytics to help developers optimize their use of AI in their coding workflow. It supports multiple AI providers, captures and logs all requests, provides visual analytics on AI usage, allows users to set up and switch between different AI configurations, offers real-time monitoring of AI interactions, tracks token usage, estimates costs based on token usage and model pricing. Built with Next.js, React, PostgreSQL, Prisma ORM, Vercel AI SDK, Tailwind CSS, and shadcn/ui components.
![Open-Medical-Reasoning-Tasks Screenshot](/screenshots_githubs/openlifescience-ai-Open-Medical-Reasoning-Tasks.jpg)
Open-Medical-Reasoning-Tasks
Open Life Science AI: Medical Reasoning Tasks is a collaborative hub for developing cutting-edge reasoning tasks for Large Language Models (LLMs) in the medical, healthcare, and clinical domains. The repository aims to advance AI capabilities in healthcare by fostering accurate diagnoses, personalized treatments, and improved patient outcomes. It offers a diverse range of medical reasoning challenges such as Diagnostic Reasoning, Treatment Planning, Medical Image Analysis, Clinical Data Interpretation, Patient History Analysis, Ethical Decision Making, Medical Literature Comprehension, and Drug Interaction Assessment. Contributors can join the community of healthcare professionals, AI researchers, and enthusiasts to contribute to the repository by creating new tasks or improvements following the provided guidelines. The repository also provides resources including a task list, evaluation metrics, medical AI papers, and healthcare datasets for training and evaluation.
![ai-driven-dev-community Screenshot](/screenshots_githubs/alexsoyes-ai-driven-dev-community.jpg)
ai-driven-dev-community
AI Driven Dev Community is a repository aimed at helping developers become more efficient by utilizing AI tools in their daily coding tasks. It provides a collection of tools, prompts, snippets, and agents for developers to integrate AI into their workflow. The repository is regularly updated with new resources and focuses on best practices for using AI in development work. Users can find tools like Espanso, ChatGPT, GitHub Copilot, and VSCode recommended for enhancing their coding experience. Additionally, the repository offers guidance on customizing AI for developers, installing AI toolbox for software engineers, and contributing to the community through easy steps.
![ai-chat-android Screenshot](/screenshots_githubs/GetStream-ai-chat-android.jpg)
ai-chat-android
AI Chat Android demonstrates Google's Generative AI on Android with Firebase Realtime Database. It showcases Gemini API integration, Jetpack Compose UI elements, Android architecture components with Hilt, Kotlin Coroutines for background tasks, and Firebase Realtime Database integration for real-time events. The project follows Google's official architecture guidance with a modularized structure for reusability, parallel building, and decentralized focusing.
![UFO Screenshot](/screenshots_githubs/microsoft-UFO.jpg)
UFO
UFO is a UI-focused dual-agent framework to fulfill user requests on Windows OS by seamlessly navigating and operating within individual or spanning multiple applications.
![swark Screenshot](/screenshots_githubs/swark-io-swark.jpg)
swark
Swark is a VS Code extension that automatically generates architecture diagrams from code using large language models (LLMs). It is directly integrated with GitHub Copilot, requires no authentication or API key, and supports all languages. Swark helps users learn new codebases, review AI-generated code, improve documentation, understand legacy code, spot design flaws, and gain test coverage insights. It saves output in a 'swark-output' folder with diagram and log files. Source code is only shared with GitHub Copilot for privacy. The extension settings allow customization for file reading, file extensions, exclusion patterns, and language model selection. Swark is open source under the GNU Affero General Public License v3.0.
![AI-Gateway Screenshot](/screenshots_githubs/Azure-Samples-AI-Gateway.jpg)
AI-Gateway
The AI-Gateway repository explores the AI Gateway pattern through a series of experimental labs, focusing on Azure API Management for handling AI services APIs. The labs provide step-by-step instructions using Jupyter notebooks with Python scripts, Bicep files, and APIM policies. The goal is to accelerate experimentation of advanced use cases and pave the way for further innovation in the rapidly evolving field of AI. The repository also includes a Mock Server to mimic the behavior of the OpenAI API for testing and development purposes.
![superduper Screenshot](/screenshots_githubs/superduper-io-superduper.jpg)
superduper
superduper.io is a Python framework that integrates AI models, APIs, and vector search engines directly with existing databases. It allows hosting of models, streaming inference, and scalable model training/fine-tuning. Key features include integration of AI with data infrastructure, inference via change-data-capture, scalable model training, model chaining, simple Python interface, Python-first approach, working with difficult data types, feature storing, and vector search capabilities. The tool enables users to turn their existing databases into centralized repositories for managing AI model inputs and outputs, as well as conducting vector searches without the need for specialized databases.
![awesome-llm-apps Screenshot](/screenshots_githubs/Shubhamsaboo-awesome-llm-apps.jpg)
awesome-llm-apps
Awesome LLM Apps is a curated collection of applications that leverage RAG with OpenAI, Anthropic, Gemini, and open-source models. The repository contains projects such as Local Llama-3 with RAG for chatting with webpages locally, Chat with Gmail for interacting with Gmail using natural language, Chat with Substack Newsletter for conversing with Substack newsletters using GPT-4, Chat with PDF for intelligent conversation based on PDF documents, and Chat with YouTube Videos for engaging with YouTube video content through natural language. Users can clone the repository, navigate to specific project directories, install dependencies, and follow project-specific instructions to set up and run the apps. Contributions are encouraged, and new app ideas or improvements can be submitted via pull requests.
![GenerativeAIExamples Screenshot](/screenshots_githubs/NVIDIA-GenerativeAIExamples.jpg)
GenerativeAIExamples
NVIDIA Generative AI Examples are state-of-the-art examples that are easy to deploy, test, and extend. All examples run on the high performance NVIDIA CUDA-X software stack and NVIDIA GPUs. These examples showcase the capabilities of NVIDIA's Generative AI platform, which includes tools, frameworks, and models for building and deploying generative AI applications.
![TaskingAI Screenshot](/screenshots_githubs/TaskingAI-TaskingAI.jpg)
TaskingAI
TaskingAI brings Firebase's simplicity to **AI-native app development**. The platform enables the creation of GPTs-like multi-tenant applications using a wide range of LLMs from various providers. It features distinct, modular functions such as Inference, Retrieval, Assistant, and Tool, seamlessly integrated to enhance the development process. TaskingAI’s cohesive design ensures an efficient, intelligent, and user-friendly experience in AI application development.
![intro-llm-rag Screenshot](/screenshots_githubs/zahaby-intro-llm-rag.jpg)
intro-llm-rag
This repository serves as a comprehensive guide for technical teams interested in developing conversational AI solutions using Retrieval-Augmented Generation (RAG) techniques. It covers theoretical knowledge and practical code implementations, making it suitable for individuals with a basic technical background. The content includes information on large language models (LLMs), transformers, prompt engineering, embeddings, vector stores, and various other key concepts related to conversational AI. The repository also provides hands-on examples for two different use cases, along with implementation details and performance analysis.
For similar tasks
![ai-data-science-team Screenshot](/screenshots_githubs/business-science-ai-data-science-team.jpg)
ai-data-science-team
The AI Data Science Team of Copilots is an AI-powered data science team that uses agents to help users perform common data science tasks 10X faster. It includes agents specializing in data cleaning, preparation, feature engineering, modeling, and interpretation of business problems. The project is a work in progress with new data science agents to be released soon. Disclaimer: This project is for educational purposes only and not intended to replace a company's data science team. No warranties or guarantees are provided, and the creator assumes no liability for financial loss.
![sorrentum Screenshot](/screenshots_githubs/sorrentum-sorrentum.jpg)
sorrentum
Sorrentum is an open-source project that aims to combine open-source development, startups, and brilliant students to build machine learning, AI, and Web3 / DeFi protocols geared towards finance and economics. The project provides opportunities for internships, research assistantships, and development grants, as well as the chance to work on cutting-edge problems, learn about startups, write academic papers, and get internships and full-time positions at companies working on Sorrentum applications.
![djl Screenshot](/screenshots_githubs/deepjavalibrary-djl.jpg)
djl
Deep Java Library (DJL) is an open-source, high-level, engine-agnostic Java framework for deep learning. It is designed to be easy to get started with and simple to use for Java developers. DJL provides a native Java development experience and allows users to integrate machine learning and deep learning models with their Java applications. The framework is deep learning engine agnostic, enabling users to switch engines at any point for optimal performance. DJL's ergonomic API interface guides users with best practices to accomplish deep learning tasks, such as running inference and training neural networks.
![craftgen Screenshot](/screenshots_githubs/craftgen-craftgen.jpg)
craftgen
Craftgen.ai is an innovative AI platform designed for both technical and non-technical users. It's built on a foundation of graph architecture for scalability and the Actor Model for efficient concurrent operations, tailored to both technical and non-technical users. A key aspect of Craftgen.ai is its modular AI approach, allowing users to assemble and customize AI components like building blocks to fit their specific needs. The platform's robustness is enhanced by its event-driven architecture, ensuring reliable data processing and featuring browser web technologies for universal access. Craftgen.ai excels in dynamic tool and workflow generation, with strong offline capabilities for secure environments and plans for desktop application integration. A unique and valuable feature of Craftgen.ai is its marketplace, where users can access a variety of pre-built AI solutions. This marketplace accelerates the deployment of AI tools but also fosters a community of sharing and innovation. Users can contribute to and leverage this repository of solutions, enhancing the platform's versatility and practicality. Craftgen.ai uses JSON schema for industry-standard alignment, enabling seamless integration with any API following the OpenAPI spec. This allows for a broad range of applications, from automating data analysis to streamlining content management. The platform is designed to bridge the gap between advanced AI technology and practical usability. It's a flexible, secure, and intuitive platform that empowers users, from developers seeking to create custom AI solutions to businesses looking to automate routine tasks. Craftgen.ai's goal is to make AI technology an integral, seamless part of everyday problem-solving and innovation, providing a platform where modular AI and a thriving marketplace converge to meet the diverse needs of its users.
![Data-Science-EBooks Screenshot](/screenshots_githubs/aniketpotabatti-Data-Science-EBooks.jpg)
Data-Science-EBooks
This repository contains a collection of resources in the form of eBooks related to Data Science, Machine Learning, and similar topics.
![BambooAI Screenshot](/screenshots_githubs/pgalko-BambooAI.jpg)
BambooAI
BambooAI is a lightweight library utilizing Large Language Models (LLMs) to provide natural language interaction capabilities, much like a research and data analysis assistant enabling conversation with your data. You can either provide your own data sets, or allow the library to locate and fetch data for you. It supports Internet searches and external API interactions.
![ai_wiki Screenshot](/screenshots_githubs/charliedream1-ai_wiki.jpg)
ai_wiki
This repository provides a comprehensive collection of resources, open-source tools, and knowledge related to quantitative analysis. It serves as a valuable knowledge base and navigation guide for individuals interested in various aspects of quantitative investing, including platforms, programming languages, mathematical foundations, machine learning, deep learning, and practical applications. The repository is well-structured and organized, with clear sections covering different topics. It includes resources on system platforms, programming codes, mathematical foundations, algorithm principles, machine learning, deep learning, reinforcement learning, graph networks, model deployment, and practical applications. Additionally, there are dedicated sections on quantitative trading and investment, as well as large models. The repository is actively maintained and updated, ensuring that users have access to the latest information and resources.
![free-for-life Screenshot](/screenshots_githubs/wdhdev-free-for-life.jpg)
free-for-life
A massive list including a huge amount of products and services that are completely free! ⭐ Star on GitHub • 🤝 Contribute # Table of Contents * APIs, Data & ML * Artificial Intelligence * BaaS * Code Editors * Code Generation * DNS * Databases * Design & UI * Domains * Email * Font * For Students * Forms * Linux Distributions * Messaging & Streaming * PaaS * Payments & Billing * SSL
For similar jobs
![ai-data-science-team Screenshot](/screenshots_githubs/business-science-ai-data-science-team.jpg)
ai-data-science-team
The AI Data Science Team of Copilots is an AI-powered data science team that uses agents to help users perform common data science tasks 10X faster. It includes agents specializing in data cleaning, preparation, feature engineering, modeling, and interpretation of business problems. The project is a work in progress with new data science agents to be released soon. Disclaimer: This project is for educational purposes only and not intended to replace a company's data science team. No warranties or guarantees are provided, and the creator assumes no liability for financial loss.
![databerry Screenshot](/screenshots_githubs/gmpetrov-databerry.jpg)
databerry
Chaindesk is a no-code platform that allows users to easily set up a semantic search system for personal data without technical knowledge. It supports loading data from various sources such as raw text, web pages, files (Word, Excel, PowerPoint, PDF, Markdown, Plain Text), and upcoming support for web sites, Notion, and Airtable. The platform offers a user-friendly interface for managing datastores, querying data via a secure API endpoint, and auto-generating ChatGPT Plugins for each datastore. Chaindesk utilizes a Vector Database (Qdrant), Openai's text-embedding-ada-002 for embeddings, and has a chunk size of 1024 tokens. The technology stack includes Next.js, Joy UI, LangchainJS, PostgreSQL, Prisma, and Qdrant, inspired by the ChatGPT Retrieval Plugin.
![OAD Screenshot](/screenshots_githubs/zeiss-microscopy-OAD.jpg)
OAD
OAD is a powerful open-source tool for analyzing and visualizing data. It provides a user-friendly interface for exploring datasets, generating insights, and creating interactive visualizations. With OAD, users can easily import data from various sources, clean and preprocess data, perform statistical analysis, and create customizable visualizations to communicate findings effectively. Whether you are a data scientist, analyst, or researcher, OAD can help you streamline your data analysis workflow and uncover valuable insights from your data.
![sqlcoder Screenshot](/screenshots_githubs/defog-ai-sqlcoder.jpg)
sqlcoder
Defog's SQLCoder is a family of state-of-the-art large language models (LLMs) designed for converting natural language questions into SQL queries. It outperforms popular open-source models like gpt-4 and gpt-4-turbo on SQL generation tasks. SQLCoder has been trained on more than 20,000 human-curated questions based on 10 different schemas, and the model weights are licensed under CC BY-SA 4.0. Users can interact with SQLCoder through the 'transformers' library and run queries using the 'sqlcoder launch' command in the terminal. The tool has been tested on NVIDIA GPUs with more than 16GB VRAM and Apple Silicon devices with some limitations. SQLCoder offers a demo on their website and supports quantized versions of the model for consumer GPUs with sufficient memory.
![TableLLM Screenshot](/screenshots_githubs/RUCKBReasoning-TableLLM.jpg)
TableLLM
TableLLM is a large language model designed for efficient tabular data manipulation tasks in real office scenarios. It can generate code solutions or direct text answers for tasks like insert, delete, update, query, merge, and chart operations on tables embedded in spreadsheets or documents. The model has been fine-tuned based on CodeLlama-7B and 13B, offering two scales: TableLLM-7B and TableLLM-13B. Evaluation results show its performance on benchmarks like WikiSQL, Spider, and self-created table operation benchmark. Users can use TableLLM for code and text generation tasks on tabular data.
![mlcraft Screenshot](/screenshots_githubs/mlcraft-io-mlcraft.jpg)
mlcraft
Synmetrix (prev. MLCraft) is an open source data engineering platform and semantic layer for centralized metrics management. It provides a complete framework for modeling, integrating, transforming, aggregating, and distributing metrics data at scale. Key features include data modeling and transformations, semantic layer for unified data model, scheduled reports and alerts, versioning, role-based access control, data exploration, caching, and collaboration on metrics modeling. Synmetrix leverages Cube (Cube.js) for flexible data models that consolidate metrics from various sources, enabling downstream distribution via a SQL API for integration into BI tools, reporting, dashboards, and data science. Use cases include data democratization, business intelligence, embedded analytics, and enhancing accuracy in data handling and queries. The tool speeds up data-driven workflows from metrics definition to consumption by combining data engineering best practices with self-service analytics capabilities.
![data-scientist-roadmap2024 Screenshot](/screenshots_githubs/xandie985-data-scientist-roadmap2024.jpg)
data-scientist-roadmap2024
The Data Scientist Roadmap2024 provides a comprehensive guide to mastering essential tools for data science success. It includes programming languages, machine learning libraries, cloud platforms, and concepts categorized by difficulty. The roadmap covers a wide range of topics from programming languages to machine learning techniques, data visualization tools, and DevOps/MLOps tools. It also includes web development frameworks and specific concepts like supervised and unsupervised learning, NLP, deep learning, reinforcement learning, and statistics. Additionally, it delves into DevOps tools like Airflow and MLFlow, data visualization tools like Tableau and Matplotlib, and other topics such as ETL processes, optimization algorithms, and financial modeling.
![VMind Screenshot](/screenshots_githubs/VisActor-VMind.jpg)
VMind
VMind is an open-source solution for intelligent visualization, providing an intelligent chart component based on LLM by VisActor. It allows users to create chart narrative works with natural language interaction, edit charts through dialogue, and export narratives as videos or GIFs. The tool is easy to use, scalable, supports various chart types, and offers one-click export functionality. Users can customize chart styles, specify themes, and aggregate data using LLM models. VMind aims to enhance efficiency in creating data visualization works through dialogue-based editing and natural language interaction.