Best AI tools for Cleaning Data
20 - AI tool Sites

Sourcetable
Sourcetable is an AI-powered spreadsheet and data analysis tool that enables users to perform various tasks such as analyzing files, creating visualizations, writing formulas, researching, and cleaning data with the help of artificial intelligence. It offers features like AI Spreadsheet Assistant, AI Formula Generator, AI Chart Generator, AI Data Analysis, SQL Generator, and more to streamline data-related tasks efficiently.

Array Assistant
Array Assistant is an AI-driven tool designed to supercharge spreadsheet productivity. It offers a wide range of features such as creating formulas, cleaning data, summarizing text, explaining problems, and designing automations. Whether you are a professional, student, or spreadsheet user, Array Assistant can help you enhance your workflow and save time. With a user-friendly interface and innovative AI technology, Array Assistant revolutionizes the way you work with spreadsheets.

SheetBot AI
SheetBot AI is an AI data analyst tool that enables users to analyze data quickly without the need for coding. It automates repetitive and time-consuming data tasks, making data visualization and analysis more efficient. With SheetBot AI, users can generate accurate and visually appealing graphs in seconds, streamlining the data analysis process.

ChartFast
ChartFast is an AI Data Analyzer tool that automates data visualization and analysis tasks, powered by GPT-4 technology. It allows users to generate precise and sleek graphs in seconds, process vast amounts of data, run interactive data queries, and export results quickly. With features like specialized internal libraries for complex graph generation, customizable visualization code, and instant data export, ChartFast aims to streamline data work and enhance data analysis efficiency.

DataCamp
DataCamp is an online learning platform that offers courses in data science, AI, and machine learning. The platform provides interactive exercises, short videos, and hands-on projects to help learners develop the skills they need to succeed in the field. DataCamp also offers a variety of resources for businesses, including team training, custom content development, and data science consulting.

nuvo
nuvo is an AI-powered data import platform that offers fast, secure, and scalable data imports for software companies. It provides tools like nuvo Data Importer SDK and nuvo Data Pipeline to streamline manual and recurring ETL data imports, enabling users to manage data imports independently. With AI-enhanced automation, nuvo helps prepare clean data for preferred systems quickly and efficiently, reducing manual effort and improving data quality. The platform allows users to upload unlimited data in various formats, match imported data to system schemas, clean and validate data, and import clean data into target systems with just a click.
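
The match, clean, and validate steps described above can be illustrated with a small generic sketch. This is not nuvo's SDK or API: the schema, the function names, and the fuzzy header matching via Python's standard `difflib` are all assumptions made for illustration only.

```python
import difflib

# Hypothetical target schema: column name -> (type converter, required?)
TARGET_SCHEMA = {
    "email": (str, True),
    "full_name": (str, True),
    "age": (int, False),
}


def normalize(name):
    """Lowercase a header and replace separators with underscores."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def match_columns(headers, schema):
    """Map each imported header to the closest schema column, if any."""
    mapping = {}
    for header in headers:
        candidates = difflib.get_close_matches(
            normalize(header), list(schema), n=1, cutoff=0.6
        )
        if candidates:
            mapping[header] = candidates[0]
    return mapping


def clean_and_validate(rows, mapping, schema):
    """Return (clean_rows, errors) after trimming, converting, and checking rows."""
    clean_rows, errors = [], []
    for index, row in enumerate(rows):
        record, row_errors = {}, []
        for header, column in mapping.items():
            raw = (row.get(header) or "").strip()
            convert, _required = schema[column]
            if raw == "":
                record[column] = None  # treat blank cells as missing
                continue
            try:
                record[column] = convert(raw)
            except ValueError:
                row_errors.append(f"row {index}: bad value {raw!r} for {column}")
        for column, (_convert, required) in schema.items():
            if required and record.get(column) is None:
                row_errors.append(f"row {index}: missing required {column}")
        if row_errors:
            errors.extend(row_errors)  # reject the whole row
        else:
            clean_rows.append(record)
    return clean_rows, errors
```

In this sketch, a header such as "E-mail" would be matched to the `email` column, unmatched headers are dropped, and any row with a conversion error or a missing required value is rejected with a message rather than imported.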

Displayr
Displayr is a comprehensive data workspace designed for teams, offering a range of capabilities including survey analysis, data visualization, dashboarding, automatic updating, PowerPoint reporting, finding data stories, and data cleaning. The platform aims to streamline workflow efficiency, promote self-sufficiency through DIY analytics, enable data storytelling with compelling narratives, and ensure quality control to minimize errors. Displayr caters to statisticians, market researchers, report creators, and professionals working with data, providing a user-friendly interface for creating interactive and insightful data stories.

Clay
Clay is an AI-powered data enrichment and outreach automation tool designed to help go-to-market teams scale personalized outbound campaigns. It combines 75+ data enrichment tools, AI capabilities, and automation features to streamline lead generation, data cleaning, and personalized messaging. With access to 50+ data providers, Clay offers comprehensive coverage of information and enables users to connect, enrich, and sync their CRM data effortlessly. The platform also features AI web scraping, personalized email building, automated inbound and outbound processes, and data formatting functionalities.

Mito
Mito is a low-code data app infrastructure that allows users to edit spreadsheets and automatically generate Python code. It is designed to help analysts automate their repetitive Excel work and take automation into their own hands. Mito is a Jupyter extension and Streamlit component, so users don't need to set up any new infrastructure. Getting started is easy: install it with pip and start using it in Jupyter or Streamlit.

RTutor
RTutor is an AI tool developed by Orditus LLC that leverages OpenAI's large language models to translate natural language into R or Python code for data analysis. Users can upload data in various formats, ask questions, and receive results in seconds. The tool allows for analyzing traditional statistics data, providing comprehensive exploratory data analysis reports, and generating code chunks for data analysis. RTutor is suitable for both academia and industry partnerships, offering demos and seminars via Zoom. It is a free tool for non-profit organizations, with licensing required for commercial use.

ChartPixel
ChartPixel is an AI-assisted data analysis platform that empowers users to effortlessly generate charts, insights, and actionable statistics in just 30 seconds. The platform is designed to demystify data and analysis, making it accessible to users of all skill levels. ChartPixel combines the power of AI with domain expertise to provide secure and reliable output, ensuring trustworthy results without compromising data privacy. With user-friendly features and educational tools, ChartPixel helps users clean, wrangle, visualize, and present data with ease, catering to both beginners and professionals.

maya.ai
Crayon Data's maya.ai is an AI-led revenue acceleration platform for enterprises. It helps businesses unlock the value of data to increase customer engagement and revenue. The platform offers a range of capabilities, including data cleaning and enrichment, personalized recommendations, and plug-and-play APIs. maya.ai has been used by leading global enterprises to achieve significant results, including increased revenue, improved customer engagement, and reduced time to market.

Kanaries
Kanaries is an augmented analytics platform that uses AI to automate the process of data exploration and visualization. It offers a variety of features to help users quickly and easily find insights in their data, including:
* **RATH:** An AI-powered engine that can automatically generate insights and recommendations based on your data.
* **Graphic Walker:** A visual analytics tool that allows you to explore your data in a variety of ways, including charts, graphs, and maps.
* **Data Painter:** A data cleaning and transformation tool that makes it easy to prepare your data for analysis.
* **Causal Analysis:** A tool that helps you identify and understand the causal relationships between variables in your data.
Kanaries is designed to be easy to use, even for users with no prior experience with data analysis. It is also highly scalable, so it can be used to analyze large datasets. Kanaries is a valuable tool for anyone who wants to quickly and easily find insights in their data. It can be used by businesses of all sizes, and it is particularly well-suited for organizations that are looking to improve their data-driven decision-making.

Raijin.ai
Raijin.ai is an AI-powered Customer Discovery and Intelligence Hub designed to help teams aggregate and extract key insights from customer conversations. It accelerates product development by prioritizing features based on customer feedback. The platform offers features like AI Thematic Analysis, Report Writing, Segmentation, and Tags to streamline qualitative research and analysis processes. Raijin.ai is ideal for user researchers, product analysts, and teams looking to integrate AI seamlessly into their workflow to create customer-centric products and data-driven marketing strategies.

Segmed
Segmed offers a free Medical Data De-Identification Tool that utilizes NLP and language models to remove any PHI, ensuring privacy-compliant medical research. The tool is designed for demonstration purposes only, with the option to reach out for De-Id as a service. Segmed.ai does not save or store any data, providing a secure environment for cleaning medical data. Users can access sample data and benefit from de-identified clinical data solutions.
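
To give a rough idea of what de-identification does to a clinical note, here is a deliberately simplified sketch. It swaps in plain regular expressions for the NLP and language models that Segmed's tool is described as using, so it is not Segmed's method; the patterns, placeholders, and the `redact` function are assumptions for illustration and would miss most real PHI (names, addresses, and so on).

```python
import re

# Illustrative patterns only (assumed for this sketch); real PHI detection
# needs far broader coverage than a handful of regular expressions.
PHI_PATTERNS = [
    ("[EMAIL]", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("[PHONE]", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
    ("[DATE]", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("[MRN]", re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE)),
]


def redact(text):
    """Replace every match of each pattern with its placeholder."""
    for placeholder, pattern in PHI_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


note = "Seen on 03/14/2021, MRN: 123456. Contact 555-867-5309 or jane.doe@example.org."
print(redact(note))
# Seen on [DATE], [MRN]. Contact [PHONE] or [EMAIL].
```

A model-based tool can also catch free-text identifiers such as patient and physician names, which fixed patterns like these cannot.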

PaperClip
PaperClip is an AI tool designed to help users keep track of the AI papers they review each day. It helps users remember details from papers in machine learning, computer vision, and natural language processing. The tool provides an extension that lets users retrieve important findings from AI research papers, ML blog posts, and news. PaperClip's AI runs locally, ensuring data privacy by not sending any information to external servers. With features like offline support, data cleaning, and easy reset options, PaperClip offers a convenient solution for organizing and accessing research findings.

Airscale
Airscale is a lead generation tool that helps businesses find, enrich, and export leads from various sources. It offers a range of features including lead scraping, data enrichment, AI-powered content generation, and data cleaning. Airscale integrates with popular CRMs and outbound tools, making it easy for businesses to manage their lead generation process.

Respaid
Respaid is a B2B collections tool that focuses on respectful and efficient debt recovery. It utilizes AI-powered precision messaging to optimize communication with debtors, resulting in a 50% collection rate and 30x faster performance than traditional methods. The tool offers features such as direct payments, database cleaning, and insights into why customers aren't paying. Respaid aims to protect your brand reputation while helping you recover unpaid invoices in a respectful manner.

dataset.macgence
dataset.macgence is an AI-powered data analysis tool that helps users extract valuable insights from their datasets. It offers a user-friendly interface for uploading, cleaning, and analyzing data, making it suitable for both beginners and experienced data analysts. With advanced algorithms and visualization capabilities, dataset.macgence enables users to uncover patterns, trends, and correlations in their data, leading to informed decision-making. Whether you're a business professional, researcher, or student, dataset.macgence can streamline your data analysis process and enhance your data-driven strategies.

Lume AI
Lume AI is an AI-powered data mapping application that automates the process of mapping, cleaning, and validating data in various workflows. It offers an all-in-one suite for building pipelines, onboarding customer data, and providing AI-powered insights for data analysis. Users can choose between a no-code platform and API integration to streamline their data mapping processes. Lume AI ensures data security with enterprise-grade encryption and access controls, eliminating the need for manual data mapping. The application is designed to save time and improve efficiency in data management tasks.

20 - Open Source AI Tools

ai-data-science-team
The AI Data Science Team of Copilots is an AI-powered data science team that uses agents to help users perform common data science tasks 10X faster. It includes agents specializing in data cleaning, preparation, feature engineering, modeling, and interpretation of business problems. The project is a work in progress with new data science agents to be released soon. Disclaimer: This project is for educational purposes only and not intended to replace a company's data science team. No warranties or guarantees are provided, and the creator assumes no liability for financial loss.

LLM4DB
LLM4DB is a repository focused on the intersection of Large Language Models (LLM) and Database technologies. It covers various aspects such as data processing, data analysis, database optimization, and data management for LLM. The repository includes works on data cleaning, entity matching, schema matching, data discovery, NL2SQL, data exploration, data visualization, configuration tuning, query optimization, and anomaly diagnosis using LLMs. It aims to provide insights and advancements in leveraging LLMs for improving data processing, analysis, and database management tasks.

hongbomiao.com
hongbomiao.com is a personal research and development (R&D) lab that facilitates the sharing of knowledge. The repository covers a wide range of topics including web development, mobile development, desktop applications, API servers, cloud native technologies, data processing, machine learning, computer vision, embedded systems, simulation, database management, data cleaning, data orchestration, testing, ops, authentication, authorization, security, system tools, reverse engineering, Ethereum, hardware, network, guidelines, design, bots, and more. It provides detailed information on various tools, frameworks, libraries, and platforms used in these domains.

Daily-DeepLearning
Daily-DeepLearning is a repository that covers various computer science topics such as data structures, operating systems, computer networks, Python programming, data science packages like numpy, pandas, matplotlib, machine learning theories, deep learning theories, NLP concepts, machine learning practical applications, deep learning practical applications, and big data technologies like Hadoop and Hive. It also includes coding exercises related to '剑指offer'. The repository provides detailed explanations and examples for each topic, making it a comprehensive resource for learning and practicing different aspects of computer science and data-related fields.

mlcontests.github.io
ML Contests is a platform that provides a sortable list of public machine learning/data science/AI contests, viewable on mlcontests.com. Users can submit pull requests for any changes or additions to the competitions list by editing the competitions.json file on the GitHub repository. The platform requires mandatory fields such as competition name, URL, type of ML, deadline for submissions, prize information, platform running the competition, and sponsorship details. Optional fields include conference affiliation, conference year, competition launch date, registration deadline, additional URLs, and tags relevant to the challenge type. The platform is transitioning towards assigning multiple tags to competitions for better categorization and searchability.
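
Before submitting a pull request, a contributor could check an entry against the mandatory fields listed above with a short script like the following sketch. The key names in `REQUIRED_FIELDS` are hypothetical stand-ins chosen for illustration; the actual keys in competitions.json may differ and should be taken from the repository itself.

```python
import json

# Hypothetical key names; the real competitions.json may use different ones.
REQUIRED_FIELDS = ["name", "url", "type", "deadline", "prize", "platform", "sponsor"]


def missing_fields(entry):
    """Return the required keys that are absent or empty in one entry."""
    return [key for key in REQUIRED_FIELDS if entry.get(key) in (None, "")]


def validate(json_text):
    """Map each incomplete competition (by name or index) to its missing keys."""
    problems = {}
    for index, entry in enumerate(json.loads(json_text)):
        missing = missing_fields(entry)
        if missing:
            problems[entry.get("name") or f"entry {index}"] = missing
    return problems
```

Calling `validate` on the file's text returns an empty dictionary when every entry is complete, and otherwise lists the missing keys per competition.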

ProX
ProX is an LM-based data refinement framework that automates the process of cleaning and improving data used in pre-training large language models. It offers better performance, domain flexibility, efficiency, and cost-effectiveness compared to traditional methods. The framework has been shown to improve model performance by over 2% and boost accuracy by up to 20% in tasks like math. ProX is designed to refine data at scale without the need for manual adjustments, making it a valuable tool for data preprocessing in natural language processing tasks.

Streamline-Analyst
Streamline Analyst is a cutting-edge, open-source application powered by Large Language Models (LLMs) designed to revolutionize data analysis. This Data Analysis Agent effortlessly automates tasks such as data cleaning, preprocessing, and complex operations like identifying target objects, partitioning test sets, and selecting the best-fit models based on your data. With Streamline Analyst, results visualization and evaluation become seamless. It aims to expedite the data analysis process, making it accessible to all, regardless of their expertise in data analysis. The tool is built to empower users to process data and achieve high-quality visualizations with unparalleled efficiency, and to execute high-performance modeling with the best strategies. Future enhancements include Natural Language Processing (NLP), neural networks, and object detection utilizing YOLO, broadening its capabilities to meet diverse data analysis needs.

LLMs
LLMs is a Chinese large language model technology stack for practical use. It includes high-availability pre-training, SFT, and DPO preference alignment code framework. The repository covers pre-training data cleaning, high-concurrency framework, SFT dataset cleaning, data quality improvement, and security alignment work for Chinese large language models. It also provides open-source SFT dataset construction, pre-training from scratch, and various tools and frameworks for data cleaning, quality optimization, and task alignment.

LLM4DB
LLM4DB is a repository focused on the intersection of Large Language Models (LLMs) and Database technologies. It covers various aspects such as data processing, data analysis, database optimization, and data management for LLMs. The repository includes research papers, tools, and techniques related to leveraging LLMs for tasks like data cleaning, entity matching, schema matching, data discovery, NL2SQL, data exploration, data visualization, knob tuning, query optimization, and database diagnosis.

sycamore
Sycamore is a conversational search and analytics platform for complex unstructured data, such as documents, presentations, transcripts, embedded tables, and internal knowledge repositories. It retrieves and synthesizes high-quality answers through bringing AI to data preparation, indexing, and retrieval. Sycamore makes it easy to prepare unstructured data for search and analytics, providing a toolkit for data cleaning, information extraction, enrichment, summarization, and generation of vector embeddings that encapsulate the semantics of data. Sycamore uses your choice of generative AI models to make these operations simple and effective, and it enables quick experimentation and iteration. Additionally, Sycamore uses OpenSearch for indexing, enabling hybrid (vector + keyword) search, retrieval-augmented generation (RAG) pipelining, filtering, analytical functions, conversational memory, and other features to improve information retrieval.

LakeSoul
LakeSoul is a cloud-native Lakehouse framework that supports scalable metadata management, ACID transactions, efficient and flexible upsert operation, schema evolution, and unified streaming & batch processing. It supports multiple computing engines like Spark, Flink, Presto, and PyTorch, and computing modes such as batch, stream, MPP, and AI. LakeSoul scales metadata management and achieves ACID control by using PostgreSQL. It provides features like automatic compaction, table lifecycle maintenance, redundant data cleaning, and permission isolation for metadata.

EDA-GPT
EDA GPT is an open-source data analysis companion that offers a comprehensive solution for structured and unstructured data analysis. It streamlines the data analysis process, empowering users to explore, visualize, and gain insights from their data. EDA GPT supports analyzing structured data in various formats like CSV, XLSX, and SQLite, generating graphs, and conducting in-depth analysis of unstructured data such as PDFs and images. It provides a user-friendly interface, powerful features, and capabilities like comparing performance with other tools, analyzing large language models, multimodal search, data cleaning, and editing. The tool is optimized for maximal parallel processing, searching internet and documents, and creating analysis reports from structured and unstructured data.

cellm
Cellm is an Excel extension that allows users to leverage Large Language Models (LLMs) like ChatGPT within cell formulas. It enables users to extract AI responses to text ranges, making it useful for automating repetitive tasks that involve data processing and analysis. Cellm supports various models from Anthropic, Mistral, OpenAI, and Google, as well as locally hosted models via Llamafiles, Ollama, or vLLM. The tool is designed to simplify the integration of AI capabilities into Excel for tasks such as text classification, data cleaning, content summarization, entity extraction, and more.

Auto-Analyst
Auto-Analyst is an AI-driven data analytics agentic system designed to simplify and enhance the data science process. By integrating various specialized AI agents, this tool aims to make complex data analysis tasks more accessible and efficient for data analysts and scientists. Auto-Analyst provides a streamlined approach to data preprocessing, statistical analysis, machine learning, and visualization, all within an interactive Streamlit interface. It offers plug and play Streamlit UI, agents with data science speciality, complete automation, LLM agnostic operation, and is built using lightweight frameworks.

Fueling-Ambitions-Via-Book-Discoveries
Fueling-Ambitions-Via-Book-Discoveries is an Advanced Machine Learning & AI Course designed for students, professionals, and AI researchers. The course integrates rigorous theoretical foundations with practical coding exercises, ensuring learners develop a deep understanding of AI algorithms and their applications in finance, healthcare, robotics, NLP, cybersecurity, and more. Inspired by MIT, Stanford, and Harvard’s AI programs, it combines academic research rigor with industry-standard practices used by AI engineers at companies like Google, OpenAI, Facebook AI, DeepMind, and Tesla. Learners can learn 50+ AI techniques from top Machine Learning & Deep Learning books, code from scratch with real-world datasets, projects, and case studies, and focus on ML Engineering & AI Deployment using Django & Streamlit. The course also offers industry-relevant projects to build a strong AI portfolio.

intro_pharma_ai
This repository serves as an educational resource for pharmaceutical and chemistry students to learn the basics of Deep Learning through a collection of Jupyter Notebooks. The content covers various topics such as Introduction to Jupyter, Python, Cheminformatics & RDKit, Linear Regression, Data Science, Linear Algebra, Neural Networks, PyTorch, Convolutional Neural Networks, Transfer Learning, Recurrent Neural Networks, Autoencoders, Graph Neural Networks, and Summary. The notebooks aim to provide theoretical concepts to understand neural networks through code completion, but instructors are encouraged to supplement with their own lectures. The work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

YuLan-Mini
YuLan-Mini is a lightweight language model with 2.4 billion parameters that achieves performance comparable to industry-leading models despite being pre-trained on only 1.08T tokens. It excels in mathematics and code domains. The repository provides pre-training resources, including data pipeline, optimization methods, and annealing approaches. Users can pre-train their own language models, perform learning rate annealing, fine-tune the model, research training dynamics, and synthesize data. The team behind YuLan-Mini is AI Box at Renmin University of China. The code is released under the MIT License with future updates on model weights usage policies. Users are advised on potential safety concerns and ethical use of the model.

Awesome-LLM-Tabular
This repository is a curated list of research papers that explore the integration of Large Language Model (LLM) technology with tabular data. It aims to provide a comprehensive resource for researchers and practitioners interested in this emerging field. The repository includes papers on a wide range of topics, including table-to-text generation, table question answering, and tabular data classification. It also includes a section on related datasets and resources.

ShortcutsBench
ShortcutsBench is a project focused on collecting and analyzing workflows created in the Shortcuts app, providing a dataset of shortcut metadata, source files, and API information. It aims to study the integration of large language models with Apple devices, particularly focusing on the role of shortcuts in enhancing user experience. The project offers insights for Shortcuts users, enthusiasts, and researchers to explore, customize workflows, and study automated workflows, low-code programming, and API-based agents.

20 - OpenAI GPTs

Squeaky Data Cleaner
Clean and structure your raw data with automatic file output for your Custom GPT knowledge.

Cleaning Genius
👌 AI-Powered Eco-Friendly Stain Solver 👌 Your smart stain-removing companion for any surface. Say goodbye to tough stains with Clean Genius! 🌱✨

CleanGPT ADHD Cleaning Helper
Helps you have fun and stay accountable while keeping your space clean.

Cleaning Advisor
A virtual assistant for cleaning and organizing, offering personalized advice and schedules.

Extra Green Cleaning Service
We deliver a greener, safer clean to your home and your family with our environmentally friendly products.

CleanBiz Mentor
A mentor for janitorial entrepreneurs offering guidance for scaling cleaning businesses.

HomeSync AI
Your AI home organizer for streamlined cleaning schedules, inventory tracking, and decluttering support, tailored to your household dynamics.

Live Dwell
I teach Home Economics and help with Cooking, Cleaning, and Running a Household.

Carpet Weaver Assistant
Hello, I'm Carpet Weaver Assistant! What would you like help with today?

La Suegra Limpiadora
Expert in removing stains from clothes, sofas, and other fabrics. I'll leave your clothes "perfesssstaaa" (perfect).