Best AI tools for< Automate Data Cleaning >
20 - AI tool Sites

Raijin.ai
Raijin.ai is an AI-powered Customer Discovery and Intelligence Hub designed to help teams aggregate and extract key insights from customer conversations. It accelerates product development by prioritizing features based on customer feedback. The platform offers features like AI Thematic Analysis, Report Writing, Segmentation, and Tags to streamline qualitative research and analysis processes. Raijin.ai is ideal for user researchers, product analysts, and teams looking to integrate AI seamlessly into their workflow to create customer-centric products and data-driven marketing strategies.

ChartFast
ChartFast is an AI Data Analyzer tool that automates data visualization and analysis tasks, powered by GPT-4 technology. It allows users to generate precise and sleek graphs in seconds, process vast amounts of data, and provide interactive data queries and quick exports. With features like specialized internal libraries for complex graph generation, customizable visualization code, and instant data export, ChartFast aims to streamline data work and enhance data analysis efficiency.

nuvo
nuvo is an AI-powered data import solution that offers fast, secure, and scalable data import solutions for software companies. It provides tools like nuvo Data Importer SDK and nuvo Data Pipeline to streamline manual and recurring ETL data imports, enabling users to manage data imports independently. With AI-enhanced automation, nuvo helps prepare clean data for preferred systems quickly and efficiently, reducing manual effort and improving data quality. The platform allows users to upload unlimited data in various formats, match imported data to system schemas, clean and validate data, and import clean data into target systems with just a click.

Lume AI
Lume AI is an AI-powered data mapping suite that automates the process of mapping, cleaning, and validating data in various workflows. It offers a comprehensive solution for building pipelines, onboarding customer data, and more. With AI-driven insights, users can streamline data analysis, mapper generation, deployment, and maintenance. Lume AI provides both a no-code platform and API integration options for seamless data mapping. Trusted by market leaders and startups, Lume AI ensures data security with enterprise-grade encryption and compliance standards.

SheetBot AI
SheetBot AI is an AI data analyst tool that enables users to analyze data quickly without the need for coding. It automates repetitive and time-consuming data tasks, making data visualization and analysis more efficient. With SheetBot AI, users can generate accurate and visually appealing graphs in seconds, streamlining the data analysis process.

Kanaries
Kanaries is an augmented analytics platform that uses AI to automate the process of data exploration and visualization. It offers a variety of features to help users quickly and easily find insights in their data, including: * **RATH:** An AI-powered engine that can automatically generate insights and recommendations based on your data. * **Graphic Walker:** A visual analytics tool that allows you to explore your data in a variety of ways, including charts, graphs, and maps. * **Data Painter:** A data cleaning and transformation tool that makes it easy to prepare your data for analysis. * **Causal Analysis:** A tool that helps you identify and understand the causal relationships between variables in your data. Kanaries is designed to be easy to use, even for users with no prior experience with data analysis. It is also highly scalable, so it can be used to analyze large datasets. Kanaries is a valuable tool for anyone who wants to quickly and easily find insights in their data. It can be used by businesses of all sizes, and it is particularly well-suited for organizations that are looking to improve their data-driven decision-making.

Clay
Clay is an AI-powered data enrichment and outreach automation tool designed to help go-to-market teams scale personalized outbound campaigns. It combines 75+ data enrichment tools, AI capabilities, and automation features to streamline lead generation, data cleaning, and personalized messaging. With access to 50+ data providers, Clay offers comprehensive coverage of information and enables users to connect, enrich, and sync their CRM data effortlessly. The platform also features AI web scraping, personalized email building, automated inbound and outbound processes, and data formatting functionalities.

Mito
Mito is a low-code data app infrastructure that allows users to edit spreadsheets and automatically generate Python code. It is designed to help analysts automate their repetitive Excel work and take automation into their own hands. Mito is a Jupyter extension and Streamlit component, so users don't need to set up any new infrastructure. It is easy to get started with Mito, simply install it using pip and start using it in Jupyter or Streamlit.

maya.ai
Crayon Data's maya.ai platform is an AI-led revenue acceleration platform for enterprises. It helps businesses unlock the value of data to increase customer engagement and revenue. The platform offers a range of capabilities, including data cleaning and enrichment, personalized recommendations, and plug-and-play APIs. Maya.ai has been used by leading global enterprises to achieve significant results, including increased revenue, improved customer engagement, and reduced time to market.

Sourcetable
Sourcetable is an AI-powered spreadsheet and data analysis tool that enables users to perform various tasks such as analyzing files, creating visualizations, writing formulas, researching, and cleaning data with the help of artificial intelligence. It offers features like AI Spreadsheet Assistant, AI Formula Generator, AI Chart Generator, AI Data Analysis, SQL Generator, and more to streamline data-related tasks efficiently.

Double
Double is an AI tool designed to help users find and convert leads with hyper-targeted messages. It automates the process of cleaning, enriching, and qualifying leads using AI technology. By leveraging GPT, Double can research leads on the internet and provide answers to questions, saving users time and effort in manual research tasks. The platform is ideal for sales and marketing teams looking to streamline their lead generation process and improve conversion rates through personalized messaging.

Seudo
Seudo is a data workflow automation platform that uses AI to help businesses automate their data processes. It provides a variety of features to help businesses with data integration, data cleansing, data transformation, and data analysis. Seudo is designed to be easy to use, even for businesses with no prior experience with AI. It offers a drag-and-drop interface that makes it easy to create and manage data workflows. Seudo also provides a variety of pre-built templates that can be used to get started quickly.

Displayr
Displayr is a comprehensive data workspace designed for teams, offering a range of capabilities including survey analysis, data visualization, dashboarding, automatic updating, PowerPoint reporting, finding data stories, and data cleaning. The platform aims to streamline workflow efficiency, promote self-sufficiency through DIY analytics, enable data storytelling with compelling narratives, and ensure quality control to minimize errors. Displayr caters to statisticians, market researchers, report creators, and professionals working with data, providing a user-friendly interface for creating interactive and insightful data stories.
![Collective[i] Screenshot](/screenshots/collectivei.com.jpg)
Collective[i]
Collective[i] is an AI-powered platform that helps businesses optimize their sales processes through data-driven insights and automation. The platform leverages deep learning to automate tasks, improve decision-making, and enhance the overall buying experience. By utilizing AI technology, Collective[i] enables organizations to forecast sales, improve productivity, expand margins, and grow revenue. The platform offers applications such as Intelligent WriteBack™ for data cleansing and automation, C[i]™ for Sales for buyer-centric selling, and Intelligence.com® for supporting a community of business connectors. Collective[i] prioritizes enterprise-level security and privacy to ensure the confidentiality and integrity of client data.

Firecrawl
Firecrawl is an advanced web crawling and data conversion tool designed to transform any website into clean, LLM-ready markdown. It automates the collection, cleaning, and formatting of web data, streamlining the preparation process for Large Language Model (LLM) applications. Firecrawl is best suited for business websites, documentation, and help centers, offering features like crawling all accessible subpages, handling dynamic content, converting data into well-formatted markdown, and more. It is built by LLM engineers for LLM engineers, providing clean data the way users want it.

Superjoin
Superjoin is an AI-powered tool that allows users to automatically pull data from various tools into Google Sheets without the need for writing any code. It offers features like one-click connectors, auto-refresh schedules, data preview, and the ability to send report screenshots to Slack and Email. Superjoin is loved by thousands of users across hundreds of companies for its efficiency in automating workflows and data management.

Webscrape AI
Webscrape AI is a no-code web scraping tool that allows users to collect data from websites without writing any code. It is easy to use, accurate, and affordable, making it a great option for businesses of all sizes. With Webscrape AI, you can automate your data collection process and free up your time to focus on other tasks.

Extracta.ai
Extracta.ai is an AI data extraction tool for documents and images that automates data extraction processes with easy integration. It allows users to define custom templates for extracting structured data without the need for training. The platform can extract data from various document types, including invoices, resumes, contracts, receipts, and more, providing accurate and efficient results. Extracta.ai ensures data security, encryption, and GDPR compliance, making it a reliable solution for businesses looking to streamline document processing.

Airparser
Airparser is an AI-powered email and document parser tool that revolutionizes data extraction by utilizing the GPT parser engine. It allows users to automate the extraction of structured data from various sources such as emails, PDFs, documents, and handwritten texts. With features like automatic extraction, export to multiple platforms, and support for multiple languages, Airparser simplifies data extraction processes for individuals and businesses. The tool ensures data security and offers seamless integration with other applications through APIs and webhooks.

Eigen Technologies
Eigen Technologies is an AI-powered data extraction platform designed for business users to automate the extraction of data from various documents. The platform offers solutions for intelligent document processing and automation, enabling users to streamline business processes, make informed decisions, and achieve significant efficiency gains. Eigen's platform is purpose-built to deliver real ROI by reducing manual processes, improving data accuracy, and accelerating decision-making across industries such as corporates, banks, financial services, insurance, law, and manufacturing. With features like generative insights, table extraction, pre-processing hub, and model governance, Eigen empowers users to automate data extraction workflows efficiently. The platform is known for its unmatched accuracy, speed, and capability, providing customers with a flexible and scalable solution that integrates seamlessly with existing systems.
20 - Open Source AI Tools

ai-data-science-team
The AI Data Science Team of Copilots is an AI-powered data science team that uses agents to help users perform common data science tasks 10X faster. It includes agents specializing in data cleaning, preparation, feature engineering, modeling, and interpretation of business problems. The project is a work in progress with new data science agents to be released soon. Disclaimer: This project is for educational purposes only and not intended to replace a company's data science team. No warranties or guarantees are provided, and the creator assumes no liability for financial loss.

ProX
ProX is a lm-based data refinement framework that automates the process of cleaning and improving data used in pre-training large language models. It offers better performance, domain flexibility, efficiency, and cost-effectiveness compared to traditional methods. The framework has been shown to improve model performance by over 2% and boost accuracy by up to 20% in tasks like math. ProX is designed to refine data at scale without the need for manual adjustments, making it a valuable tool for data preprocessing in natural language processing tasks.

cellm
Cellm is an Excel extension that allows users to leverage Large Language Models (LLMs) like ChatGPT within cell formulas. It enables users to extract AI responses to text ranges, making it useful for automating repetitive tasks that involve data processing and analysis. Cellm supports various models from Anthropic, Mistral, OpenAI, and Google, as well as locally hosted models via Llamafiles, Ollama, or vLLM. The tool is designed to simplify the integration of AI capabilities into Excel for tasks such as text classification, data cleaning, content summarization, entity extraction, and more.

Fueling-Ambitions-Via-Book-Discoveries
Fueling-Ambitions-Via-Book-Discoveries is an Advanced Machine Learning & AI Course designed for students, professionals, and AI researchers. The course integrates rigorous theoretical foundations with practical coding exercises, ensuring learners develop a deep understanding of AI algorithms and their applications in finance, healthcare, robotics, NLP, cybersecurity, and more. Inspired by MIT, Stanford, and Harvard’s AI programs, it combines academic research rigor with industry-standard practices used by AI engineers at companies like Google, OpenAI, Facebook AI, DeepMind, and Tesla. Learners can learn 50+ AI techniques from top Machine Learning & Deep Learning books, code from scratch with real-world datasets, projects, and case studies, and focus on ML Engineering & AI Deployment using Django & Streamlit. The course also offers industry-relevant projects to build a strong AI portfolio.

dotnet-ai-workshop
The .NET AI Workshop is a comprehensive guide designed to help developers add AI features to .NET applications. It covers various AI-based features such as classification, summarization, data extraction/cleaning, anomaly detection, translation, sentiment detection, semantic search, Q&A chatbots, and voice assistants. The workshop is tailored for developers new to AI in .NET applications, focusing on the usage of AI services without the need for prior AI technology knowledge. It provides examples using .NET and C# with a focus on AI topics, aiming to enhance productivity and automation in applications.

free-for-life
A massive list including a huge amount of products and services that are completely free! ⭐ Star on GitHub • 🤝 Contribute # Table of Contents * APIs, Data & ML * Artificial Intelligence * BaaS * Code Editors * Code Generation * DNS * Databases * Design & UI * Domains * Email * Font * For Students * Forms * Linux Distributions * Messaging & Streaming * PaaS * Payments & Billing * SSL

obsei
Obsei is an open-source, low-code, AI powered automation tool that consists of an Observer to collect unstructured data from various sources, an Analyzer to analyze the collected data with various AI tasks, and an Informer to send analyzed data to various destinations. The tool is suitable for scheduled jobs or serverless applications as all Observers can store their state in databases. Obsei is still in alpha stage, so caution is advised when using it in production. The tool can be used for social listening, alerting/notification, automatic customer issue creation, extraction of deeper insights from feedbacks, market research, dataset creation for various AI tasks, and more based on creativity.

ai-starter-kit
SambaNova AI Starter Kits is a collection of open-source examples and guides designed to facilitate the deployment of AI-driven use cases for developers and enterprises. The kits cover various categories such as Data Ingestion & Preparation, Model Development & Optimization, Intelligent Information Retrieval, and Advanced AI Capabilities. Users can obtain a free API key using SambaNova Cloud or deploy models using SambaStudio. Most examples are written in Python but can be applied to any programming language. The kits provide resources for tasks like text extraction, fine-tuning embeddings, prompt engineering, question-answering, image search, post-call analysis, and more.

llm-twin-course
The LLM Twin Course is a free, end-to-end framework for building production-ready LLM systems. It teaches you how to design, train, and deploy a production-ready LLM twin of yourself powered by LLMs, vector DBs, and LLMOps good practices. The course is split into 11 hands-on written lessons and the open-source code you can access on GitHub. You can read everything and try out the code at your own pace.

upgini
Upgini is an intelligent data search engine with a Python library that helps users find and add relevant features to their ML pipeline from various public, community, and premium external data sources. It automates the optimization of connected data sources by generating an optimal set of machine learning features using large language models, GraphNNs, and recurrent neural networks. The tool aims to simplify feature search and enrichment for external data to make it a standard approach in machine learning pipelines. It democratizes access to data sources for the data science community.

Streamline-Analyst
Streamline Analyst is a cutting-edge, open-source application powered by Large Language Models (LLMs) designed to revolutionize data analysis. This Data Analysis Agent effortlessly automates tasks such as data cleaning, preprocessing, and complex operations like identifying target objects, partitioning test sets, and selecting the best-fit models based on your data. With Streamline Analyst, results visualization and evaluation become seamless. It aims to expedite the data analysis process, making it accessible to all, regardless of their expertise in data analysis. The tool is built to empower users to process data and achieve high-quality visualizations with unparalleled efficiency, and to execute high-performance modeling with the best strategies. Future enhancements include Natural Language Processing (NLP), neural networks, and object detection utilizing YOLO, broadening its capabilities to meet diverse data analysis needs.

EDA-GPT
EDA GPT is an open-source data analysis companion that offers a comprehensive solution for structured and unstructured data analysis. It streamlines the data analysis process, empowering users to explore, visualize, and gain insights from their data. EDA GPT supports analyzing structured data in various formats like CSV, XLSX, and SQLite, generating graphs, and conducting in-depth analysis of unstructured data such as PDFs and images. It provides a user-friendly interface, powerful features, and capabilities like comparing performance with other tools, analyzing large language models, multimodal search, data cleaning, and editing. The tool is optimized for maximal parallel processing, searching internet and documents, and creating analysis reports from structured and unstructured data.

LLM4DB
LLM4DB is a repository focused on the intersection of Large Language Models (LLM) and Database technologies. It covers various aspects such as data processing, data analysis, database optimization, and data management for LLM. The repository includes works on data cleaning, entity matching, schema matching, data discovery, NL2SQL, data exploration, data visualization, configuration tuning, query optimization, and anomaly diagnosis using LLMs. It aims to provide insights and advancements in leveraging LLMs for improving data processing, analysis, and database management tasks.

LLM4DB
LLM4DB is a repository focused on the intersection of Large Language Models (LLMs) and Database technologies. It covers various aspects such as data processing, data analysis, database optimization, and data management for LLMs. The repository includes research papers, tools, and techniques related to leveraging LLMs for tasks like data cleaning, entity matching, schema matching, data discovery, NL2SQL, data exploration, data visualization, knob tuning, query optimization, and database diagnosis.

hongbomiao.com
hongbomiao.com is a personal research and development (R&D) lab that facilitates the sharing of knowledge. The repository covers a wide range of topics including web development, mobile development, desktop applications, API servers, cloud native technologies, data processing, machine learning, computer vision, embedded systems, simulation, database management, data cleaning, data orchestration, testing, ops, authentication, authorization, security, system tools, reverse engineering, Ethereum, hardware, network, guidelines, design, bots, and more. It provides detailed information on various tools, frameworks, libraries, and platforms used in these domains.

Auto-Analyst
Auto-Analyst is an AI-driven data analytics agentic system designed to simplify and enhance the data science process. By integrating various specialized AI agents, this tool aims to make complex data analysis tasks more accessible and efficient for data analysts and scientists. Auto-Analyst provides a streamlined approach to data preprocessing, statistical analysis, machine learning, and visualization, all within an interactive Streamlit interface. It offers plug and play Streamlit UI, agents with data science speciality, complete automation, LLM agnostic operation, and is built using lightweight frameworks.
20 - OpenAI Gpts

DataKitchen DataOps and Data Observability GPT
A specialist in DataOps and Data Observability, aiding in data management and monitoring.

Self Builder
I automate GPT creation, saving + 99% time and securing data, preventing someone steal your idea.

Power Automate Tutor
Learn at your own pace and empower your organization with self-service automation.

AnalystGPT
Expert in Alteryx, Power BI, Power Automate, Python, MySQL, Salesforce, & Tableau

Data Analysis and Operations Research Expert
Expert in ML, operations research, Treasure Data, Mac M2

Data Analytics Specialist
Leading Big Data Analytics tool, blending advanced technology with OpenAI's expertise.

Data Strategy Sage
Market-leading datafication strategist, excelling in analysis and problem-solving, powered by OpenAI.

AutoChatGPT
Have a large task to accomplish? AutoChatGPT will continually review and give itself new instructions to complete a task using expert agents.