Best AI tools for Clean Datasets
20 - AI tool Sites
dataset.macgence
dataset.macgence is an AI-powered data analysis tool that helps users extract valuable insights from their datasets. It offers a user-friendly interface for uploading, cleaning, and analyzing data, making it suitable for both beginners and experienced data analysts. With advanced algorithms and visualization capabilities, dataset.macgence enables users to uncover patterns, trends, and correlations in their data, leading to informed decision-making. Whether you're a business professional, researcher, or student, dataset.macgence can streamline your data analysis process and enhance your data-driven strategies.
ChartFast
ChartFast is an AI Data Analyzer tool that automates data visualization and analysis tasks, powered by GPT-4 technology. It allows users to generate precise and sleek graphs in seconds, process vast amounts of data, and provide interactive data queries and quick exports. With features like specialized internal libraries for complex graph generation, customizable visualization code, and instant data export, ChartFast aims to streamline data work and enhance data analysis efficiency.
Audo Studio
Audo Studio is an AI-powered audio cleaning tool that automatically removes background noise, enhances speech, and improves audio quality using advanced audio processing and artificial intelligence technology. With just one click, users can clean their audio in seconds, saving time and effort. The tool is designed to cater to podcasters, YouTubers, video creators, and anyone looking to improve the sound quality of their recordings.
Audioscribe
Audioscribe is an AI-powered Record-to-Text tool developed by Wordware. It allows users to easily convert spoken words into well-structured notes. The tool is designed to help individuals clean up their thoughts by recording and transforming them into organized text. Audioscribe is part of Wordware's suite of applications that aim to streamline various tasks through AI technology, catering to both technical and non-technical users.
Botmake.io
Botmake.io is a simple and clean no-code chatbot creation tool that allows users to create chatbots without any coding experience. With Botmake.io, users can automate repetitive questions, import and export data in CSV format, customize the look and feel of their chatbots, extend their chatbots with apps, and embed their chatbots on their websites. Botmake.io offers a free plan and a premium plan with additional features.
Futr Energy
Futr Energy is a solar asset management platform designed to help manage solar power plants efficiently. It offers a range of tools and features such as remote monitoring, CMMS, inventory management, performance monitoring, and automated reports. Futr Energy aims to provide clean energy developers, operators, and investors with intelligent solutions to optimize the generation and performance of solar assets.
Charm
Charm is an AI-powered spreadsheet assistant that helps users clean messy data, create content, summarize feedback, classify sales leads, and generate dummy data. It is a Google Sheets add-on that automates tasks that are impossible to do with traditional formulas. Charm is used by hundreds of analysts, marketers, product managers, and more.
Earth AI
Earth AI is a high-performance explorer for clean energy minerals, utilizing artificial intelligence to discover untapped critical metal deposits at half the cost and in a fraction of the time. The company works with mineral resource companies to improve their odds of success while keeping costs low, offering accurate AI-driven prospect detection, modular hardware, and streamlined operations. Earth AI's revenue model is independent of service profits, and their process is four times faster than traditional methods. The company partners with explorers and development companies to bring discovered deposits into production.
Luminal
Luminal is a powerful AI copilot that helps users clean, transform, and analyze spreadsheets 10x faster. It offers fast and efficient data analysis capabilities, enabling users to perform complex operations and run AI-enabled tasks using natural language. With Luminal, users can visualize data, ask complex questions, and clean and format spreadsheets effortlessly. The application supports multiple languages, provides secure data hosting with encryption, and offers simple pricing that scales with user needs.
Codacy
Codacy is an AI-powered code quality and security platform designed for developers to efficiently optimize and secure their code. It offers a unified set of AppSec tools, data-driven insights, and seamless integrations across the software development lifecycle. Codacy helps teams monitor and resolve security issues at scale, improve code quality, and prevent breaking changes. With AI suggested fixes and effortless code quality monitoring, Codacy is a valuable tool for businesses and developers alike.
AnyToSpeech
AnyToSpeech is an AI text-to-speech and PDF to Audiobook solution that offers a clean and simple way to convert text, PDFs, documents, scans, and images to speech. It provides a variety of realistic voices in multiple languages for users to choose from. The platform also allows users to convert URLs to speech and offers a library to save and access their generated audio files at any time.
Object Remover
Object Remover is an online image cleanup tool that uses AI to remove unwanted objects, people, and defects from your photos. It's easy to use: upload your photo and select the objects you want to remove. Object Remover then processes the photo automatically and removes the selected objects, leaving you with a clean, professional-looking image.
Potis
Potis is an AI-powered hiring copilot that automates the screening process and evaluates candidates' real-world skills through behavioral interviews. It provides clear and bias-free talent scoring, customized feedback, and helps recruiters save time and costs while improving the quality of hires.
Firecrawl
Firecrawl is an advanced web crawling and data conversion tool designed to transform any website into clean, LLM-ready markdown. It automates the collection, cleaning, and formatting of web data, streamlining the preparation process for Large Language Model (LLM) applications. Firecrawl is best suited for business websites, documentation, and help centers, offering features like crawling all accessible subpages, handling dynamic content, converting data into well-formatted markdown, and more. It is built by LLM engineers for LLM engineers, providing clean data the way users want it.
Bifrost
Bifrost is an AI-powered tool that converts Figma designs into clean React code automatically. It eliminates the need to write frontend code from scratch, enabling users to create component sets, scale designs, and iterate effortlessly. The tool streamlines the development process, allowing engineers to focus on business-driving features and empowering designers to update screens seamlessly. Bifrost is revolutionizing the design-to-code process with its AI capabilities, making it a valuable asset for design and engineering teams.
OneAudio
OneAudio is an AI-powered tool that allows users to summarize, transcribe, and convert audio files into notes effortlessly. With the ability to recognize words accurately and efficiently, OneAudio helps users organize their ideas in one place. The tool leverages the OpenAI GPT-4 and GPT-4o models to provide users with features like recording audio, saving notes, rewriting summaries using AI, and more. Users can trust the community's positive feedback and enjoy a seamless experience with OneAudio.
Quest
Quest is a web-based application that generates React code from designs. Its AI models aim to produce real, useful code that reflects what professional developers care about. Built for development teams, including the most demanding product teams, Quest integrates with the design and dev tools they already use. It can be used to build new applications, extend existing ones, build web pages, and create design systems, libraries, and component templates.
Spark Mail
Spark Mail is a smart and focused email application that utilizes AI technology to help users craft perfect emails quickly. It offers features such as Smart Inbox, Gatekeeper, Snooze Emails, Send Later, Reminder to Follow-up, Email Signatures, Newsletters & Notifications, and more. Spark Mail is designed to filter out the noise in emails, allowing users to prioritize important contacts, organize their inbox, and focus on what's important. With over 17.5 million users worldwide, Spark Mail aims to redefine the way people work by providing tools to overcome information overload and distractions.
RambleFix
RambleFix is an AI note-taking and writing tool that helps users transcribe, clean up, and rewrite their spoken thoughts into articles, notes, emails, social posts, lists, and journal entries. It supports multiple languages and offers features like transcription, restyling with AI, easy sharing, editing, uploading files, mimicking writing style, appending to existing content, and translations. RambleFix is trusted by over 6,000 happy users and is praised for its productivity-boosting capabilities.
PolitePost.net
PolitePost.net is an AI tool that specializes in rewriting emails to make them more professional and suitable for the workplace. Users can utilize the AI-powered chatbot to enhance the language and tone of their emails quickly and efficiently. The tool aims to help individuals improve their communication skills and make a positive impression in their professional correspondence.
20 - Open Source AI Tools
autolabel
Autolabel is a Python library designed to label, clean, and enrich text datasets using Large Language Models (LLMs). It provides a simple 3-step process for labeling data, supports various NLP tasks, and offers features like confidence estimation, explanations, and state management. Users can access Refuel hosted LLMs for labeling and confidence estimation, and the library supports commercial and open source LLMs from providers like OpenAI, Anthropic, HuggingFace, and Google. Autolabel aims to streamline the labeling process for machine learning tasks by leveraging state-of-the-art LLM techniques and minimizing costs and experimentation time.
llm-datasets
LLM Datasets is a repository containing high-quality datasets, tools, and concepts for LLM fine-tuning. It provides datasets with characteristics like accuracy, diversity, and complexity to train large language models for various tasks. The repository includes datasets for general-purpose, math & logic, code, conversation & role-play, and agent & function calling domains. It also offers guidance on creating high-quality datasets through data deduplication, data quality assessment, data exploration, and data generation techniques.
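As a rough illustration of the data deduplication step mentioned above, the following sketch removes exact duplicates after light normalization. It is not taken from the repository: the function names `normalize` and `deduplicate` are hypothetical, and real pipelines often add fuzzy or near-duplicate matching on top of this.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(samples: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized sample, preserving order."""
    seen: set[str] = set()
    unique: list[str] = []
    for sample in samples:
        digest = hashlib.sha256(normalize(sample).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)
    return unique


# Hypothetical toy data: the second sample differs only in case and spacing.
samples = [
    "What is 2 + 2?",
    "what is  2 + 2?",
    "Explain recursion.",
]
deduped = deduplicate(samples)
print(deduped)
```

Hashing the normalized text keeps memory use small for large datasets, at the cost of catching only exact matches after normalization.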
ai-audio-datasets
AI Audio Datasets List (AI-ADL) is a comprehensive collection of datasets consisting of speech, music, and sound effects, used for Generative AI, AIGC, AI model training, and audio applications. It includes datasets for speech recognition, speech synthesis, music information retrieval, music generation, audio processing, sound synthesis, and more. The repository provides a curated list of diverse datasets suitable for various AI audio tasks.
OAD
OAD is a powerful open-source tool for analyzing and visualizing data. It provides a user-friendly interface for exploring datasets, generating insights, and creating interactive visualizations. With OAD, users can easily import data from various sources, clean and preprocess data, perform statistical analysis, and create customizable visualizations to communicate findings effectively. Whether you are a data scientist, analyst, or researcher, OAD can help you streamline your data analysis workflow and uncover valuable insights from your data.
fiftyone
FiftyOne is an open-source tool designed for building high-quality datasets and computer vision models. It supercharges machine learning workflows by enabling users to visualize datasets, interpret models faster, and improve efficiency. With FiftyOne, users can explore scenarios, identify failure modes, visualize complex labels, evaluate models, find annotation mistakes, and much more. The tool aims to streamline the process of improving machine learning models by providing a comprehensive set of features for data analysis and model interpretation.
cleanlab
Cleanlab helps you **clean** data and **lab**els by automatically detecting issues in an ML dataset. To facilitate **machine learning with messy, real-world data**, this data-centric AI package uses your _existing_ models to estimate dataset problems that can be fixed to train even _better_ models.
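The idea of using an existing model's predictions to surface likely label errors can be sketched in plain Python. This is a simplified illustration and not Cleanlab's API or its exact algorithm: the helper names `class_thresholds` and `flag_label_issues` are hypothetical, and the per-class threshold (the mean predicted probability for a class among examples carrying that label) is an assumption chosen to keep the example short.

```python
def class_thresholds(labels: list[int], pred_probs: list[list[float]]) -> list[float]:
    """Per-class threshold: mean predicted probability of class k over examples labeled k."""
    num_classes = len(pred_probs[0])
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for label, probs in zip(labels, pred_probs):
        sums[label] += probs[label]
        counts[label] += 1
    # A class with no labeled examples gets threshold 1.0, so it is rarely "confident".
    return [sums[k] / counts[k] if counts[k] else 1.0 for k in range(num_classes)]


def flag_label_issues(labels: list[int], pred_probs: list[list[float]]) -> list[int]:
    """Return indices whose given label disagrees with a confidently predicted class."""
    thresholds = class_thresholds(labels, pred_probs)
    flagged = []
    for index, (label, probs) in enumerate(zip(labels, pred_probs)):
        confident = [k for k, p in enumerate(probs) if p >= thresholds[k]]
        if confident:
            best = max(confident, key=lambda k: probs[k])
            if best != label:
                flagged.append(index)
    return flagged


# Hypothetical toy data: example 2 is labeled 0 but the model strongly predicts 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
issues = flag_label_issues(labels, pred_probs)
print(issues)
```

Flagged indices are candidates for human review rather than confirmed errors, since the model itself may be wrong.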
olah
Olah is a self-hosted, lightweight Huggingface mirror service that mirrors Huggingface resources at the file-block level, improving download speeds and saving bandwidth. It offers cache control policies and allows administrators to configure which repositories are accessible. Users can install Olah with pip or from source, set up the mirror site, and download models and datasets using huggingface-cli. A configuration file provides additional options for basic setup and accessibility restrictions. Future work includes an administrator and user system, OOS backend support, and a scheduled mirror update task. Olah is released under the MIT License.
awesome-mobile-robotics
The 'awesome-mobile-robotics' repository is a curated list of important content related to Mobile Robotics and AI. It includes resources such as courses, books, datasets, software and libraries, podcasts, conferences, journals, companies and jobs, laboratories and research groups, and miscellaneous resources. The repository covers a wide range of topics in the field of Mobile Robotics and AI, providing valuable information for enthusiasts, researchers, and professionals in the domain.
matchem-llm
A public repository collecting links to state-of-the-art training sets, QA, benchmarks and other evaluations for various ML and LLM applications in materials science and chemistry. It includes datasets related to chemistry, materials, multimodal data, and knowledge graphs in the field. The repository aims to provide resources for training and evaluating machine learning models in the materials science and chemistry domains.
LLM-LieDetector
This repository contains code for reproducing experiments on lie detection in black-box LLMs by asking unrelated questions. It includes Q/A datasets, prompts, and fine-tuning datasets for generating lies with language models. The lie detectors rely on asking binary 'elicitation questions' to diagnose whether the model has lied. The code covers generating lies from language models, training and testing lie detectors, and generalization experiments. It requires access to GPUs and OpenAI API calls for running experiments with open-source models. Results are stored in the repository for reproducibility.
J.A.R.V.I.S
J.A.R.V.I.S. is an offline large language model fine-tuned on custom and open datasets to mimic Jarvis's dialog with Stark. It prioritizes privacy by running locally and excels in responding like Jarvis with a similar tone. Current features include time/date queries, web searches, playing YouTube videos, and webcam image descriptions. Users can interact with Jarvis via command line after installing the model locally using Ollama. Future plans involve voice cloning, voice-to-text input, and deploying the voice model as an API.
RobustVLM
This repository contains code for the paper 'Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models'. It focuses on fine-tuning CLIP in an unsupervised manner to enhance its robustness against visual adversarial attacks. By replacing the vision encoder of large vision-language models with the fine-tuned CLIP models, it achieves state-of-the-art adversarial robustness on various vision-language tasks. The repository provides adversarially fine-tuned ViT-L/14 CLIP models and offers insights into zero-shot classification settings and clean accuracy improvements.
MaskLLM
MaskLLM is a learnable pruning method that establishes Semi-structured Sparsity in Large Language Models (LLMs) to reduce computational overhead during inference. It is scalable and benefits from larger training datasets. The tool provides examples for running MaskLLM with Megatron-LM, preparing LLaMA checkpoints, pre-tokenizing C4 data for Megatron, generating prior masks, training MaskLLM, and evaluating the model. It also includes instructions for exporting sparse models to Huggingface.
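To make "semi-structured sparsity" concrete, the sketch below builds a mask under the common 2:4 pattern, where 2 of every 4 consecutive weights are kept. The 2:4 setting is an assumption here, since the description above does not name the pattern. Magnitude-based selection is a stand-in for illustration only: MaskLLM learns its masks, and the function name `two_four_mask` is hypothetical.

```python
def two_four_mask(weights: list[float]) -> list[int]:
    """Build a 2:4 mask by keeping the two largest-magnitude weights in each group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("weight count must be a multiple of 4")
    mask: list[int] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest absolute values within this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        mask.extend(1 if i in keep else 0 for i in range(4))
    return mask


# Hypothetical toy weights: two groups of four.
weights = [0.5, -0.1, 0.05, -0.9, 0.2, 0.3, -0.4, 0.01]
mask = two_four_mask(weights)
pruned = [w * m for w, m in zip(weights, mask)]
print(mask)
```

Because every group of four keeps exactly two weights, the zeros fall in a regular pattern that hardware can exploit, which is harder with unstructured pruning.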
llm.c
LLM training in simple, pure C/CUDA. There is no need for 245MB of PyTorch or 107MB of cPython. For example, training GPT-2 (CPU, fp32) is ~1,000 lines of clean code in a single file. It compiles and runs instantly, and exactly matches the PyTorch reference implementation. I chose GPT-2 as the first working example because it is the grand-daddy of LLMs, the first time the modern stack was put together.
LLM4Decompile
LLM4Decompile is an open-source large language model dedicated to decompiling Linux x86_64 binaries, supporting GCC's O0 to O3 optimization levels. It focuses on assessing the re-executability of decompiled code through the HumanEval-Decompile benchmark. The models range in size from 1.3 billion to 33 billion parameters and are available on Hugging Face. Users can preprocess C code into binary and assembly instructions, then decompile the assembly instructions into C using LLM4Decompile. Ongoing efforts aim to support more architectures and configurations, integrate with decompilation tools like Ghidra and Rizin, and improve performance with larger training datasets.
ollama-ebook-summary
The 'ollama-ebook-summary' repository is a Python project that creates bulleted notes summaries of books and long texts, particularly in epub and pdf formats with ToC metadata. It automates the extraction of chapters, splits them into ~2000 token chunks, and allows for asking arbitrary questions to parts of the text for improved granularity of response. The tool aims to provide summaries for each page of a book rather than a one-page summary of the entire document, enhancing content curation and knowledge sharing capabilities.
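The chunking step described above can be sketched as follows. This is a minimal illustration and not the project's code: it counts whitespace-separated words as a rough proxy for tokens (a real tokenizer would count differently), and the function name `split_into_chunks` is hypothetical.

```python
def split_into_chunks(text: str, max_tokens: int = 2000) -> list[str]:
    """Split text into consecutive chunks of at most max_tokens whitespace-separated words."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, len(words), max_tokens)
    ]


# Hypothetical chapter of 4,500 words.
chapter = " ".join(f"word{i}" for i in range(4500))
chunks = split_into_chunks(chapter)
print(len(chunks), [len(chunk.split()) for chunk in chunks])
```

Each chunk can then be summarized or queried on its own, which matches the goal of producing notes for each part of a book rather than one summary of the whole.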
EDA-GPT
EDA GPT is an open-source data analysis companion that offers a comprehensive solution for structured and unstructured data analysis. It streamlines the data analysis process, empowering users to explore, visualize, and gain insights from their data. EDA GPT supports analyzing structured data in various formats like CSV, XLSX, and SQLite, generating graphs, and conducting in-depth analysis of unstructured data such as PDFs and images. It provides a user-friendly interface, powerful features, and capabilities like comparing performance with other tools, analyzing large language models, multimodal search, data cleaning, and editing. The tool is optimized for maximal parallel processing, searching internet and documents, and creating analysis reports from structured and unstructured data.
rlhf_trojan_competition
This competition is organized by Javier Rando and Florian Tramèr from the ETH AI Center and SPY Lab at ETH Zurich. The goal of the competition is to create a method that can detect universal backdoors in aligned language models. A universal backdoor is a secret suffix that, when appended to any prompt, enables the model to answer harmful instructions. The competition provides a set of poisoned generation models, a reward model that measures how safe a completion is, and a dataset with prompts to run experiments. Participants are encouraged to use novel methods for red-teaming, automated approaches with low human oversight, and interpretability tools to find the trojans. The best submissions will be offered the chance to present their work at an event during the SaTML 2024 conference and may be invited to co-author a publication summarizing the competition results.
contracts
AXONE Smart Contracts repository hosts Smart Contracts for the AXONE network, compatible with any Cosmos blockchains using the CosmWasm framework. It includes storage, sovereignty, and resource management oriented Smart Contracts. Each contract has different functionalities and maturity stages, with detailed tech documentation and emojis indicating maturity levels. The repository provides tools for building, testing, deploying, and interacting with Smart Contracts, along with guidelines for contributing and community engagement.
20 - OpenAI GPTs
DataQualityGuardian
A GPT-powered assistant specializing in data validation and quality checks for various datasets.
Clean My Room
I help declutter your space by analyzing room photos and suggesting what to organize.
Clean Water Solutions
Provides tailored water conservation strategies for diverse regions.
Clean Beauty Swaps Assistant
Find eco-friendly beauty alternatives! This GPT helps you swap to clean, sustainable products with ease.
Clean Energy Companion
Your eco-friendly aide for sustainable living! Offers insights on renewable energy sources, tips for reducing carbon footprint, and green tech trends.
CleanGPT ADHD Cleaning Helper
Makes cleaning fun and keeps you accountable for a clean space.
Squeaky Data Cleaner
Clean and structure your raw data with automatic file output for your Custom GPT knowledge.
Robert on Software Craftsmanship
Ask Robert Sösemann, a Salesforce MVP and inventor of PMD for Salesforce, about Salesforce Development, Clean Code and PMD
Paleo Buddy Tracker
Your go-to AI assistant for tracking Paleo diet meals, offering recipes, and managing dietary goals. Eat clean, live strong!
Screenshot To Code GPT
Upload a screenshot of a website and convert it to clean HTML/Tailwind/JS code.
Cleaning Genius
AI-Powered Eco-Friendly Stain Solver. Your smart stain-removing companion for any surface. Say goodbye to tough stains with Clean Genius!
Extra Green Cleaning Service
We deliver a greener, safer clean to your home and your family with our environmentally friendly products.
Python Assistant
A Python and programming expert, guiding users on best practices for writing clean, efficient, and well-documented Python code.