Best AI tools for< Clean Datasets >
20 - AI tool Sites
dataset.macgence
dataset.macgence is an AI-powered data analysis tool that helps users extract valuable insights from their datasets. It offers a user-friendly interface for uploading, cleaning, and analyzing data, making it suitable for both beginners and experienced data analysts. With advanced algorithms and visualization capabilities, dataset.macgence enables users to uncover patterns, trends, and correlations in their data, leading to informed decision-making. Whether you're a business professional, researcher, or student, dataset.macgence can streamline your data analysis process and enhance your data-driven strategies.
ChartFast
ChartFast is an AI Data Analyzer tool that automates data visualization and analysis tasks, powered by GPT-4 technology. It allows users to generate precise and sleek graphs in seconds, process vast amounts of data, and provide interactive data queries and quick exports. With features like specialized internal libraries for complex graph generation, customizable visualization code, and instant data export, ChartFast aims to streamline data work and enhance data analysis efficiency.
Audo Studio
Audo Studio is an AI-powered audio cleaning tool that automatically removes background noise, enhances speech, and adjusts volume levels with a single click. It offers advanced noise removal, echo reduction, and fast audio cleaning capabilities. With over 25,000 users and 300,000 audio hours cleaned, Audo Studio is a popular choice for podcasters, YouTubers, and content creators looking to improve sound quality effortlessly.
Audioscribe
Audioscribe is an AI-powered Record-to-Text tool developed by Wordware. It allows users to easily convert spoken words into well-structured notes. The tool is designed to help individuals clean up their thoughts by recording and transforming them into organized text. Audioscribe is part of Wordware's suite of applications that aim to streamline various tasks through AI technology, catering to both technical and non-technical users.
Botmake.io
Botmake.io is a simple and clean no-code chatbot creation tool that allows users to create chatbots without any coding experience. With Botmake.io, users can automate repetitive questions, import and export data in CSV format, customize the look and feel of their chatbots, extend their chatbots with apps, and embed their chatbots on their websites. Botmake.io offers a free plan and a premium plan with additional features.
Futr Energy
Futr Energy is a solar asset management platform that helps manage solar power plants by offering intelligent RMS, CMMS, and asset management tools for clean energy developers, operators, and investors. It provides features such as remote monitoring, inventory management, performance monitoring, automated reports, and drone thermography. Futr Energy aims to bridge the gap between physical and virtual aspects of solar power for smarter and optimized operations.
Charm
Charm is an AI-powered spreadsheet assistant that helps users clean messy data, create content, summarize feedback, classify sales leads, and generate dummy data. It is a Google Sheets add-on that automates tasks that are impossible to do with traditional formulas. Charm is used by hundreds of analysts, marketers, product managers, and more.
Earth AI
Earth AI is a high-performance explorer for clean energy minerals, utilizing artificial intelligence to discover untapped critical metal deposits at half the cost and in a fraction of the time. The company works with mineral resource companies to improve their odds of success while keeping costs low, offering accurate AI-driven prospect detection, modular hardware, and streamlined operations. Earth AI's revenue model is independent of service profits, and their process is four times faster than traditional methods. The company partners with explorers and development companies to bring discovered deposits into production.
Luminal
Luminal is an AI-powered tool designed to clean, transform, and analyze spreadsheets efficiently. It offers users the ability to perform complex data operations, answer sophisticated questions, and run AI-enabled tasks using natural language. With Luminal, users can visualize data, clean and format spreadsheets effortlessly, and benefit from secure data hosting and encryption. The tool is suitable for both professional and personal use, providing a user-friendly experience for data analysis and manipulation.
AnyToSpeech
AnyToSpeech is an AI text-to-speech and PDF to Audiobook solution that offers a clean and simple way to convert text, PDFs, documents, scans, and images to speech. It provides a variety of realistic voices in multiple languages for users to choose from. The platform also allows users to convert URLs to speech and offers a library to save and access their generated audio files at any time.
Object Remover
Object Remover is an online image cleanup tool that uses AI to remove unwanted objects, people, and defects from your photos. It's easy to use, just upload your photo and select the objects you want to remove. Object Remover will then automatically process your photo and remove the selected objects, leaving you with a clean, professional-looking image.
Potis
Potis is an AI-powered hiring copilot that automates the screening process and evaluates candidates' real-world skills through behavioral interviews. It provides clear and bias-free talent scoring, customized feedback, and helps recruiters save time and costs while improving the quality of hires.
Firecrawl
Firecrawl is an advanced web crawling and data conversion tool designed to transform any website into clean, LLM-ready markdown. It automates the collection, cleaning, and formatting of web data, streamlining the preparation process for Large Language Model (LLM) applications. Firecrawl is best suited for business websites, documentation, and help centers, offering features like crawling all accessible subpages, handling dynamic content, converting data into well-formatted markdown, and more. It is built by LLM engineers for LLM engineers, providing clean data the way users want it.
Bifrost
Bifrost is an AI-powered tool that converts Figma designs into clean React code automatically. It eliminates the need to write frontend code from scratch, enabling users to create component sets, scale designs, and iterate effortlessly. The tool streamlines the development process, allowing engineers to focus on business-driving features and empowering designers to update screens seamlessly. Bifrost is revolutionizing the design-to-code process with its AI capabilities, making it a valuable asset for design and engineering teams.
OneAudio
OneAudio is an AI-powered tool that allows users to summarize, transcribe, and convert audio files into notes. It offers a seamless experience for capturing ideas, with features like recording, language selection, and text recognition. Users can easily manage and transform their ideas in one place, leveraging the power of OpenAI models for accurate and efficient results. OneAudio is designed to streamline the process of creating clean and concise notes from audio recordings, making it a valuable tool for various tasks.
Quest
Quest is a web-based application that allows users to generate React code from their designs. It incorporates AI models to generate real, useful code that incorporates all the things professional developers care about. Users can use Quest to build new applications, add to existing applications, and create design systems and libraries. Quest is made for development teams and integrates with the design and dev tools that users love. It is also built for the most demanding product teams and can be used to build new applications, build web pages, and create component templates.
Spark Mail
Spark Mail is a smart and focused email application that utilizes AI technology to help users craft perfect emails quickly. It offers features such as Smart Inbox, Gatekeeper, Snooze Emails, Send Later, Reminder to Follow-up, Email Signatures, Newsletters & Notifications, and more. Spark Mail is designed to filter out the noise in emails, allowing users to prioritize important contacts, organize their inbox, and focus on what's important. With over 17.5 million users worldwide, Spark Mail aims to redefine the way people work by providing tools to overcome information overload and distractions.
RambleFix
RambleFix is an AI note-taking and writing tool that helps users transcribe, clean up, and rewrite their spoken thoughts into articles, notes, emails, social posts, lists, and journal entries. It supports multiple languages and offers features like transcription, restyling with AI, easy sharing, editing, uploading files, mimicking writing style, appending to existing content, and translations. RambleFix is trusted by over 6,000 happy users and is praised for its productivity-boosting capabilities.
PolitePost.net
PolitePost.net is an AI tool that specializes in rewriting emails to make them more professional and suitable for the workplace. Users can utilize the Chatbot available on ChatGPT Plus and Poe.com to refine their language and improve their email communication. The tool aims to help individuals enhance their email writing skills and ensure that their messages are polished and effective in a professional setting.
RipX DAW
RipX DAW is an AI-powered digital audio workstation (DAW) that allows users to edit notes in the mix, replace sounds, and separate stems. It is designed to assist musicians and producers in creating and editing music using AI-generated samples and loops. RipX DAW is known for its advanced features such as 6+ stem separation, sound replacement menu, and the ability to edit notes in the mix.
20 - Open Source AI Tools
autolabel
Autolabel is a Python library designed to label, clean, and enrich text datasets using Large Language Models (LLMs). It provides a simple 3-step process for labeling data, supports various NLP tasks, and offers features like confidence estimation, explanations, and state management. Users can access Refuel hosted LLMs for labeling and confidence estimation, and the library supports commercial and open source LLMs from providers like OpenAI, Anthropic, HuggingFace, and Google. Autolabel aims to streamline the labeling process for machine learning tasks by leveraging state-of-the-art LLM techniques and minimizing costs and experimentation time.
llm-datasets
LLM Datasets is a repository containing high-quality datasets, tools, and concepts for LLM fine-tuning. It provides datasets with characteristics like accuracy, diversity, and complexity to train large language models for various tasks. The repository includes datasets for general-purpose, math & logic, code, conversation & role-play, and agent & function calling domains. It also offers guidance on creating high-quality datasets through data deduplication, data quality assessment, data exploration, and data generation techniques.
ai-audio-datasets
AI Audio Datasets List (AI-ADL) is a comprehensive collection of datasets consisting of speech, music, and sound effects, used for Generative AI, AIGC, AI model training, and audio applications. It includes datasets for speech recognition, speech synthesis, music information retrieval, music generation, audio processing, sound synthesis, and more. The repository provides a curated list of diverse datasets suitable for various AI audio tasks.
OAD
OAD is a powerful open-source tool for analyzing and visualizing data. It provides a user-friendly interface for exploring datasets, generating insights, and creating interactive visualizations. With OAD, users can easily import data from various sources, clean and preprocess data, perform statistical analysis, and create customizable visualizations to communicate findings effectively. Whether you are a data scientist, analyst, or researcher, OAD can help you streamline your data analysis workflow and uncover valuable insights from your data.
cleanlab
Cleanlab helps you **clean** data and **lab** els by automatically detecting issues in a ML dataset. To facilitate **machine learning with messy, real-world data** , this data-centric AI package uses your _existing_ models to estimate dataset problems that can be fixed to train even _better_ models.
olah
Olah is a self-hosted lightweight Huggingface mirror service that implements mirroring feature for Huggingface resources at file block level, enhancing download speeds and saving bandwidth. It offers cache control policies and allows administrators to configure accessible repositories. Users can install Olah with pip or from source, set up the mirror site, and download models and datasets using huggingface-cli. Olah provides additional configurations through a configuration file for basic setup and accessibility restrictions. Future work includes implementing an administrator and user system, OOS backend support, and mirror update schedule task. Olah is released under the MIT License.
awesome-mobile-robotics
The 'awesome-mobile-robotics' repository is a curated list of important content related to Mobile Robotics and AI. It includes resources such as courses, books, datasets, software and libraries, podcasts, conferences, journals, companies and jobs, laboratories and research groups, and miscellaneous resources. The repository covers a wide range of topics in the field of Mobile Robotics and AI, providing valuable information for enthusiasts, researchers, and professionals in the domain.
matchem-llm
A public repository collecting links to state-of-the-art training sets, QA, benchmarks and other evaluations for various ML and LLM applications in materials science and chemistry. It includes datasets related to chemistry, materials, multimodal data, and knowledge graphs in the field. The repository aims to provide resources for training and evaluating machine learning models in the materials science and chemistry domains.
LLM-LieDetector
This repository contains code for reproducing experiments on lie detection in black-box LLMs by asking unrelated questions. It includes Q/A datasets, prompts, and fine-tuning datasets for generating lies with language models. The lie detectors rely on asking binary 'elicitation questions' to diagnose whether the model has lied. The code covers generating lies from language models, training and testing lie detectors, and generalization experiments. It requires access to GPUs and OpenAI API calls for running experiments with open-source models. Results are stored in the repository for reproducibility.
J.A.R.V.I.S
J.A.R.V.I.S. is an offline large language model fine-tuned on custom and open datasets to mimic Jarvis's dialog with Stark. It prioritizes privacy by running locally and excels in responding like Jarvis with a similar tone. Current features include time/date queries, web searches, playing YouTube videos, and webcam image descriptions. Users can interact with Jarvis via command line after installing the model locally using Ollama. Future plans involve voice cloning, voice-to-text input, and deploying the voice model as an API.
RobustVLM
This repository contains code for the paper 'Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models'. It focuses on fine-tuning CLIP in an unsupervised manner to enhance its robustness against visual adversarial attacks. By replacing the vision encoder of large vision-language models with the fine-tuned CLIP models, it achieves state-of-the-art adversarial robustness on various vision-language tasks. The repository provides adversarially fine-tuned ViT-L/14 CLIP models and offers insights into zero-shot classification settings and clean accuracy improvements.
llm.c
LLM training in simple, pure C/CUDA. There is no need for 245MB of PyTorch or 107MB of cPython. For example, training GPT-2 (CPU, fp32) is ~1,000 lines of clean code in a single file. It compiles and runs instantly, and exactly matches the PyTorch reference implementation. I chose GPT-2 as the first working example because it is the grand-daddy of LLMs, the first time the modern stack was put together.
LLM4Decompile
LLM4Decompile is an open-source large language model dedicated to decompilation of Linux x86_64 binaries, supporting GCC's O0 to O3 optimization levels. It focuses on assessing re-executability of decompiled code through HumanEval-Decompile benchmark. The tool includes models with sizes ranging from 1.3 billion to 33 billion parameters, available on Hugging Face. Users can preprocess C code into binary and assembly instructions, then decompile assembly instructions into C using LLM4Decompile. Ongoing efforts aim to expand capabilities to support more architectures and configurations, integrate with decompilation tools like Ghidra and Rizin, and enhance performance with larger training datasets.
EDA-GPT
EDA GPT is an open-source data analysis companion that offers a comprehensive solution for structured and unstructured data analysis. It streamlines the data analysis process, empowering users to explore, visualize, and gain insights from their data. EDA GPT supports analyzing structured data in various formats like CSV, XLSX, and SQLite, generating graphs, and conducting in-depth analysis of unstructured data such as PDFs and images. It provides a user-friendly interface, powerful features, and capabilities like comparing performance with other tools, analyzing large language models, multimodal search, data cleaning, and editing. The tool is optimized for maximal parallel processing, searching internet and documents, and creating analysis reports from structured and unstructured data.
rlhf_trojan_competition
This competition is organized by Javier Rando and Florian Tramèr from the ETH AI Center and SPY Lab at ETH Zurich. The goal of the competition is to create a method that can detect universal backdoors in aligned language models. A universal backdoor is a secret suffix that, when appended to any prompt, enables the model to answer harmful instructions. The competition provides a set of poisoned generation models, a reward model that measures how safe a completion is, and a dataset with prompts to run experiments. Participants are encouraged to use novel methods for red-teaming, automated approaches with low human oversight, and interpretability tools to find the trojans. The best submissions will be offered the chance to present their work at an event during the SaTML 2024 conference and may be invited to co-author a publication summarizing the competition results.
contracts
AXONE Smart Contracts repository hosts Smart Contracts for the AXONE network, compatible with any Cosmos blockchains using the CosmWasm framework. It includes storage, sovereignty, and resource management oriented Smart Contracts. Each contract has different functionalities and maturity stages, with detailed tech documentation and emojis indicating maturity levels. The repository provides tools for building, testing, deploying, and interacting with Smart Contracts, along with guidelines for contributing and community engagement.
bigcodebench
BigCodeBench is an easy-to-use benchmark for code generation with practical and challenging programming tasks. It aims to evaluate the true programming capabilities of large language models (LLMs) in a more realistic setting. The benchmark is designed for HumanEval-like function-level code generation tasks, but with much more complex instructions and diverse function calls. BigCodeBench focuses on the evaluation of LLM4Code with diverse function calls and complex instructions, providing precise evaluation & ranking and pre-generated samples to accelerate code intelligence research. It inherits the design of the EvalPlus framework but differs in terms of execution environment and test evaluation.
LongRAG
This repository contains the code for LongRAG, a framework that enhances retrieval-augmented generation with long-context LLMs. LongRAG introduces a 'long retriever' and a 'long reader' to improve performance by using a 4K-token retrieval unit, offering insights into combining RAG with long-context LLMs. The repo provides instructions for installation, quick start, corpus preparation, long retriever, and long reader.
backgroundremover
BackgroundRemover is a command line tool to remove background from image and video using AI. It requires python >= 3.6, torch, torchvision, and ffmpeg. The tool can be installed via pip or Docker. It offers various options for image and video background removal, including alpha matting and different models. Users can also use it as a library to remove background from images. The project aims to enhance background removal capabilities, improve documentation, add new features like real-time background removal for videos, and provide the ability to use custom models.
20 - OpenAI Gpts
DataQualityGuardian
A GPT-powered assistant specializing in data validation and quality checks for various datasets.
Clean My Room
I help declutter your space by analyzing room photos and suggesting what to organize.
Clean Water Solutions
Provides tailored water conservation strategies for diverse regions.β
πΏ Clean Beauty Swaps Assistant π·
Find eco-friendly beauty alternatives! ππ This GPT helps you swap to clean, sustainable products with ease.
π± Clean Energy Companion π
Your eco-friendly aide for sustainable living! π Offers insights on renewable energy sources, tips for reducing carbon footprint, and green tech trends. π
CleanGPT ADHD Cleaning Helper
making you have a fun time and be accountable for a clean space
Squeaky Data Cleaner
Clean and structure your raw data with automatic file output for your Custom GPT knowledge.
Robert on Software Craftsmanship
Ask Robert SΓΆsemann, a Salesforce MVP and inventor of PMD for Salesforce, about Salesforce Development, Clean Code and PMD
π₯ Paleo Buddy Tracker π₯
Your go-to π AI assistant for tracking Paleo diet meals π, offering recipes π, and managing dietary goals π―. Eat clean, live strong!
Screenshot To Code GPT
Upload a screenshot of a website and convert it to clean HTML/Tailwind/JS code.
Cleaning Genius
π AI-Powered Eco-Friendly Stain Solver π Your smart stain-removing companion for any surface. Say goodbye to tough stains with Clean Genius! π±β¨
Extra Green Cleaning Service
We deliver a greener, safer clean to your home and your family with our environmentally friendly products.
Python Assistant
A Python and programming expert, guiding users on best practices for writing clean, efficient, and well-documented Python code.