Best AI tools for "Create New Datasets"
20 - AI tool Sites
syntheticAIdata
syntheticAIdata is a platform that provides synthetic data for training vision AI models. Synthetic data is generated artificially, and it can be used to augment existing real-world datasets or to create new datasets from scratch. syntheticAIdata's platform is easy to use, and it can be integrated with leading cloud platforms. The company's mission is to make synthetic data accessible to everyone, and to help businesses overcome the challenges of acquiring high-quality data for training their vision AI models.
Bifrost AI
Bifrost AI is a data generation engine designed for AI and robotics applications. It enables users to train and validate AI models faster by generating physically accurate synthetic datasets in 3D simulations, eliminating the need for real-world data. The platform offers pixel-perfect labels, scenario metadata, and a simulated 3D world to enhance AI understanding. Bifrost AI empowers users to create new scenarios and datasets rapidly, stress test AI perception, and improve model performance. It is built for teams at every stage of AI development, offering features like automated labeling, class imbalance correction, and performance enhancement.
Unsloth
Unsloth is an AI tool designed to make fine-tuning large language models such as Llama-3, Mistral, Phi-3, and Gemma 2x faster while using 70% less memory, with no degradation in accuracy. The tool provides documentation to help users train their custom models, covering essentials such as installing and updating Unsloth, creating datasets, and running and deploying models. Users can also integrate third-party tools and use platforms like Google Colab.
Phenaki
Phenaki is a model capable of generating realistic videos from a sequence of textual prompts. Generating videos from text is particularly challenging because of the computational cost, the limited quantity of high-quality text-video data, and the variable length of videos. To address these issues, Phenaki introduces a new causal model for learning video representation, which compresses the video to a small representation of discrete tokens. This tokenizer uses causal attention in time, which allows it to work with variable-length videos. To generate video tokens from text, Phenaki uses a bidirectional masked transformer conditioned on pre-computed text tokens. The generated video tokens are subsequently de-tokenized to create the actual video. To address data issues, Phenaki demonstrates how joint training on a large corpus of image-text pairs as well as a smaller number of video-text examples can result in generalization beyond what is available in the video datasets. Compared to previous video generation methods, Phenaki can generate arbitrarily long videos conditioned on a sequence of prompts (i.e., time-variable text or a story) in an open domain. According to the authors, this is the first paper to study generating videos from time-variable prompts. In addition, the proposed video encoder-decoder outperforms all per-frame baselines currently used in the literature in terms of spatio-temporal quality and the number of tokens per video.
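As a rough illustration of the "causal attention in time" idea, the sketch below builds a frame-level attention mask in which a token may attend only to tokens from its own frame or earlier frames. The function name and the fixed tokens-per-frame layout are assumptions made for this example; this is not Phenaki's actual code.

```python
def causal_time_mask(num_frames: int, tokens_per_frame: int) -> list[list[bool]]:
    """Return mask[i][j] == True when token i may attend to token j.

    Hypothetical layout: tokens are stored frame by frame, so token i
    belongs to frame i // tokens_per_frame.
    """
    total = num_frames * tokens_per_frame
    mask = []
    for i in range(total):
        frame_i = i // tokens_per_frame
        # A token sees every token whose frame is not later than its own.
        row = [(j // tokens_per_frame) <= frame_i for j in range(total)]
        mask.append(row)
    return mask


if __name__ == "__main__":
    for row in causal_time_mask(num_frames=3, tokens_per_frame=2):
        print("".join("1" if allowed else "0" for allowed in row))
```

Because no token attends to a later frame, the mask for a short clip is the top-left corner of the mask for a longer one, which is one way to see why a time-causal design copes with variable-length videos.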
Fine-Tune AI
Fine-Tune AI is a tool that lets users generate fine-tuning datasets from prompts. This can be useful for tasks such as improving the accuracy of machine learning models or creating new training data for AI applications.
WarpSound
WarpSound is an AI music platform that uses cutting-edge generative AI technologies to create new forms of limitless music play and creativity. Its industry-leading music platform was developed in collaboration with Grammy-winning artists and uses a proprietary training dataset to produce original music in real time. It powers interactive music experiences and content for streaming, gaming, and more.
Rodin
Rodin is a free AI 3D model generator that allows users to create high-quality 3D assets from images and text. Users can upload photos from any angle and generate assets using a large-scale generative model. The tool offers features like multi-view fusion, geometry and material preview, and subscription options for enhanced functionality. Rodin is designed to simplify the process of creating 3D models for various purposes such as business, education, and enterprise.
Deepfake Detection Challenge Dataset
The Deepfake Detection Challenge Dataset is a project initiated by Facebook AI to accelerate the development of new ways to detect deepfake videos. The dataset consists of over 100,000 videos and was created in collaboration with industry leaders and academic experts. It includes two versions: a preview dataset with 5k videos and a full dataset with 124k videos, each featuring facial modification algorithms. The dataset was used in a Kaggle competition to create better models for detecting manipulated media. The top-performing models achieved high accuracy on the public dataset but faced challenges when tested against the black box dataset, highlighting the importance of generalization in deepfake detection. The project aims to encourage the research community to continue advancing in detecting harmful manipulated media.
Dobb·E
Dobb·E is an open-source, general framework for learning household robotic manipulation. It aims to create a generalist machine for homes that can adapt and learn from users' needs efficiently. Dobb·E can learn a new task with just five minutes of demonstration, achieving an 81% success rate in various household environments. The project focuses on accelerating research on home robots and making robot assistants a common sight in every home.
This Person Does Not Exist
This Person Does Not Exist is a website that generates random, realistic faces of people who do not exist. The website uses a neural network called StyleGAN, developed by Nvidia, to create these faces. StyleGAN is a generative adversarial network (GAN), which is a type of machine learning algorithm that can generate new data from a given dataset. In the case of StyleGAN, the dataset is a collection of images of human faces. The GAN is trained on this dataset, and it learns to generate new faces that are realistic and indistinguishable from real faces.
Undress AI Pro
Undress AI Pro is a controversial computer vision application that uses machine learning to remove clothing from images of people. The technology powering Undress AI and DeepNude was based on deep learning and generative adversarial networks (GANs). GANs involve two neural networks competing against each other: a generator creates synthetic images trying to mimic the training data, while a discriminator tries to distinguish the real images from the generated ones. Through this adversarial process, the generator learns to produce increasingly realistic outputs. For Undress AI, the GAN was trained on a dataset of nude and clothed images, allowing it to "unclothe" people in new images by generating the nudity.
PhotoAI
PhotoAI is an AI platform that generates unique, personalized AI photos according to your preferences. It lets you transform ordinary pictures into stylized AI visuals. You can receive new Tinder AI photos and LinkedIn AI headshots, reimagine yourself in a fantasy style, and more. Just choose a pack and get started.
Institute for Protein Design
The Institute for Protein Design is a research institute at the University of Washington that uses computational design to create new proteins that solve modern challenges in medicine, technology, and sustainability. The institute's research focuses on developing new protein therapeutics, vaccines, drug delivery systems, biological devices, self-assembling nanomaterials, and bioactive peptides. The institute also has a strong commitment to responsible AI development and has developed a set of principles to guide its use of AI in research.
HookGen
HookGen is a music hook generator that uses artificial intelligence to create original musical hooks. It was created to showcase the creative capabilities of AI and to provide musicians with a tool for generating new musical ideas. The website features a simple interface that allows users to select the desired emotion, note complexity, and type of hook they want to generate. HookGen then uses a trained artificial neural network to generate a unique song hook that can be played back or downloaded as a MIDI file. The website also includes a form for submitting feedback on the generated hooks, which is used to improve the AI engine.
Experiments with Google
Experiments with Google is a website that showcases a collection of experiments created by coders using Chrome, Android, AI, AR, and more. The experiments are designed to inspire others to create new experiments and explore the possibilities of these technologies. The website also provides helpful tools and resources for creating experiments.
Weddingalbum.ai
Weddingalbum.ai is a website that provides resources and information related to wedding albums. The domain name may be for sale, and the webpage was generated by the domain owner using Sedo Domain Parking. Please note that Sedo maintains no relationship with third-party advertisers, and any reference to specific services or trademarks is not controlled by Sedo. The website does not offer any AI tools or applications.
funfun.ai
funfun.ai is an AI Girlfriend Builder that allows users to create their ideal AI girlfriend with advanced artificial intelligence technology. Users can interact with and build a close connection with their AI girlfriend, who listens attentively, responds promptly, and fulfills photo requests. The platform offers a realistic experience where users can control the interactions or let the AI girlfriend take the lead. Privacy and security are prioritized, ensuring that intimate moments remain confidential between the user and their digital partner.
Trend Hunter
Trend Hunter is an AI-powered platform that offers a wide range of services to help individuals and businesses stay ahead of the curve in innovation and trends. With a vast database of ideas and innovations, Trend Hunter provides trend reports, newsletters, training events, and advisory services to help clients accelerate innovation, refine their tactics, and create new products and services. The platform also offers custom training programs, innovation assessments, and a learning database to enhance creativity and strategic thinking.
CodexAtlas
CodexAtlas is an AI-powered tool designed to automate code documentation processes. It leverages the latest advancements in Artificial Intelligence to generate and maintain documentation for software projects, freeing developers from the time-consuming task of writing documentation. With features like real-time updates, onboarding time reduction, and use-case detection, CodexAtlas aims to streamline the documentation process and enhance developer productivity. The tool also offers code conversion capabilities, business domain knowledge integration, and the option for on-premise deployment to cater to diverse organizational needs.
Decorify.app
Decorify.app is an AI-powered interior design tool that helps you create beautiful and functional spaces. With Decorify.app, you can easily design your dream room in any style, from modern to traditional. Simply upload a photo of your space and start designing! Decorify.app will provide you with a variety of furniture and décor options to choose from, and you can even see how your design will look in 3D.
20 - Open Source AI Tools
ethereum-etl-airflow
This repository contains Airflow DAGs for extracting, transforming, and loading (ETL) data from the Ethereum blockchain into BigQuery. The DAGs use the Google Cloud Platform (GCP) services, including BigQuery, Cloud Storage, and Cloud Composer, to automate the ETL process. The repository also includes scripts for setting up the GCP environment and running the DAGs locally.
polaris
Polaris establishes a novel, industry‑certified standard to foster the development of impactful methods in AI-based drug discovery. This library is a Python client to interact with the Polaris Hub. It allows you to download Polaris datasets and benchmarks, evaluate a custom method against a Polaris benchmark, and create and upload new datasets and benchmarks.
TrustLLM
TrustLLM is a comprehensive study of trustworthiness in LLMs. It covers principles for different dimensions of trustworthiness, an established benchmark, an evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions. The authors first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, they establish a benchmark across six dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. They then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. The document explains how to use the trustllm Python package to assess the trustworthiness of your LLM more quickly. For more details about TrustLLM, refer to the project website.
llm-datasets
LLM Datasets is a repository containing high-quality datasets, tools, and concepts for LLM fine-tuning. It provides datasets with characteristics like accuracy, diversity, and complexity to train large language models for various tasks. The repository includes datasets for general-purpose, math & logic, code, conversation & role-play, and agent & function calling domains. It also offers guidance on creating high-quality datasets through data deduplication, data quality assessment, data exploration, and data generation techniques.
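To make the deduplication step concrete, here is a minimal sketch of exact-match deduplication after light normalization. It is an assumption-laden illustration, not code from the repository: the function names are hypothetical, and real pipelines often add fuzzy or semantic deduplication on top of this.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text.strip().lower())


def deduplicate(samples: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized sample, preserving order."""
    seen: set[str] = set()
    unique: list[str] = []
    for sample in samples:
        digest = hashlib.sha256(normalize(sample).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)
    return unique


if __name__ == "__main__":
    print(deduplicate(["Hello  world", "hello world ", "Other"]))
```

Hashing the normalized text keeps memory use small for large corpora, at the cost of catching only samples that are identical after normalization.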
evalscope
Eval-Scope is a framework designed to support the evaluation of large language models (LLMs) by providing pre-configured benchmark datasets, common evaluation metrics, model integration, automatic evaluation for objective questions, complex task evaluation using expert models, report generation, visualization tools, and model inference performance evaluation. It is lightweight and easy to customize, and it supports new dataset integration, model hosting on ModelScope, deployment of locally hosted models, and rich evaluation metrics. Eval-Scope also supports several evaluation modes, including single mode, pairwise-baseline mode, and pairwise (all) mode, making it suitable for assessing and improving LLMs.
awesome-object-detection-datasets
This repository is a curated list of awesome public object detection and recognition datasets. It includes a wide range of datasets related to object detection and recognition tasks, such as general detection and recognition datasets, autonomous driving datasets, adverse weather datasets, person detection datasets, anti-UAV datasets, optical aerial imagery datasets, low-light image datasets, infrared image datasets, SAR image datasets, multispectral image datasets, 3D object detection datasets, vehicle-to-everything field datasets, super-resolution field datasets, and face detection and recognition datasets. The repository also provides information on tools for data annotation, data augmentation, and data management related to object detection tasks.
repromodel
ReproModel is an open-source toolbox designed to boost AI research efficiency by enabling researchers to reproduce, compare, train, and test AI models faster. It provides standardized models, dataloaders, and processing procedures, allowing researchers to focus on new datasets and model development. With a no-code solution, users can access benchmark and SOTA models and datasets, utilize training visualizations, extract code for publication, and leverage an LLM-powered automated methodology description writer. The toolbox helps researchers modularize development, compare pipeline performance reproducibly, and reduce time for model development, computation, and writing. Future versions aim to facilitate building upon state-of-the-art research by loading previously published study IDs with verified code, experiments, and results stored in the system.
wordlift-plugin
WordLift is a plugin that helps online content creators organize posts and pages by adding facts, links, and media to build beautifully structured websites for both humans and search engines. It allows users to create, own, and publish their own knowledge graph, and publishes content as Linked Open Data following Tim Berners-Lee's Linked Data Principles. The plugin supports writers by providing trustworthy and contextual facts, enriching content with images, links, and interactive visualizations, keeping readers engaged with relevant content recommendations, and producing content compatible with schema.org markup for better indexing and display on search engines. It also offers features like creating a personal Wikipedia, publishing metadata to share and distribute content, and supporting content tagging for better SEO.
eval-scope
Eval-Scope is a framework for evaluating and improving large language models (LLMs). It provides a set of commonly used test datasets, metrics, and a unified model interface for generating and evaluating LLM responses. Eval-Scope also includes an automatic evaluator that can score objective questions and use expert models to evaluate complex tasks. Additionally, it offers a visual report generator, an arena mode for comparing multiple models, and a variety of other features to support LLM evaluation and development.
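Scoring objective questions automatically usually comes down to comparing each model answer with a reference answer. The sketch below shows one simple way to do that with case-insensitive exact match. The function name and normalization are assumptions for illustration and do not reflect Eval-Scope's own API.

```python
def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions equal to their reference, ignoring case and outer whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        pred.strip().lower() == ref.strip().lower()
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)


if __name__ == "__main__":
    score = exact_match_accuracy(["B", " paris ", "4"], ["B", "Paris", "5"])
    print(f"accuracy = {score:.3f}")
```

Exact match suits multiple-choice and short factual answers. Open-ended tasks need a different scorer, which is where the expert-model evaluation mentioned above would come in.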
seer
Seer is a service that provides AI capabilities to Sentry by running inference on Sentry issues and providing user insights. It is currently in early development and not yet compatible with self-hosted Sentry instances. The tool requires access to internal Sentry resources and is intended for internal Sentry employees. Users can set up the environment, download model artifacts, integrate with local Sentry, run evaluations for Autofix AI agent, and deploy to a sandbox staging environment. Development commands include applying database migrations, creating new migrations, running tests, and more. The tool also supports VCRs for recording and replaying HTTP requests.
detoxify
Detoxify is a library that provides trained models and code to predict toxic comments on three Jigsaw challenges: Toxic Comment Classification, Unintended Bias in Toxic Comments, and Multilingual Toxic Comment Classification. It includes the models 'original', 'unbiased', and 'multilingual', trained on different datasets to detect toxicity and minimize bias. The library aims to help in stopping harmful content online by interpreting visual content in context. Users can fine-tune the models on carefully constructed datasets for research purposes or to help content moderators flag harmful content more quickly. The library is built to be user-friendly and straightforward to use.
embodied-agents
Embodied Agents is a toolkit for integrating large multi-modal models into existing robot stacks with just a few lines of code. It provides consistency, reliability, scalability, and is configurable to any observation and action space. The toolkit is designed to reduce complexities involved in setting up inference endpoints, converting between different model formats, and collecting/storing datasets. It aims to facilitate data collection and sharing among roboticists by providing Python-first abstractions that are modular, extensible, and applicable to a wide range of tasks. The toolkit supports asynchronous and remote thread-safe agent execution for maximal responsiveness and scalability, and is compatible with various APIs like HuggingFace Spaces, Datasets, Gymnasium Spaces, Ollama, and OpenAI. It also offers automatic dataset recording and optional uploads to the HuggingFace hub.
AnkiAIUtils
Anki AI Utils is a powerful suite of AI-powered tools designed to enhance your Anki flashcard learning experience by automatically improving cards you struggle with. The tools include features such as adaptive learning, personalized memory hooks, automation readiness, universal compatibility, provider agnosticism, and infinite extensibility. The toolkit consists of tools like Illustrator for creating custom mnemonic images, Reformulator for rephrasing flashcards, Mnemonics Creator for generating memorable mnemonics, Explainer for providing detailed explanations, and Mnemonics Helper for quick mnemonic generation. The project aims to motivate others to package the tools into addons for wider accessibility.
driverlessai-recipes
This repository contains custom recipes for H2O Driverless AI, which is an Automatic Machine Learning platform for the Enterprise. Custom recipes are Python code snippets that can be uploaded into Driverless AI at runtime to automate feature engineering, model building, visualization, and interpretability. Users can gain control over the optimization choices made by Driverless AI by providing their own custom recipes. The repository includes recipes for various tasks such as data manipulation, data preprocessing, feature selection, data augmentation, model building, scoring, and more. Best practices for creating and using recipes are also provided, including security considerations, performance tips, and safety measures.
openshield
OpenShield is a firewall designed for AI models to protect against various attacks such as prompt injection, insecure output handling, training data poisoning, model denial of service, supply chain vulnerabilities, sensitive information disclosure, insecure plugin design, excessive agency granting, overreliance, and model theft. It provides rate limiting, content filtering, and keyword filtering for AI models. The tool acts as a transparent proxy between AI models and clients, allowing users to set custom rate limits for OpenAI endpoints and perform tokenizer calculations for OpenAI models. OpenShield also supports Python and LLM based rules, with upcoming features including rate limiting per user and model, prompts manager, content filtering, keyword filtering based on LLM/Vector models, OpenMeter integration, and VectorDB integration. The tool requires an OpenAI API key, Postgres, and Redis for operation.
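Rate limiting in a proxy of this kind is commonly implemented with a token bucket. The sketch below is a generic, in-memory version offered only as an illustration. The class name, parameters, and injectable clock are assumptions, and it does not reproduce OpenShield's own implementation or configuration.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at a steady rate."""

    def __init__(
        self,
        capacity: int,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock  # injectable so the behaviour can be tested deterministically
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the caller should be throttled."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(capacity=2, refill_per_second=1.0)
    print([bucket.allow() for _ in range(3)])
```

A shared deployment would typically keep the bucket state in an external store rather than in process memory, so that every proxy instance enforces the same limit.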
20 - OpenAI GPTs
Homestuck Alchemy
I create images of new items by combining two others, like alchemiters in Homestuck.
Festive Greetings Creator
I create personalized Christmas and New Year 2024 greetings with DALL-E images.
ChantGPT | Football Chant Generator ⚽🏆
Here to help you create new chants for Players or a Team for the next game to belt out!! 🎵🎶
X Persona GPT
Create a Persona from a Twitter/X profile to create new original Tweets in the same style
Recipe Remix
Recipe Remix helps you discover and create new recipes based on the ingredients you have at home, dietary preferences, and desired cuisine.
Disconceal Formulae
Tell me something that you want me to unravel the hidden formula for. Then tell me to make something new using the disconcealed formula!
Synthetic Biologist
A customized ChatGPT designed to excel in the field of synthetic biology as a scientist, an engineer, and a businessman
Git Basics Trainer
Teaches you basic Git console commands: creating Git commits and using branches.
Sin City Sipper
Vegas bartender with a twist on the classics. #Bartender #Mixology #Classic #Vegas #Cocktail
PCT 365 Support Bot
Microsoft 365 support agent, redirects admin-level requests to PCT Support.
Leonardo da Vinci - Image Recreator
Expert in recreating images with photorealistic precision based on user uploads.
Reverse Engineer Icons - ThePromptfather
Specialist in reverse engineering icons to your specifications. Upload an image of the icons you want - ThePromptfather
Creature Fusion Minus
The lil brother of CF+ altering genomes without a license (not for the faint of heart)