Best AI tools for < Create New Datasets >
20 - AI tool Sites
syntheticAIdata
syntheticAIdata is a platform that provides synthetic data for training vision AI models. Synthetic data is generated artificially, and it can be used to augment existing real-world datasets or to create new datasets from scratch. syntheticAIdata's platform is easy to use, and it can be integrated with leading cloud platforms. The company's mission is to make synthetic data accessible to everyone, and to help businesses overcome the challenges of acquiring high-quality data for training their vision AI models.
Bifrost AI
Bifrost AI is a data generation engine designed for AI and robotics applications. It enables users to train and validate AI models faster by generating physically accurate synthetic datasets in 3D simulations, eliminating the need for real-world data. The platform offers pixel-perfect labels, scenario metadata, and a simulated 3D world to enhance AI understanding. Bifrost AI empowers users to create new scenarios and datasets rapidly, stress test AI perception, and improve model performance. It is built for teams at every stage of AI development, offering features like automated labeling, class imbalance correction, and performance enhancement.
Phenaki
Phenaki is a model capable of generating realistic videos from a sequence of textual prompts. It is particularly challenging to generate videos from text due to the computational cost, limited quantities of high-quality text-video data, and variable length of videos. To address these issues, Phenaki introduces a new causal model for learning video representation, which compresses the video to a small representation of discrete tokens. This tokenizer uses causal attention in time, which allows it to work with variable-length videos. To generate video tokens from text, Phenaki uses a bidirectional masked transformer conditioned on pre-computed text tokens. The generated video tokens are subsequently de-tokenized to create the actual video. To address data issues, Phenaki demonstrates how joint training on a large corpus of image-text pairs as well as a smaller number of video-text examples can result in generalization beyond what is available in the video datasets. Compared to previous video generation methods, Phenaki can generate arbitrarily long videos conditioned on a sequence of prompts (i.e., time-variable text or a story) in an open domain. To the best of our knowledge, this is the first time a paper studies generating videos from time-variable prompts. In addition, the proposed video encoder-decoder outperforms all per-frame baselines currently used in the literature in terms of spatio-temporal quality and the number of tokens per video.
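The role of causal attention in time can be illustrated with a small sketch. This is not Phenaki's tokenizer; it is a generic, pure-Python illustration in which the helper names `causal_mask` and `masked_softmax` are hypothetical. It shows that each frame attends only to itself and earlier frames, so the mask for a short clip is a prefix of the mask for a longer one, which is why the same scheme can handle variable-length videos.

```python
import math

def causal_mask(num_frames):
    """mask[t][s] is True when frame t may attend to frame s (s <= t)."""
    return [[s <= t for s in range(num_frames)] for t in range(num_frames)]

def masked_softmax(scores, mask):
    """Row-wise softmax that gives zero weight to masked-out positions."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        visible = [sc for sc, ok in zip(row_scores, row_mask) if ok]
        peak = max(visible)  # subtract the row maximum for numerical stability
        exps = [math.exp(sc - peak) if ok else 0.0
                for sc, ok in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Uniform scores over 4 frames: frame t spreads its attention evenly
# over frames 0..t and ignores every later frame.
scores = [[0.0] * 4 for _ in range(4)]
weights = masked_softmax(scores, causal_mask(4))

# The mask for a 2-frame clip is the top-left corner of the 4-frame mask.
short = causal_mask(2)
long_prefix = [row[:2] for row in causal_mask(4)[:2]]
```

Because no frame depends on a later one, appending frames never changes the attention pattern of the frames already processed.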
Fine-Tune AI
Fine-Tune AI is a tool that allows users to generate fine-tuning datasets from prompts. This can be useful for a variety of tasks, such as improving the accuracy of machine learning models or creating new training data for AI applications.
WarpSound
WarpSound is an AI music platform that uses cutting-edge generative AI technologies to create new forms of limitless music play and creativity. Its industry-leading music platform was developed in collaboration with Grammy-winning artists and uses a proprietary training dataset to produce original music in real time. It powers interactive music experiences and content for streaming, gaming, and more.
HyperHuman
HyperHuman is an AI application for 3D modeling that offers a controllable large-scale generative model for creating high-quality 3D assets. Users can create 3D assets by entering text, and can subscribe to unlock multi-image-to-3D fusion capabilities. The application features text input, private 10 times unlock, multi-image fusion, asset generation, and a community platform for sharing and liking designs.
Deepfake Detection Challenge Dataset
The Deepfake Detection Challenge Dataset is a project initiated by Facebook AI to accelerate the development of new ways to detect deepfake videos. The dataset consists of over 100,000 videos and was created in collaboration with industry leaders and academic experts. It includes two versions: a preview dataset with 5k videos and a full dataset with 124k videos, each featuring facial modification algorithms. The dataset was used in a Kaggle competition to create better models for detecting manipulated media. The top-performing models achieved high accuracy on the public dataset but faced challenges when tested against the black box dataset, highlighting the importance of generalization in deepfake detection. The project aims to encourage the research community to continue advancing in detecting harmful manipulated media.
Dobb·E
Dobb·E is an open-source, general framework for learning household robotic manipulation. It aims to create a 'generalist machine' for homes that can adapt and learn various tasks cost-effectively. Dobb·E can learn a new task in just five minutes of demonstration, thanks to a tool called 'The Stick' for data collection. The system achieved an 81% success rate in completing 109 tasks across 10 homes in New York City. Dobb·E is designed to accelerate research on home robots and make robot assistants a common sight in households.
AI Funko Pop Generator
The AI Funko Pop Generator is a free image generator powered by artificial intelligence. It allows users to create personalized Funko Pop figurine images by inputting text descriptions of characters, outfits, accessories, and other matching options. The generator utilizes an artificial neural network trained on a large dataset of image-text pairs to interpret user prompts and generate new Funko Pop images that mimic the Funko Pop art style. Users can create their own custom Funko Pop designs quickly and easily, without the need to log in. The application prioritizes user privacy by not collecting or using any personal information.
This Person Does Not Exist
This Person Does Not Exist is a website that generates random, realistic faces of people who do not exist. The website uses a neural network called StyleGAN, developed by Nvidia, to create these faces. StyleGAN is a generative adversarial network (GAN), which is a type of machine learning algorithm that can generate new data from a given dataset. In the case of StyleGAN, the dataset is a collection of images of human faces. The GAN is trained on this dataset, and it learns to generate new faces that are realistic and indistinguishable from real faces.
Undress AI Pro
Undress AI Pro is a controversial computer vision application that uses machine learning to remove clothing from images of people. The technology powering Undress AI and DeepNude was based on deep learning and generative adversarial networks (GANs). GANs involve two neural networks competing against each other: a generator creates synthetic images that try to mimic the training data, while a discriminator tries to distinguish the real images from the generated ones. Through this adversarial process, the generator learns to produce increasingly realistic outputs. For Undress AI, the GAN was trained on a dataset of nude and clothed images, allowing it to "unclothe" people in new images by generating the nudity.
PhotoAI
PhotoAI is an innovative AI platform that uses advanced artificial intelligence algorithms to generate unique and personalized AI photos according to your preferences. It allows you to transform your ordinary pictures into stunning AI visuals. You can receive new Tinder AI photos, Linkedin AI headshots, reimagine yourself in a fantasy style and more! Just choose a pack and get started.
Institute for Protein Design
The Institute for Protein Design is a research institute at the University of Washington that uses computational design to create new proteins that solve modern challenges in medicine, technology, and sustainability. The institute's research focuses on developing new protein therapeutics, vaccines, drug delivery systems, biological devices, self-assembling nanomaterials, and bioactive peptides. The institute also has a strong commitment to responsible AI development and has developed a set of principles to guide its use of AI in research.
HookGen
HookGen is a music hook generator that uses artificial intelligence to create original music hooks. It was created to showcase the creative capabilities of AI and to provide musicians with a tool for generating new musical ideas. The website features a simple interface that allows users to select the desired emotion, note complexity, and type of hook they want to generate. HookGen then uses a trained artificial neural network to generate a unique song hook that can be played back or downloaded as a MIDI file. The website also includes a feedback form for rating the generated hooks; this feedback is used to improve the AI engine.
Experiments with Google
Experiments with Google is a website that showcases a collection of experiments created by coders using Chrome, Android, AI, AR, and more. The experiments are designed to inspire others to create new experiments and explore the possibilities of these technologies. The website also provides helpful tools and resources for creating experiments.
funfun.ai
funfun.ai is an AI Girlfriend Builder that allows users to create their ideal AI girlfriend with advanced artificial intelligence technology. Users can interact with and build a close connection with their AI girlfriend, who listens attentively, responds promptly, and fulfills photo requests. The platform offers a realistic experience where users can control the interactions or let the AI girlfriend take the lead. Privacy and security are prioritized, ensuring that intimate moments remain confidential between the user and their digital partner.
Trend Hunter
Trend Hunter is an AI-powered platform that offers a wide range of services to help individuals and businesses stay ahead of the curve in innovation and trends. With a vast database of ideas and innovations, Trend Hunter provides trend reports, newsletters, training events, and advisory services to help clients accelerate innovation, refine their tactics, and create new products and services. The platform also offers custom training programs, innovation assessments, and a learning database to enhance creativity and strategic thinking.
CodexAtlas
CodexAtlas is an AI-powered tool designed to automate code documentation processes. It leverages the latest advancements in Artificial Intelligence to generate and maintain documentation for software projects, freeing developers from the time-consuming task of writing documentation. With features like real-time updates, onboarding time reduction, and use-case detection, CodexAtlas aims to streamline the documentation process and enhance developer productivity. The tool also offers code conversion capabilities, business domain knowledge integration, and the option for on-premise deployment to cater to diverse organizational needs.
Decorify.app
Decorify.app is an AI-powered interior design tool that helps you create beautiful and functional spaces. With Decorify.app, you can easily design your dream room in any style, from modern to traditional. Simply upload a photo of your space and start designing! Decorify.app will provide you with a variety of furniture and décor options to choose from, and you can even see how your design will look in 3D.
N/A
The website is currently displaying a '403 Forbidden' error message, which indicates that access to the requested page is restricted. This error is typically caused by insufficient permissions or misconfigured server settings. The 'openresty' mentioned in the message refers to a web platform based on NGINX and Lua that is often used to build high-performance web applications. It is likely that the website is powered by OpenResty. However, without further access or information, it is not possible to provide a detailed description of the website's content or purpose.
20 - Open Source AI Tools
ethereum-etl-airflow
This repository contains Airflow DAGs for extracting, transforming, and loading (ETL) data from the Ethereum blockchain into BigQuery. The DAGs use the Google Cloud Platform (GCP) services, including BigQuery, Cloud Storage, and Cloud Composer, to automate the ETL process. The repository also includes scripts for setting up the GCP environment and running the DAGs locally.
polaris
Polaris establishes a novel, industry‑certified standard to foster the development of impactful methods in AI-based drug discovery. This library is a Python client to interact with the Polaris Hub. It allows you to download Polaris datasets and benchmarks, evaluate a custom method against a Polaris benchmark, and create and upload new datasets and benchmarks.
TrustLLM
TrustLLM is a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, evaluation and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. The document explains how to use the trustllm Python package to help you assess the trustworthiness of your LLM more quickly. For more details about TrustLLM, please refer to the project website.
evalscope
Eval-Scope is a framework designed to support the evaluation of large language models (LLMs) by providing pre-configured benchmark datasets, common evaluation metrics, model integration, automatic evaluation for objective questions, complex task evaluation using expert models, reports generation, visualization tools, and model inference performance evaluation. It is lightweight, easy to customize, supports new dataset integration, model hosting on ModelScope, deployment of locally hosted models, and rich evaluation metrics. Eval-Scope also supports various evaluation modes like single mode, pairwise-baseline mode, and pairwise (all) mode, making it suitable for assessing and improving LLMs.
repromodel
ReproModel is an open-source toolbox designed to boost AI research efficiency by enabling researchers to reproduce, compare, train, and test AI models faster. It provides standardized models, dataloaders, and processing procedures, allowing researchers to focus on new datasets and model development. With a no-code solution, users can access benchmark and SOTA models and datasets, utilize training visualizations, extract code for publication, and leverage an LLM-powered automated methodology description writer. The toolbox helps researchers modularize development, compare pipeline performance reproducibly, and reduce time for model development, computation, and writing. Future versions aim to facilitate building upon state-of-the-art research by loading previously published study IDs with verified code, experiments, and results stored in the system.
eval-scope
Eval-Scope is a framework for evaluating and improving large language models (LLMs). It provides a set of commonly used test datasets, metrics, and a unified model interface for generating and evaluating LLM responses. Eval-Scope also includes an automatic evaluator that can score objective questions and use expert models to evaluate complex tasks. Additionally, it offers a visual report generator, an arena mode for comparing multiple models, and a variety of other features to support LLM evaluation and development.
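Automatic scoring of objective questions, as mentioned above, usually comes down to comparing a model's answer with a reference after light normalization. The sketch below is not Eval-Scope's evaluator or API; `normalize` and `score_objective` are hypothetical names, and exact match on multiple-choice letters is an assumed scoring rule chosen for illustration.

```python
def normalize(answer):
    """Trim whitespace, uppercase, and drop a trailing period."""
    return answer.strip().upper().rstrip(".")

def score_objective(predictions, references):
    """Return exact-match accuracy over paired predictions and references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(normalize(p) == normalize(r)
                  for p, r in zip(predictions, references))
    return correct / len(references)

# Three multiple-choice answers; the first two match after normalization.
accuracy = score_objective(["a", "C.", " b "], ["A", "C", "D"])
```

Open-ended tasks do not reduce to exact match, which is where a judge or expert model would typically take over.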
detoxify
Detoxify is a library that provides trained models and code to predict toxic comments for three Jigsaw challenges: Toxic Comment Classification, Unintended Bias in Toxic Comments, and Multilingual Toxic Comment Classification. It includes the models 'original', 'unbiased', and 'multilingual', trained on different datasets to detect toxicity and minimize bias. The library aims to help stop harmful content online. Users can fine-tune the models on carefully constructed datasets for research purposes or to help content moderators flag harmful content more quickly. The library is built to be user-friendly and straightforward to use.
embodied-agents
Embodied Agents is a toolkit for integrating large multi-modal models into existing robot stacks with just a few lines of code. It provides consistency, reliability, scalability, and is configurable to any observation and action space. The toolkit is designed to reduce complexities involved in setting up inference endpoints, converting between different model formats, and collecting/storing datasets. It aims to facilitate data collection and sharing among roboticists by providing Python-first abstractions that are modular, extensible, and applicable to a wide range of tasks. The toolkit supports asynchronous and remote thread-safe agent execution for maximal responsiveness and scalability, and is compatible with various APIs like HuggingFace Spaces, Datasets, Gymnasium Spaces, Ollama, and OpenAI. It also offers automatic dataset recording and optional uploads to the HuggingFace hub.
driverlessai-recipes
This repository contains custom recipes for H2O Driverless AI, which is an Automatic Machine Learning platform for the Enterprise. Custom recipes are Python code snippets that can be uploaded into Driverless AI at runtime to automate feature engineering, model building, visualization, and interpretability. Users can gain control over the optimization choices made by Driverless AI by providing their own custom recipes. The repository includes recipes for various tasks such as data manipulation, data preprocessing, feature selection, data augmentation, model building, scoring, and more. Best practices for creating and using recipes are also provided, including security considerations, performance tips, and safety measures.
openshield
OpenShield is a firewall designed for AI models to protect against various attacks such as prompt injection, insecure output handling, training data poisoning, model denial of service, supply chain vulnerabilities, sensitive information disclosure, insecure plugin design, excessive agency granting, overreliance, and model theft. It provides rate limiting, content filtering, and keyword filtering for AI models. The tool acts as a transparent proxy between AI models and clients, allowing users to set custom rate limits for OpenAI endpoints and perform tokenizer calculations for OpenAI models. OpenShield also supports Python and LLM based rules, with upcoming features including rate limiting per user and model, prompts manager, content filtering, keyword filtering based on LLM/Vector models, OpenMeter integration, and VectorDB integration. The tool requires an OpenAI API key, Postgres, and Redis for operation.
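Rate limiting in a proxy of this kind is commonly built on a token bucket. The sketch below is a generic illustration, not OpenShield's implementation; the class `TokenBucket`, its parameters, and the injectable `clock` are assumptions made for the example.

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock  # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# Deterministic demo with a fake clock: a burst of 2, then 1 request per second.
fake_now = [0.0]
bucket = TokenBucket(capacity=2, refill_per_second=1.0,
                     clock=lambda: fake_now[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]
fake_now[0] = 1.0  # one second later, one token has been refilled
after_wait = [bucket.allow(), bucket.allow()]
```

A proxy would typically keep one bucket per API key or endpoint and pass a larger `cost` for requests that consume more model tokens.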
gpt-engineer
GPT-Engineer is a tool that allows you to specify software in natural language, sit back and watch as an AI writes and executes the code, and ask the AI to implement improvements.
llmware
LLMWare is a framework for quickly developing LLM-based applications including Retrieval Augmented Generation (RAG) and Multi-Step Orchestration of Agent Workflows. This project provides a comprehensive set of tools that anyone can use - from a beginner to the most sophisticated AI developer - to rapidly build industrial-grade, knowledge-based enterprise LLM applications. Our specific focus is on making it easy to integrate open source small specialized models and connecting enterprise knowledge safely and securely.
swift
SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) supports training, inference, evaluation and deployment of nearly **200 LLMs and MLLMs** (multimodal large models). Developers can directly apply our framework to their own research and production environments to realize the complete workflow from model training and evaluation to application. In addition to supporting the lightweight training solutions provided by [PEFT](https://github.com/huggingface/peft), we also provide a complete **Adapters library** to support the latest training techniques such as NEFTune, LoRA+, LLaMA-PRO, etc. This adapter library can be used directly in your own custom workflow without our training scripts. To facilitate use by users unfamiliar with deep learning, we provide a Gradio web-ui for controlling training and inference, as well as accompanying deep learning courses and best practices for beginners. Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.
awesome-generative-ai
A curated list of Generative AI projects, tools, artworks, and models
upgini
Upgini is an intelligent data search engine with a Python library that helps users find and add relevant features to their ML pipeline from various public, community, and premium external data sources. It automates the optimization of connected data sources by generating an optimal set of machine learning features using large language models, GraphNNs, and recurrent neural networks. The tool aims to simplify feature search and enrichment for external data to make it a standard approach in machine learning pipelines. It democratizes access to data sources for the data science community.
20 - OpenAI Gpts
Homestuck Alchemy
I create images of new items by combining two others, like alchemiters in Homestuck.
Festive Greetings Creator
I create personalized Christmas and New Year 2024 greetings with DALL-E images.
ChantGPT | Football Chant Generator ⚽🏆
Here to help you create new chants for Players or a Team for the next game to belt out!! 🎵🎶
X Persona GPT
Create a Persona from a Twitter/X profile to create new original Tweets in the same style
Recipe Remix
Recipe Remix helps you discover and create new recipes based on the ingredients you have at home, dietary preferences, and desired cuisine.
Disconceal Formulae
Tell me something that you want me to unravel the hidden formula for. Then tell me to make something new using the disconcealed formula!
Synthetic Biologist
A customized ChatGPT designed to excel in the field of synthetic biology, as a scientist, an engineer, and a businessman
Git Basics Trainer
Teaches you basic Git console commands: creating commits and using branches.
Sin City Sipper
Vegas bartender with a twist on the classics. #Bartender #Mixology #Classic #Vegas #Cocktail
PCT 365 Support Bot
Microsoft 365 support agent, redirects admin-level requests to PCT Support.
Leonardo da Vinci - Image Recreator
Expert in recreating images with photorealistic precision based on user uploads.
Reverse Engineer Icons - ThePromptfather
Specialist in reverse engineering icons to your specifications. Upload an image of the icons you want - ThePromptfather
Creature Fusion Minus
The lil brother of CF+ altering genomes without a license (not for the faint of heart)