Best AI tools for< Pre-process Data >
20 - AI tool Sites
FinetuneFast
FinetuneFast is an AI tool designed to help developers, indie makers, and businesses to efficiently finetune machine learning models, process data, and deploy AI solutions at lightning speed. With pre-configured training scripts, efficient data loading pipelines, and one-click model deployment, FinetuneFast streamlines the process of building and deploying AI models, saving users valuable time and effort. The tool is user-friendly, accessible for ML beginners, and offers lifetime updates for continuous improvement.
GPUX
GPUX is a cloud platform that provides access to GPUs for running AI workloads. It offers a variety of features to make it easy to deploy and run AI models, including a user-friendly interface, pre-built templates, and support for a variety of programming languages. GPUX is also committed to providing a sustainable and ethical platform, and it has partnered with organizations such as the Climate Leadership Council to reduce its carbon footprint.
FormX.ai
FormX.ai is an AI-powered data extraction and conversion tool that automates the process of extracting data from physical documents and converting it into digital formats. It supports a wide range of document types, including invoices, receipts, purchase orders, bank statements, contracts, HR forms, shipping orders, loyalty member applications, annual reports, business certificates, personnel licenses, and more. FormX.ai's pre-configured data extraction models and effortless API integration make it easy for businesses to integrate data extraction into their existing systems and workflows. With FormX.ai, businesses can save time and money on manual data entry and improve the accuracy and efficiency of their data processing.
Liner.ai
Liner is a free and easy-to-use tool that allows users to train machine learning models without writing any code. It provides a user-friendly interface that guides users through the process of importing data, selecting a model, and training the model. Liner also offers a variety of pre-trained models that can be used for common tasks such as image classification, text classification, and object detection. With Liner, users can quickly and easily create and deploy machine learning applications without the need for specialized knowledge or expertise.
Maekersuite
Maekersuite is an AI-powered platform designed to assist users in researching and scripting videos. It offers a wide range of tools and features to streamline the video creation process, from generating video ideas to optimizing scripts using data and AI. The platform aims to help users create engaging and data-driven video content for various purposes such as marketing, social media, education, and business.
DigitalGenius
DigitalGenius is an AI-powered customer service platform designed specifically for e-commerce businesses. It offers a range of features to help businesses automate customer interactions, resolve issues quickly and efficiently, and improve the overall customer experience. DigitalGenius seamlessly integrates with existing e-commerce platforms and shipping carriers, allowing businesses to manage customer inquiries and orders from a single platform. The platform's AI capabilities enable it to understand customer intent, generate personalized responses, and provide proactive support. DigitalGenius is trusted by leading brands such as On, Air up, and Waterdrop, and has helped them improve customer satisfaction, reduce costs, and scale their operations.
RecruitGenius.ai
RecruitGenius.ai is an AI-powered automated recruiting tool designed to streamline and optimize the recruitment process for HR professionals and hiring managers. The platform offers features such as AI auto screening, one-way video interviews, smart direct interview scheduler, talent pool management system, candidate relationship management, and analytic reports. RecruitGenius.ai aims to help users pre-qualify candidates efficiently, schedule interviews seamlessly, and manage the recruitment workflow effectively, ultimately saving time and effort in the hiring process. The tool is designed to enhance recruitment efficiency and improve the quality of hires by leveraging AI technology.
Keytalk AI
Keytalk AI is a company that specializes in prompt engineering, which is the process of creating prompts that can be used to generate text, images, and other types of content using artificial intelligence (AI) models. Keytalk AI's mission is to make AI more accessible and user-friendly by providing tools and resources that make it easy for people to create and use AI-generated content. The company's flagship product is Keytalk Prompts, a library of pre-written prompts that can be used to generate content on a variety of topics. Keytalk AI also offers a range of other services, including consulting, training, and support.
Docsumo
Docsumo is an advanced Document AI platform designed for scalability and efficiency. It offers a wide range of capabilities such as pre-processing documents, extracting data, reviewing and analyzing documents. The platform provides features like document classification, touchless processing, ready-to-use AI models, auto-split functionality, and smart table extraction. Docsumo is a leader in intelligent document processing and is trusted by various industries for its accurate data extraction capabilities. The platform enables enterprises to digitize their document processing workflows, reduce manual efforts, and maximize data accuracy through its AI-powered solutions.
Beam AI
Beam AI is a leading platform for Agentic Automation and AI Agents, offering solutions for automating manual workflows with AI agents to boost productivity. The platform is used by Fortune 500 companies and scale-ups, providing a seamless experience from start to finish. With features like pre-trained agents, integrations, and easy-to-navigate UI, Beam AI empowers users to create and deploy AI tools tailored to their specific needs. The platform is designed to scale any business, offering industry-specific solutions and world-class support for building AI-native organizations.
PYQ
PYQ is an AI-powered platform that helps businesses automate document-related tasks, such as data extraction, form filling, and system integration. It uses natural language processing (NLP) and machine learning (ML) to understand the content of documents and perform tasks accordingly. PYQ's platform is designed to be easy to use, with pre-built automations for common use cases. It also offers custom automation development services for more complex needs.
Reform
Reform is a modern logistics software development platform that provides pre-built modules and AI capabilities to help teams build logistics applications quickly and efficiently. It offers features such as document AI for automating data capture, universal TMS integrations for seamless connectivity, embeddable customer dashboards for real-time data visibility, and more.
Bricks
Bricks is an AI-first spreadsheet application that simplifies the process of creating and sharing reports, presentations, charts, and visuals using your data. It eliminates the need for advanced spreadsheet expertise, allowing users to effortlessly generate various types of content. Bricks offers a wide range of pre-built templates and tools to enhance productivity and creativity in data analysis and visualization.
Sense Talent Engagement Platform
Sense Talent Engagement Platform is an AI-powered recruitment platform that offers a comprehensive suite of tools to streamline the hiring process. It provides automation workflows, database cleanup, interview scheduling, text messaging, mass texting, WhatsApp and SMS integration, mobile app support, candidate matching, AI chatbot, job matching, scheduling bot, smart FAQ, pre-screening, sourcing, live chat, instant apply, talent CRM, generative AI, voice AI, referrals, analytics, and more. The platform caters to various industries such as financial services, healthcare, logistics, manufacturing, retail, staffing, technology, and more, helping organizations attract, engage, and retain top talent efficiently.
Generrate
Generrate is an AI-powered content creation tool that empowers users to generate high-quality content in their brand voice. It offers a wide range of features including AI Writer, AI Article Wizard, AI Chat, PDF Chat, AI Speech To Text, and AI Voiceover. With Generrate, users can automate their content creation process, customize their content to suit their branding, and interact with AI-powered chatbots for real-time data collection. The tool supports 40 languages, provides unlimited result proposals, and offers 4 levels of creativity. Generrate is a transformative tool for businesses and individuals looking to enhance productivity and efficiency through AI features.
Generated Photos
Generated Photos is an AI-powered platform that offers worry-free model photos through the use of advanced AI-generated faces and full-body human models. Users can access a vast library of pre-generated diverse faces and humans that do not exist in reality. The platform caters to various industries such as advertising, design, marketing, research, and machine learning, providing high-quality and unique images for creative projects. With features like face and human generators, bulk download options, and API integration, Generated Photos simplifies the process of finding and creating custom visual content for different purposes.
SOMA
SOMA is a Research Automation Platform that accelerates medical innovation by providing up to 100x speedup through process automation. The platform analyzes medical research articles, extracts important concepts, and identifies causal and associative relationships between them. It organizes this information into a specialized database forming a knowledge graph. Researchers can retrieve causal chains, access specific research articles, and perform tasks like concept analysis, drug repurposing, and target discovery. SOMA enhances literature review efficiency by finding relevant articles based on causal chains and keywords specified by the user. It empowers researchers to focus on their research by saving up to 95% of the time spent on pre-processing documents. The platform offers freemium access with extended functionality for 14 days and advanced features available through subscription.
RunPod
RunPod is a cloud platform specifically designed for AI development and deployment. It offers a range of features to streamline the process of developing, training, and scaling AI models, including a library of pre-built templates, efficient training pipelines, and scalable deployment options. RunPod also provides access to a wide selection of GPUs, allowing users to choose the optimal hardware for their specific AI workloads.
Weavel
Weavel is an AI tool designed to revolutionize prompt engineering for large language models (LLMs). It offers features such as tracing, dataset curation, batch testing, and evaluations to enhance the performance of LLM applications. Weavel enables users to continuously optimize prompts using real-world data, prevent performance regression with CI/CD integration, and engage in human-in-the-loop interactions for scoring and feedback. Ape, the AI prompt engineer, outperforms competitors on benchmark tests and ensures seamless integration and continuous improvement specific to each user's use case. With Weavel, users can effortlessly evaluate LLM applications without the need for pre-existing datasets, streamlining the assessment process and enhancing overall performance.
Patee.io
Patee.io is an AI-powered platform that helps businesses automate their data annotation and labeling tasks. With Patee.io, businesses can easily create, manage, and annotate large datasets, which can then be used to train machine learning models. Patee.io offers a variety of features that make it easy to annotate data, including a user-friendly interface, a variety of annotation tools, and the ability to collaborate with others. Patee.io also offers a number of pre-built models that can be used to automate the annotation process, saving businesses time and money.
20 - Open Source AI Tools
sparrow
Sparrow is an innovative open-source solution for efficient data extraction and processing from various documents and images. It seamlessly handles forms, invoices, receipts, and other unstructured data sources. Sparrow stands out with its modular architecture, offering independent services and pipelines all optimized for robust performance. One of the critical functionalities of Sparrow - pluggable architecture. You can easily integrate and run data extraction pipelines using tools and frameworks like LlamaIndex, Haystack, or Unstructured. Sparrow enables local LLM data extraction pipelines through Ollama or Apple MLX. With Sparrow solution you get API, which helps to process and transform your data into structured output, ready to be integrated with custom workflows. Sparrow Agents - with Sparrow you can build independent LLM agents, and use API to invoke them from your system. **List of available agents:** * **llamaindex** - RAG pipeline with LlamaIndex for PDF processing * **vllamaindex** - RAG pipeline with LLamaIndex multimodal for image processing * **vprocessor** - RAG pipeline with OCR and LlamaIndex for image processing * **haystack** - RAG pipeline with Haystack for PDF processing * **fcall** - Function call pipeline * **unstructured-light** - RAG pipeline with Unstructured and LangChain, supports PDF and image processing * **unstructured** - RAG pipeline with Weaviate vector DB query, Unstructured and LangChain, supports PDF and image processing * **instructor** - RAG pipeline with Unstructured and Instructor libraries, supports PDF and image processing. Works great for JSON response generation
SmallLanguageModel-project
This repository provides all the necessary items to build a Language Model from scratch, inspired by Karpathy's nanoGPT and Shakespeare generator. It includes data collection tools, data processing scripts, various models like BERT, GPT, and Seq-2-Seq, along with tokenizer and training files.
hi-ml
The Microsoft Health Intelligence Machine Learning Toolbox is a repository that provides low-level and high-level building blocks for Machine Learning / AI researchers and practitioners. It simplifies and streamlines work on deep learning models for healthcare and life sciences by offering tested components such as data loaders, pre-processing tools, deep learning models, and cloud integration utilities. The repository includes two Python packages, 'hi-ml-azure' for helper functions in AzureML, 'hi-ml' for ML components, and 'hi-ml-cpath' for models and workflows related to histopathology images.
upgini
Upgini is an intelligent data search engine with a Python library that helps users find and add relevant features to their ML pipeline from various public, community, and premium external data sources. It automates the optimization of connected data sources by generating an optimal set of machine learning features using large language models, GraphNNs, and recurrent neural networks. The tool aims to simplify feature search and enrichment for external data to make it a standard approach in machine learning pipelines. It democratizes access to data sources for the data science community.
data-juicer
Data-Juicer is a one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs. It is a systematic & reusable library of 80+ core OPs, 20+ reusable config recipes, and 20+ feature-rich dedicated toolkits, designed to function independently of specific LLM datasets and processing pipelines. Data-Juicer allows detailed data analyses with an automated report generation feature for a deeper understanding of your dataset. Coupled with multi-dimension automatic evaluation capabilities, it supports a timely feedback loop at multiple stages in the LLM development process. Data-Juicer offers tens of pre-built data processing recipes for pre-training, fine-tuning, en, zh, and more scenarios. It provides a speedy data processing pipeline requiring less memory and CPU usage, optimized for maximum productivity. Data-Juicer is flexible & extensible, accommodating most types of data formats and allowing flexible combinations of OPs. It is designed for simplicity, with comprehensive documentation, easy start guides and demo configs, and intuitive configuration with simple adding/removing OPs from existing configs.
Train-llm-from-scratch
Train-llm-from-scratch is a repository that guides users through training a Large Language Model (LLM) from scratch. The model size can be adjusted based on available computing power. The repository utilizes deepspeed for distributed training and includes detailed explanations of the code and key steps at each stage to facilitate learning. Users can train their own tokenizer or use pre-trained tokenizers like ChatGLM2-6B. The repository provides information on preparing pre-training data, processing training data, and recommended SFT data for fine-tuning. It also references other projects and books related to LLM training.
nextjs-openai-doc-search
This starter project is designed to process `.mdx` files in the `pages` directory to use as custom context within OpenAI Text Completion prompts. It involves building a custom ChatGPT style doc search powered by Next.js, OpenAI, and Supabase. The project includes steps for pre-processing knowledge base, storing embeddings in Postgres, performing vector similarity search, and injecting content into OpenAI GPT-3 text completion prompt.
NineRec
NineRec is a benchmark dataset suite for evaluating transferable recommendation models. It provides datasets for pre-training and transfer learning in recommender systems, focusing on multimodal and foundation model tasks. The dataset includes user-item interactions, item texts in multiple languages, item URLs, and raw images. Researchers can use NineRec to develop more effective and efficient methods for pre-training recommendation models beyond end-to-end training. The dataset is accompanied by code for dataset preparation, training, and testing in PyTorch environment.
IDvs.MoRec
This repository contains the source code for the SIGIR 2023 paper 'Where to Go Next for Recommender Systems? ID- vs. Modality-based Recommender Models Revisited'. It provides resources for evaluating foundation, transferable, multi-modal, and LLM recommendation models, along with datasets, pre-trained models, and training strategies for IDRec and MoRec using in-batch debiased cross-entropy loss. The repository also offers large-scale datasets, code for SASRec with in-batch debias cross-entropy loss, and information on joining the lab for research opportunities.
mediapipe-rs
MediaPipe-rs is a Rust library designed for MediaPipe tasks on WasmEdge WASI-NN. It offers easy-to-use low-code APIs similar to mediapipe-python, with low overhead and flexibility for custom media input. The library supports various tasks like object detection, image classification, gesture recognition, and more, including TfLite models, TF Hub models, and custom models. Users can create task instances, run sessions for pre-processing, inference, and post-processing, and speed up processing by reusing sessions. The library also provides support for audio tasks using audio data from symphonia, ffmpeg, or raw audio. Users can choose between CPU, GPU, or TPU devices for processing.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.
litdata
LitData is a tool designed for blazingly fast, distributed streaming of training data from any cloud storage. It allows users to transform and optimize data in cloud storage environments efficiently and intuitively, supporting various data types like images, text, video, audio, geo-spatial, and multimodal data. LitData integrates smoothly with frameworks such as LitGPT and PyTorch, enabling seamless streaming of data to multiple machines. Key features include multi-GPU/multi-node support, easy data mixing, pause & resume functionality, support for profiling, memory footprint reduction, cache size configuration, and on-prem optimizations. The tool also provides benchmarks for measuring streaming speed and conversion efficiency, along with runnable templates for different data types. LitData enables infinite cloud data processing by utilizing the Lightning.ai platform to scale data processing with optimized machines.
LL3DA
LL3DA is a Large Language 3D Assistant that responds to both visual and textual interactions within complex 3D environments. It aims to help Large Multimodal Models (LMM) comprehend, reason, and plan in diverse 3D scenes by directly taking point cloud input and responding to textual instructions and visual prompts. LL3DA achieves remarkable results in 3D Dense Captioning and 3D Question Answering, surpassing various 3D vision-language models. The code is fully released, allowing users to train customized models and work with pre-trained weights. The tool supports training with different LLM backends and provides scripts for tuning and evaluating models on various tasks.
chatgpt-universe
ChatGPT is a large language model that can generate human-like text, translate languages, write different kinds of creative content, and answer your questions in a conversational way. It is trained on a massive amount of text data, and it is able to understand and respond to a wide range of natural language prompts. Here are 5 jobs suitable for this tool, in lowercase letters: 1. content writer 2. chatbot assistant 3. language translator 4. creative writer 5. researcher
psychic
Finic is an open source python-based integration platform designed to simplify integration workflows for both business users and developers. It offers a drag-and-drop UI, a dedicated Python environment for each workflow, and generative AI features to streamline transformation tasks. With a focus on decoupling integration from product code, Finic aims to provide faster and more flexible integrations by supporting custom connectors. The tool is open source and allows deployment to users' own cloud environments with minimal legal friction.
LLM-Viewer
LLM-Viewer is a tool for visualizing Language and Learning Models (LLMs) and analyzing performance on different hardware platforms. It enables network-wise analysis, considering factors such as peak memory consumption and total inference time cost. With LLM-Viewer, users can gain valuable insights into LLM inference and performance optimization. The tool can be used in a web browser or as a command line interface (CLI) for easy configuration and visualization. The ongoing project aims to enhance features like showing tensor shapes, expanding hardware platform compatibility, and supporting more LLMs with manual model graph configuration.
finic
Finic is an open source python-based integration platform designed for business users to create v1 integrations with minimal code, while also being flexible for developers to build complex integrations directly in python. It offers a low-code web UI, a dedicated Python environment for each workflow, and generative AI features. Finic decouples integration from product code, supports custom connectors, and is open source. It is not an ETL tool but focuses on integrating functionality between applications via APIs or SFTP, and it is not a workflow automation tool optimized for complex use cases.
dl_model_infer
This project is a c++ version of the AI reasoning library that supports the reasoning of tensorrt models. It provides accelerated deployment cases of deep learning CV popular models and supports dynamic-batch image processing, inference, decode, and NMS. The project has been updated with various models and provides tutorials for model exports. It also includes a producer-consumer inference model for specific tasks. The project directory includes implementations for model inference applications, backend reasoning classes, post-processing, pre-processing, and target detection and tracking. Speed tests have been conducted on various models, and onnx downloads are available for different models.
airflow
Apache Airflow (or simply Airflow) is a platform to programmatically author, schedule, and monitor workflows. When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative. Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
20 - OpenAI Gpts
Optimisateur de Performance GPT
Expert en optimisation de performance et traitement de données
AVA 1.0 for Aspiring Medics | Medicine Interviews
A trial version of AVA 2.0, the World's first AI Interview Platform for Medical School Interview Interviews - https://ai.theaspiringmedics.co.uk/
Dune x Farcaster (GPT)
A GPT pre-trained on duneSQL Farcaster tables, complex examples, and duneSQL syntax. Please reference the dune docs or contact @shoni.eth for errors
Y Combinator Co-Pilot
Expert in YC applications, pre-trained by real application data insights
Seabiscuit Launch Lander
Startup Strong Within 180 Days: Tailored advice for launching, promoting, and scaling businesses of all types. It covers all stages from pre-launch to post-launch and develops strategies including market research, branding, promotional tactics, and operational planning unique your business. (v1.8)
Travel Buddy
Detailed trip planner offering destination ideas, itineraries, budgeting, and pre-trip advice.
Marina Medical
A guide for aspiring med students, aiding in interviews, MCAT prep, and application advice.
MCAT Mentor
AI MCAT tutor with a focus on challenging topics, structured learning, and supportive guidance.
Afford This Home
I help simplify and realistically calculate how much house you can buy. Real Estate , Personal Finance, Mortgages, & more.
SBA Loan Advisor
Using public SBA data to help you find the best fit lender for your small business
Mobile Home Mortgage Calculator
Get the most up to date information about mortgages for a mobile home