Best AI tools for <Find Data>
20 - AI tool Sites

Landbase
Landbase is an AI-powered platform that offers competitive Go-To-Market (GTM) insights for businesses. Users can search for any company and receive AI-powered overviews within seconds. The platform provides tools, guides, and more to help businesses prepare for their market strategies. Landbase offers features such as sentiment analysis, digital trust scoring, omnichannel lead generation, and automation of sales pipelines. With Landbase, users can unlock key data points and receive AI-powered analysis for their organizations, enabling them to focus on generating high-ticket leads and delivering exceptional service to customers.

Keylight AI
Keylight AI is an AI-powered solution designed to help users efficiently find information within their documents. It offers lightning-fast searches, precise results, a user-friendly interface, and customizable prompts, and it ensures secure and confidential document handling. Ideal for professionals across various industries, Keylight AI revolutionizes document search by providing quick and efficient navigation. Users can boost their productivity and save time with this innovative tool.

Deepfind
Deepfind is a privacy-first AI search engine that prioritizes user data protection. It allows users to conduct searches without the use of cookies, tracking, or storing personal information. Deepfind aims to provide a secure and efficient search experience while maintaining user privacy and data security.

Ocular
Ocular is an AI-powered search platform that allows users to search, visualize, and take action on their work and engineering tools and data on one unified platform. It is designed to help engineers work more efficiently and effectively by providing them with a single, central location to access all of their relevant information.

Phind AI
Phind AI is a cost-effective alternative to other AI search engines, making AI search accessible to everyone, regardless of location. It offers a comprehensive search experience with a user-friendly interface and advanced features.

Qatalog
Qatalog is a business search engine that provides real-time access to data across various company systems and applications. It uses natural language processing and machine learning to understand user queries and deliver relevant results from multiple data sources. Qatalog eliminates the need to search through multiple systems and applications, saving employees time and improving productivity.

Tremello
Tremello is a market research platform that uses AI to deliver off-market data. It combines a leading AI engine with human experts to provide bespoke intelligence delivered directly to the user's inbox. Tremello's AI analyzes relationships, identifies patterns, and considers the broader context, delivering meaningful and actionable insights on top of a base human layer. It leverages a diverse range of data sources, including public and private databases, industry reports, social media archives, company websites, and government filings, ensuring a complete and comprehensive picture of the research subject.

Robin AI
Robin AI is a legal AI application that offers a platform for accelerating contract review and analysis. It provides services such as generating contract reports 50 times faster, reviewing contracts 80% faster, and finding contract data in less than 3 seconds. The application combines LLMs, proprietary machine learning models, and legal experts to transform contract review for businesses worldwide. With features like precision edits, secure repository, fast turnaround times, and customizable report templates, Robin AI aims to simplify contract processes for legal teams. The platform also offers resources like blog insights, webinars, and legal dictionary definitions to empower users in the legal industry.

Kira Systems
Kira Systems is a machine learning contract search, review, and analysis software that helps businesses identify, extract, and analyze content in their contracts and documents. It uses patented machine learning technology to extract concepts and data points with high efficiency and accuracy. Kira also has built-in intelligence that streamlines the contract review process with out-of-the-box smart fields. Businesses can also create their own smart fields to find specific data points using Kira's no-code machine learning tool. Kira's adaptive workflows allow businesses to organize, track, and export results. Kira has a partner ecosystem that allows businesses to transform how teams work with their contracts.

Shieldbase
Shieldbase is an AI-powered enterprise search tool designed to provide secure and efficient search capabilities for businesses. It utilizes advanced artificial intelligence algorithms to index and retrieve information from various data sources within an organization, ensuring quick and accurate search results. With a focus on security, Shieldbase offers encryption and access control features to protect sensitive data. The platform is user-friendly and customizable, making it easy for businesses to implement and integrate into their existing systems. Shieldbase enhances productivity by enabling employees to quickly find the information they need, ultimately improving decision-making processes and overall operational efficiency.

Elastic
Elastic is a Search AI Company that offers a platform for building tailored experiences, search and analytics, data ingestion, visualization, and generative AI solutions. The company provides services like Elastic Cloud for real-time insights, Elastic AI Assistant for retrieval and generation, and Search AI Lake for faster integration with LLMs. Elastic aims to help businesses scale with low-latency search AI and accelerate problem resolution with observability powered by advanced ML and analytics.

The Notion Automation Hub
The Notion Automation Hub is a website that provides pre-built Notion automations and databases to help users save time and improve their productivity. The website offers a variety of automations for different use cases, including job roles, workflows, and tasks. Users can also find pre-built database templates, Notion expert resources, and automation tools. The website is not affiliated with Notion Labs Inc.

Vanga AI
Vanga AI is an AI-powered upselling and cross-selling tool for Shopify stores. It helps businesses increase their revenue by automatically generating and displaying upsells and cross-sells on their post-purchase and thank you pages. Vanga AI uses data to find the products that customers are most likely to buy together, and it creates custom upsell funnels for each product. The tool is easy to use and requires no setup or maintenance. Vanga AI offers a 14-day free trial and two paid plans, starting at $9/month.

Word Changer
Word Changer is an online tool that helps you rewrite and enhance your writing. It analyzes your content and provides suggested alternative words and phrases to improve your work. It then references a vast database to find creative new ways to express the same ideas. The substituted words fit easily into the context, so the meaning does not change. The suggestions appear highlighted within your text. You can easily accept them with one click or ignore ones that don't quite fit. It's like having an editor look over your shoulder and provide real-time feedback as you write!

Find My Remote
Find My Remote is an AI-powered job search platform that streamlines the job hunting process by using artificial intelligence to find and structure job postings from various ATS platforms. Users can set their job preferences, receive personalized job matches, and save time by applying to curated job listings. The platform offers exclusive job opportunities not typically found on popular job search websites like LinkedIn. With features such as job discovery, application tracking, and a faster application process, Find My Remote aims to revolutionize the way job seekers find and apply for jobs.

Datafitai
Datafitai is a community platform for ChatGPT prompting, where users can find and share top-rated prompts for various topics such as marketing, coding, finance, writing, gaming, and art. The platform aims to improve the accuracy of ChatGPT responses by providing high-quality prompts. Users can explore, share, and contribute to a wide range of prompts to enhance their ChatGPT experience.

Jobs-Scout
Jobs-Scout is an AI-powered job search engine that helps you find your dream job. With Jobs-Scout, you can search for jobs by keyword, location, and industry. You can also filter your search results by salary, experience, and education level. Jobs-Scout also provides personalized job recommendations based on your skills and interests.

Picarta AI
Picarta AI is an image geolocalization solution that uses artificial intelligence to find where a photo has been taken in the world. By uploading a photo, users can get the GPS location, latitude, longitude, time stamp, and camera details of the image. Picarta AI also offers a map view of the image location and allows users to download the map. The company's vision is to empower individuals and businesses with the most accurate and reliable image geolocalization solution, unlocking new possibilities for exploration, research, and decision-making.

Find AI
Find AI is an AI-powered search engine that provides users with advanced search capabilities to unlock contact details and gain more accurate insights. The platform caters to individuals and companies looking to research people, companies, startups, founders, and more. Users can access email addresses and premium search features to explore a wide range of data related to various industries and sectors. Find AI offers a user-friendly interface and efficient search algorithms to deliver relevant results in a timely manner.

Linkeddit
Linkeddit is an AI-powered tool designed to help users find potential customers on Reddit who are actively seeking solutions. By analyzing millions of conversations in real-time, Linkeddit identifies high-intent prospects discussing relevant product categories. The tool provides curated lists of decision-makers with verified buying intent, engagement metrics, and context to help convert warm leads into customers. Linkeddit also offers features like direct post links, engagement metrics, buying intent score, export-ready lists, and personalized outreach suggestions, enabling users to efficiently connect with the right audience on Reddit.

20 - Open Source AI Tools

data-juicer
Data-Juicer is a one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs. It is a systematic & reusable library of 80+ core OPs, 20+ reusable config recipes, and 20+ feature-rich dedicated toolkits, designed to function independently of specific LLM datasets and processing pipelines. Data-Juicer allows detailed data analyses with an automated report generation feature for a deeper understanding of your dataset. Coupled with multi-dimension automatic evaluation capabilities, it supports a timely feedback loop at multiple stages in the LLM development process. Data-Juicer offers tens of pre-built data processing recipes for pre-training, fine-tuning, English (en), Chinese (zh), and other scenarios. It provides a speedy data processing pipeline requiring less memory and CPU usage, optimized for maximum productivity. Data-Juicer is flexible & extensible, accommodating most types of data formats and allowing flexible combinations of OPs. It is designed for simplicity, with comprehensive documentation, easy start guides and demo configs, and intuitive configuration in which OPs can simply be added to or removed from existing configs.

cleanlab
Cleanlab helps you **clean** data and **lab**els by automatically detecting issues in a ML dataset. To facilitate **machine learning with messy, real-world data**, this data-centric AI package uses your _existing_ models to estimate dataset problems that can be fixed to train even _better_ models.
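To make the idea of using an existing model's predictions to spot likely label errors more concrete, here is a minimal standard-library sketch. It is not cleanlab's API or its actual algorithm: the function name `flag_suspect_labels`, the thresholding rule, and the toy data are all assumptions chosen for illustration only.

```python
from typing import List, Sequence


def flag_suspect_labels(
    labels: Sequence[int], pred_probs: Sequence[Sequence[float]]
) -> List[int]:
    """Return indices whose given label looks inconsistent with the model.

    Hypothetical rule (an assumption, not cleanlab's method): flag an example
    when the model prefers a different class AND its probability for the
    given label is below that class's average self-confidence.
    """
    num_classes = len(pred_probs[0])

    # Per-class threshold: mean probability of class k over examples labeled k.
    thresholds = []
    for k in range(num_classes):
        own = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        thresholds.append(sum(own) / len(own) if own else 1.0)

    suspects = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        best = max(range(num_classes), key=lambda k: p[k])
        if best != y and p[y] < thresholds[y]:
            suspects.append(i)
    return suspects


# Toy data: out-of-sample predicted probabilities from some existing model.
labels = [0, 0, 1, 1, 0]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.2, 0.8],  # labeled 0, but the model strongly prefers class 1
]
print(flag_suspect_labels(labels, pred_probs))  # → [4]
```

In practice the flagged indices would be reviewed or relabeled before retraining; the package itself should be consulted for its real interface.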

llmap
LLMap is a CLI code search tool designed to automatically find context in large codebases by evaluating the relevance of each source file using DeepSeek-V3 and DeepSeek-R1. It optimizes analysis by performing multi-stage analysis and caching results for faster searches. Currently supports Java and Python files, with potential for extension to other languages. Install with 'pip install llmap-ai' and use with a DeepSeek API key to search for specific context in code.

txtai
Txtai is an all-in-one embeddings database for semantic search, LLM orchestration, and language model workflows. It combines vector indexes, graph networks, and relational databases to enable vector search with SQL, topic modeling, retrieval augmented generation, and more. Txtai can stand alone or serve as a knowledge source for large language models (LLMs). Key features include vector search with SQL, object storage, topic modeling, graph analysis, multimodal indexing, embedding creation for various data types, pipelines powered by language models, workflows to connect pipelines, and support for Python, JavaScript, Java, Rust, and Go. Txtai is open-source under the Apache 2.0 license.

awesome-llms-fine-tuning
This repository is a curated collection of resources for fine-tuning Large Language Models (LLMs) like GPT, BERT, RoBERTa, and their variants. It includes tutorials, papers, tools, frameworks, and best practices to aid researchers, data scientists, and machine learning practitioners in adapting pre-trained models to specific tasks and domains. The resources cover a wide range of topics related to fine-tuning LLMs, providing valuable insights and guidelines to streamline the process and enhance model performance.

SQLAgent
DataAgent is a multi-agent system for data analysis, capable of understanding data development and data analysis requirements, understanding data, and generating SQL and Python code for tasks such as data query, data visualization, and machine learning.

airda
airda (Air Data Agent) is a multi-agent system for data analysis, which can understand data development and data analysis requirements, understand data, and generate SQL and Python code for data query, data visualization, machine learning and other tasks.

upgini
Upgini is an intelligent data search engine with a Python library that helps users find and add relevant features to their ML pipeline from various public, community, and premium external data sources. It automates the optimization of connected data sources by generating an optimal set of machine learning features using large language models, GraphNNs, and recurrent neural networks. The tool aims to simplify feature search and enrichment for external data to make it a standard approach in machine learning pipelines. It democratizes access to data sources for the data science community.

chat-with-your-data-solution-accelerator
Chat with your data using OpenAI and AI Search. This solution accelerator uses an Azure OpenAI GPT model and an Azure AI Search index generated from your data, which is integrated into a web application to provide a natural language interface, including speech-to-text functionality, for search queries. Users can drag and drop files, point to storage, and take care of technical setup to transform documents. There is a web app that users can create in their own subscription with security and authentication.

orama-core
OramaCore is a database designed for AI projects, answer engines, copilots, and search functionalities. It offers features such as a full-text search engine, vector database, LLM interface, and various utilities. The tool is currently under active development and not recommended for production use due to potential API changes. OramaCore aims to provide a comprehensive solution for managing data and enabling advanced AI capabilities in projects.

aimo-progress-prize
This repository contains the training and inference code needed to replicate the winning solution to the AI Mathematical Olympiad - Progress Prize 1. It consists of fine-tuning DeepSeekMath-Base 7B, high-quality training datasets, a self-consistency decoding algorithm, and carefully chosen validation sets. The training methodology involves Chain of Thought (CoT) and Tool Integrated Reasoning (TIR) training stages. Two datasets, NuminaMath-CoT and NuminaMath-TIR, were used to fine-tune the models. The models were trained using open-source libraries like TRL, PyTorch, vLLM, and DeepSpeed. Post-training quantization to 8-bit precision was done to improve performance on Kaggle's T4 GPUs. The project structure includes scripts for training, quantization, and inference, along with necessary installation instructions and hardware/software specifications.
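As a rough illustration of the self-consistency decoding mentioned above, the sketch below takes several sampled final answers for one problem and returns the most frequent one. The function name `majority_vote` and the sample answers are hypothetical and are not taken from the repository, whose actual implementation may differ (for example, in how candidates are generated or filtered).

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(answers: Sequence[Optional[int]]) -> Optional[int]:
    """Return the most common non-None answer, or None if there is none.

    Ties go to the answer that appeared first among the samples.
    """
    valid = [a for a in answers if a is not None]
    if not valid:
        return None
    counts = Counter(valid)
    # most_common orders equal counts by first occurrence.
    return counts.most_common(1)[0][0]


# Hypothetical answers parsed from five sampled solutions;
# None marks a sample whose answer could not be parsed.
samples = [42, 17, 42, None, 42]
print(majority_vote(samples))  # → 42
```

The design choice here is simply to discard unparseable samples before voting, so a few failed generations do not affect the result.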

SQL-AI-samples
This repository contains samples to help design AI applications using data from an Azure SQL Database. It showcases technical concepts and workflows integrating Azure SQL data with popular AI components both within and outside Azure. The samples cover various AI features such as Azure Cognitive Services, Promptflow, OpenAI, Vanna.AI, Content Moderation, LangChain, and more. Additionally, there are end-to-end samples like Similar Content Finder, Session Conference Assistant, Chatbots, Vectorization, SQL Server Database Development, Redis Vector Search, and Similarity Search with FAISS.

SoM-LLaVA
SoM-LLaVA is a new data source and learning paradigm for Multimodal LLMs, empowering open-source Multimodal LLMs with Set-of-Mark prompting and improved visual reasoning ability. The repository provides a new dataset that is complementary to existing training sources, enhancing multimodal LLMs with Set-of-Mark prompting and improved general capacity. By adding 30k SoM data to the visual instruction tuning stage of LLaVA, the tool achieves 1% to 6% relative improvements on all benchmarks. Users can train SoM-LLaVA via command line and utilize the implementation to annotate COCO images with SoM. Additionally, the tool can be loaded in Huggingface for further usage.

qlib
Qlib is an open-source, AI-oriented quantitative investment platform that supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and reinforcement learning. It covers the entire chain of quantitative investment, from alpha seeking to order execution. The platform empowers researchers to explore ideas and implement productions using AI technologies in quantitative investment. Qlib collaboratively solves key challenges in quantitative investment by releasing state-of-the-art research works in various paradigms. It provides a full ML pipeline for data processing, model training, and back-testing, enabling users to perform tasks such as forecasting market patterns, adapting to market dynamics, and modeling continuous investment decisions.

observers
Observers is a lightweight library for AI observability that provides support for various generative AI APIs and storage backends. It allows users to track interactions with AI models and sync observations to different storage systems. The library supports OpenAI, Hugging Face transformers, AISuite, Litellm, and Docling for document parsing and export. Users can configure different stores such as Hugging Face Datasets, DuckDB, Argilla, and OpenTelemetry to manage and query their observations. Observers is designed to enhance AI model monitoring and observability in a user-friendly manner.

opencompass
OpenCompass is a one-stop platform for large model evaluation, aiming to provide a fair, open, and reproducible benchmark for large model evaluation. Its main features include:
* Comprehensive support for models and datasets: Pre-support for 20+ HuggingFace and API models, a model evaluation scheme of 70+ datasets with about 400,000 questions, comprehensively evaluating the capabilities of the models in five dimensions.
* Efficient distributed evaluation: One line command to implement task division and distributed evaluation, completing the full evaluation of billion-scale models in just a few hours.
* Diversified evaluation paradigms: Support for zero-shot, few-shot, and chain-of-thought evaluations, combined with standard or dialogue-type prompt templates, to easily stimulate the maximum performance of various models.
* Modular design with high extensibility: Want to add new models or datasets, customize an advanced task division strategy, or even support a new cluster management system? Everything about OpenCompass can be easily expanded!
* Experiment management and reporting mechanism: Use config files to fully record each experiment, and support real-time reporting of results.

LLM-PowerHouse-A-Curated-Guide-for-Large-Language-Models-with-Custom-Training-and-Inferencing
LLM-PowerHouse is a comprehensive and curated guide designed to empower developers, researchers, and enthusiasts to harness the true capabilities of Large Language Models (LLMs) and build intelligent applications that push the boundaries of natural language understanding. This GitHub repository provides in-depth articles, codebase mastery, LLM PlayLab, and resources for cost analysis and network visualization. It covers various aspects of LLMs, including NLP, models, training, evaluation metrics, open LLMs, and more. The repository also includes a collection of code examples and tutorials to help users build and deploy LLM-based applications.

mergekit
Mergekit is a toolkit for merging pre-trained language models. It uses an out-of-core approach to perform unreasonably elaborate merges in resource-constrained situations. Merges can be run entirely on CPU or accelerated with as little as 8 GB of VRAM. Many merging algorithms are supported, with more added as they catch the author's attention.

crewAI
CrewAI is a cutting-edge framework designed to orchestrate role-playing autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks. It enables AI agents to assume roles, share goals, and operate in a cohesive unit, much like a well-oiled crew. Whether you're building a smart assistant platform, an automated customer service ensemble, or a multi-agent research team, CrewAI provides the backbone for sophisticated multi-agent interactions. With features like role-based agent design, autonomous inter-agent delegation, flexible task management, and support for various LLMs, CrewAI offers a dynamic and adaptable solution for both development and production workflows.

AnkiAIUtils
Anki AI Utils is a powerful suite of AI-powered tools designed to enhance your Anki flashcard learning experience by automatically improving cards you struggle with. The tools include features such as adaptive learning, personalized memory hooks, automation readiness, universal compatibility, provider agnosticism, and infinite extensibility. The toolkit consists of tools like Illustrator for creating custom mnemonic images, Reformulator for rephrasing flashcards, Mnemonics Creator for generating memorable mnemonics, Explainer for providing detailed explanations, and Mnemonics Helper for quick mnemonic generation. The project aims to motivate others to package the tools into addons for wider accessibility.

20 - OpenAI Gpts

OpenData Explorer
I'll help you access and understand open data published by central government, local authorities and public bodies. You can ask me in your native language.

Chronic Disease Indicators Expert
This chatbot answers questions about the CDC’s Chronic Disease Indicators dataset.

ResourceFinder
Assists in identifying and utilizing APIs and files effectively to enhance user-designed GPTs.

Sommelier de dados
Hey there! Paste the text of your news story, or an excerpt, so that I can analyze it against guides on the use of data in journalistic writing.

PPT Expert
PPT Assistant for creating detailed outlines in Markdown, using Chinese by default.

AI OSINT
Your AI OSINT assistant. Our tool helps you find the data needle in the internet haystack.

Open Data Italia bot
Provides information on Italian open data legislation in a professional, accessible tone, so that it is easier to request and/or demand the publication of open data.

BCorpGPT
Query BCorp company data. All data is publicly available. United Kingdom only (for now).