Best AI tools for "Find Data"
20 - AI tool Sites
Keylight AI
Keylight AI is an AI-powered tool that helps users quickly find information within their documents. It offers fast, precise search, a user-friendly interface, customizable prompts, and secure, confidential document handling. Aimed at professionals across various industries, Keylight AI speeds up navigation through large document sets, helping users save time and work more productively.
Deepfind
Deepfind is a privacy-first AI search engine that prioritizes user data protection. It allows users to conduct searches without the use of cookies, tracking, or storing personal information. Deepfind aims to provide a secure and efficient search experience while maintaining user privacy and data security.
Ocular
Ocular is an AI-powered search platform that allows users to search, visualize, and take action on their work and engineering tools and data on one unified platform. It is designed to help engineers work more efficiently and effectively by providing them with a single, central location to access all of their relevant information.
Phind AI
Phind AI is a cost-effective alternative to other AI search engines, making AI search accessible to everyone, regardless of location. It offers a comprehensive search experience with a user-friendly interface and advanced features.
Qatalog
Qatalog is a business search engine that provides real-time access to data across various company systems and applications. It uses natural language processing and machine learning to understand user queries and deliver relevant results from multiple data sources. Qatalog eliminates the need to search through multiple systems and applications, saving employees time and improving productivity.
Tremello
Tremello is a market research platform that uses AI to deliver off-market data. It combines an AI engine with human experts to provide bespoke intelligence delivered directly to the user's inbox. Tremello's AI analyzes relationships, identifies patterns, and considers the broader context, adding meaningful, actionable insights on top of the research done by its human experts. It draws on a diverse range of data sources, including public and private databases, industry reports, social media archives, company websites, and government filings, to build a comprehensive picture of the research subject.
Kira Systems
Kira Systems is a machine learning contract search, review, and analysis software that helps businesses identify, extract, and analyze content in their contracts and documents. It uses patented machine learning technology to extract concepts and data points with high efficiency and accuracy. Kira also has built-in intelligence that streamlines the contract review process with out-of-the-box smart fields. Businesses can also create their own smart fields to find specific data points using Kira's no-code machine learning tool. Kira's adaptive workflows allow businesses to organize, track, and export results. Kira has a partner ecosystem that allows businesses to transform how teams work with their contracts.
Shieldbase
Shieldbase is an AI-powered enterprise search tool that provides secure, efficient search for businesses. It uses AI to index and retrieve information from the various data sources within an organization, returning fast and accurate results. With a focus on security, Shieldbase offers encryption and access control to protect sensitive data. The platform is user-friendly and customizable, so businesses can implement it and integrate it with their existing systems. By helping employees find the information they need quickly, Shieldbase aims to improve decision-making and overall operational efficiency.
The Notion Automation Hub
The Notion Automation Hub is a website that provides pre-built Notion automations and databases to help users save time and improve their productivity. The website offers a variety of automations for different use cases, including job roles, workflows, and tasks. Users can also find pre-built database templates, Notion expert resources, and automation tools. The website is not affiliated with Notion Labs Inc.
Vanga AI
Vanga AI is an AI-powered upselling and cross-selling tool for Shopify stores. It helps businesses increase revenue by automatically generating and displaying upsells and cross-sells on their post-purchase and thank-you pages. Vanga AI uses data to find the products that customers are most likely to buy together and creates custom upsell funnels for each product. The tool is easy to use and requires no setup or maintenance. Vanga AI offers a 14-day free trial and two paid plans, starting at $9/month.
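The description does not say how Vanga AI decides which products are bought together. A common baseline for this kind of recommendation is to count how often two products appear in the same past order. The Python sketch below illustrates that baseline under that assumption; the function name `bought_together` and the order format (a list of product-ID lists) are hypothetical and not part of Vanga AI.

```python
from collections import Counter


def bought_together(orders, product, top_n=3):
    """Rank other products by how many orders also contain `product`.

    `orders` is a list of orders, each a list of product IDs (hypothetical format).
    """
    counts = Counter()
    for order in orders:
        items = set(order)  # count each product at most once per order
        if product in items:
            for other in items - {product}:
                counts[other] += 1
    # most_common sorts by co-occurrence count, highest first
    return [p for p, _ in counts.most_common(top_n)]


orders = [
    ["mug", "coffee", "filter"],
    ["mug", "coffee"],
    ["mug", "spoon"],
    ["tea", "spoon"],
]
print(bought_together(orders, "mug", top_n=1))  # ['coffee']
```

Raw co-occurrence counts favor best-sellers; a real upsell engine would likely normalize them by each product's overall popularity before ranking.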
Word Changer
Word Changer is an online tool that helps you rewrite and enhance your writing. It analyzes your content and draws on a large database to suggest alternative words and phrases that express the same ideas in new ways. The substitutions fit the context, so the meaning does not change. Suggestions appear highlighted within your text; you can accept each one with a single click or ignore those that don't quite fit. It's like having an editor look over your shoulder and give real-time feedback as you write!
Find My Remote
Find My Remote is an AI-powered job search platform that streamlines job hunting by using artificial intelligence to find and structure job postings from various applicant tracking system (ATS) platforms. Users can set their job preferences, receive personalized job matches, and save time by applying to curated job listings. The platform offers exclusive job opportunities not typically found on popular job search websites such as LinkedIn. With features such as job discovery, application tracking, and a faster application process, Find My Remote aims to change the way job seekers find and apply for jobs.
Datafitai
Datafitai is a community platform for ChatGPT prompting, where users can find and share top-rated prompts for various topics such as marketing, coding, finance, writing, gaming, and art. The platform aims to improve the accuracy of ChatGPT responses by providing high-quality prompts. Users can explore, share, and contribute to a wide range of prompts to enhance their ChatGPT experience.
Jobs-Scout
Jobs-Scout is an AI-powered job search engine that helps you find your dream job. With Jobs-Scout, you can search for jobs by keyword, location, and industry. You can also filter your search results by salary, experience, and education level. Jobs-Scout also provides personalized job recommendations based on your skills and interests.
Picarta AI
Picarta AI is an image geolocalization solution that uses artificial intelligence to find where a photo has been taken in the world. By uploading a photo, users can get the GPS location, latitude, longitude, time stamp, and camera details of the image. Picarta AI also offers a map view of the image location and allows users to download the map. The company's vision is to empower individuals and businesses with the most accurate and reliable image geolocalization solution, unlocking new possibilities for exploration, research, and decision-making.
Find AI
Find AI is an AI-powered search engine that provides users with advanced search capabilities to unlock contact details and gain more accurate insights. The platform caters to individuals and companies looking to research people, companies, startups, founders, and more. Users can access email addresses and premium search features to explore a wide range of data related to various industries and sectors. Find AI offers a user-friendly interface and efficient search algorithms to deliver relevant results in a timely manner.
What's The Big Data
What's The Big Data is an AI tool directory that aims to help users unleash their potential by offering a comprehensive source of AI tools, data, and ChatGPT resources. The platform is updated daily and lists a wide range of AI assistants across various categories, so users can find a suitable assistant with a single click.
Talentscreener
Talentscreener is an AI-powered talent assessment platform that helps businesses find the best candidates for their open positions. The platform uses a variety of AI algorithms to assess candidates' skills, experience, and personality, and then provides businesses with a ranked list of the most qualified candidates. Talentscreener also offers a variety of other features, such as job posting, candidate management, and reporting.
Inven
Inven is an AI-powered company data platform that helps professionals in private equity, investment banking, business brokerage, consulting, and corporate development find companies faster and more efficiently. With Inven, users can access a database of over 23 million companies and 430 million contacts in over 160 countries. Inven's AI algorithms and NLP solutions analyze millions of data points from a wide range of sources to give users actionable insights on any niche.
Archistar
Archistar is a leading property research platform in Australia that empowers users to make confident and compliant property decisions with the help of data and AI. It offers a range of features, including the ability to find and assess properties, generate 3D design concepts, and minimize risk and maximize return on investment. Archistar is trusted by over 100,000 individuals and 1,000 leading property firms.
20 - Open Source AI Tools
data-juicer
Data-Juicer is a one-stop data processing system that makes data higher-quality, juicier, and more digestible for LLMs. It is a systematic, reusable library of 80+ core operators (OPs), 20+ reusable config recipes, and 20+ feature-rich dedicated toolkits, designed to work independently of specific LLM datasets and processing pipelines. Data-Juicer supports detailed data analysis with automated report generation for a deeper understanding of your dataset. Combined with multi-dimensional automatic evaluation, it enables a timely feedback loop at multiple stages of LLM development. Data-Juicer offers dozens of pre-built data processing recipes for pre-training, fine-tuning, English (en), Chinese (zh), and other scenarios. Its processing pipeline is fast and uses less memory and CPU, optimized for maximum productivity. Data-Juicer is flexible and extensible, accommodating most data formats and allowing flexible combinations of OPs. It is designed for simplicity, with comprehensive documentation, quick-start guides, demo configs, and intuitive configuration in which OPs can simply be added to or removed from existing configs.
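The idea of building a pipeline from a config recipe, where adding or removing an OP means editing a list, can be sketched in plain Python. This is a minimal illustration, not Data-Juicer's API: the registry, the OP names `normalize_whitespace` and `min_length`, and the recipe format are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Sample = Dict[str, str]
Op = Callable[[Sample], Optional[Sample]]  # an OP returns None to drop a sample


def normalize_whitespace() -> Op:
    """Mapper-style OP: collapse runs of whitespace in the text field."""
    def op(sample: Sample) -> Optional[Sample]:
        return {**sample, "text": re.sub(r"\s+", " ", sample["text"]).strip()}
    return op


def min_length(min_chars: int) -> Op:
    """Filter-style OP: drop samples whose text is shorter than min_chars."""
    def op(sample: Sample) -> Optional[Sample]:
        return sample if len(sample["text"]) >= min_chars else None
    return op


# Hypothetical registry mapping recipe names to OP factories.
OP_REGISTRY: Dict[str, Callable[..., Op]] = {
    "normalize_whitespace": normalize_whitespace,
    "min_length": min_length,
}


def build_ops(recipe: List[Tuple[str, dict]]) -> List[Op]:
    """Instantiate each OP named in the recipe with its parameters."""
    return [OP_REGISTRY[name](**params) for name, params in recipe]


def run_pipeline(samples: List[Sample], ops: List[Op]) -> List[Sample]:
    kept = []
    for sample in samples:
        result: Optional[Sample] = sample
        for op in ops:
            result = op(result)
            if result is None:  # a filter rejected the sample
                break
        if result is not None:
            kept.append(result)
    return kept


recipe = [("normalize_whitespace", {}), ("min_length", {"min_chars": 10})]
data = [{"text": "  hello   world, this is text "}, {"text": " hi "}]
print(run_pipeline(data, build_ops(recipe)))
# [{'text': 'hello world, this is text'}]
```

Deleting the `min_length` entry from `recipe` keeps the short sample, which mirrors the add/remove-an-OP workflow in the description.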
cleanlab
Cleanlab helps you **clean** data and **lab**els by automatically detecting issues in an ML dataset. To facilitate **machine learning with messy, real-world data**, this data-centric AI package uses your _existing_ models to estimate dataset problems that can be fixed to train even _better_ models.
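One way to use an existing model's predicted probabilities to spot label problems is the per-class threshold idea from confident learning, the technique cleanlab is built on. The sketch below is a simplified, dependency-free illustration of that idea, not cleanlab's implementation; the library itself exposes a complete version through functions such as `cleanlab.filter.find_label_issues(labels, pred_probs)`. The name `find_suspect_labels` and the toy data are hypothetical.

```python
from typing import List, Sequence


def find_suspect_labels(labels: Sequence[int],
                        pred_probs: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of examples whose given label looks wrong.

    pred_probs[i][j] is the model's predicted probability that example i
    belongs to class j; ideally these are out-of-sample (e.g. cross-validated).
    """
    n_classes = len(pred_probs[0])
    # Per-class threshold: average probability the model assigns to class j
    # on examples labeled j (1.0 if no example carries that label).
    thresholds = []
    for j in range(n_classes):
        probs_j = [p[j] for p, y in zip(pred_probs, labels) if y == j]
        thresholds.append(sum(probs_j) / len(probs_j) if probs_j else 1.0)

    suspects = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        # Classes the model predicts at or above that class's threshold.
        confident = [j for j in range(n_classes) if p[j] >= thresholds[j]]
        if confident:
            likely = max(confident, key=lambda j: p[j])
            if likely != y:  # confidently predicted class disagrees with label
                suspects.append(i)
    return suspects


labels = [0, 0, 0, 1, 1, 1]
pred_probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
              [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
print(find_suspect_labels(labels, pred_probs))  # [2]
```

Example 2 is labeled 0, but the model assigns class 1 a probability above class 1's threshold, so it is flagged for review.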
txtai
Txtai is an all-in-one embeddings database for semantic search, LLM orchestration, and language model workflows. It combines vector indexes, graph networks, and relational databases to enable vector search with SQL, topic modeling, retrieval augmented generation, and more. Txtai can stand alone or serve as a knowledge source for large language models (LLMs). Key features include vector search with SQL, object storage, topic modeling, graph analysis, multimodal indexing, embedding creation for various data types, pipelines powered by language models, workflows to connect pipelines, and support for Python, JavaScript, Java, Rust, and Go. Txtai is open-source under the Apache 2.0 license.
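At the core of embeddings-based search is ranking stored vectors by their similarity to a query vector. The sketch below shows that ranking step with cosine similarity. To stay self-contained, it uses a bag-of-words counter as a stand-in for a real embedding model, so it only matches shared words; the semantic behavior of a tool like txtai comes from the learned embedding model, which this sketch does not include. `TinyIndex`, `embed`, and `cosine` are hypothetical names, not txtai's API.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Stand-in for a learned embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class TinyIndex:
    """Hypothetical in-memory vector index storing (id, vector) pairs."""

    def __init__(self) -> None:
        self.rows: List[Tuple[int, Counter]] = []

    def add(self, doc_id: int, text: str) -> None:
        self.rows.append((doc_id, embed(text)))

    def search(self, query: str, limit: int = 3) -> List[Tuple[int, float]]:
        q = embed(query)
        # Score every stored vector, then keep the highest-scoring ones.
        scored = sorted(((cosine(q, vec), doc_id) for doc_id, vec in self.rows),
                        reverse=True)
        return [(doc_id, score) for score, doc_id in scored[:limit]]


index = TinyIndex()
index.add(1, "the cat sat on the mat")
index.add(2, "stock prices fell sharply")
index.add(3, "a cat chased a mouse")
print([doc_id for doc_id, _ in index.search("cat on the mat", limit=2)])  # [1, 3]
```

This brute-force scan compares the query against every row; dedicated vector indexes use approximate nearest-neighbor structures to avoid that at scale.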
awesome-llms-fine-tuning
This repository is a curated collection of resources for fine-tuning Large Language Models (LLMs) like GPT, BERT, RoBERTa, and their variants. It includes tutorials, papers, tools, frameworks, and best practices to aid researchers, data scientists, and machine learning practitioners in adapting pre-trained models to specific tasks and domains. The resources cover a wide range of topics related to fine-tuning LLMs, providing valuable insights and guidelines to streamline the process and enhance model performance.
SQLAgent
DataAgent is a multi-agent system for data analysis that can understand data development and data analysis requirements as well as the data itself, and generate SQL and Python code for tasks such as data querying, data visualization, and machine learning.
airda
airda (Air Data Agent) is a multi-agent system for data analysis that can understand data development and data analysis requirements as well as the data itself, and generate SQL and Python code for data querying, data visualization, machine learning, and other tasks.
upgini
Upgini is an intelligent data search engine with a Python library that helps users find and add relevant features to their ML pipeline from various public, community, and premium external data sources. It automates the optimization of connected data sources by generating an optimal set of machine learning features using large language models, GraphNNs, and recurrent neural networks. The tool aims to simplify feature search and enrichment for external data to make it a standard approach in machine learning pipelines. It democratizes access to data sources for the data science community.
chat-with-your-data-solution-accelerator
Chat with your data using Azure OpenAI and Azure AI Search. This solution accelerator combines an Azure OpenAI GPT model with an Azure AI Search index generated from your data, integrated into a web application that provides a natural language interface for search queries, including speech-to-text. Users can drag and drop files or point to existing storage, and the accelerator takes care of the technical setup needed to transform the documents. Users can deploy the web app in their own subscription, with security and authentication.
aimo-progress-prize
This repository contains the training and inference code needed to replicate the winning solution to the AI Mathematical Olympiad (AIMO) Progress Prize 1. The solution consists of a fine-tuned DeepSeekMath-Base 7B model, high-quality training datasets, a self-consistency decoding algorithm, and carefully chosen validation sets. Training proceeds in two stages, Chain of Thought (CoT) and Tool-Integrated Reasoning (TIR), using two datasets, NuminaMath-CoT and NuminaMath-TIR. The models were trained with open-source libraries such as TRL, PyTorch, vLLM, and DeepSpeed, and were quantized to 8-bit precision after training to improve performance on Kaggle's T4 GPUs. The project includes scripts for training, quantization, and inference, along with installation instructions and hardware/software specifications.
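Self-consistency decoding samples several independent solutions to the same problem and keeps the final answer that appears most often. The sketch below shows only that voting step; generating the candidates with the fine-tuned model (and, in the TIR setting, executing the generated code) is assumed to have happened already. `self_consistency` is a hypothetical helper, not code from the repository.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def self_consistency(answers: Iterable[Optional[Hashable]]) -> Optional[Hashable]:
    """Majority vote over final answers extracted from sampled generations.

    None marks a generation whose answer could not be parsed; it gets no vote.
    Ties go to the answer that appeared first.
    """
    votes = Counter(a for a in answers if a is not None)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


print(self_consistency([42, 17, 42, None, 42, 17]))  # 42
```

Voting over final answers rather than full reasoning traces means different solution paths that reach the same answer reinforce each other.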
SQL-AI-samples
This repository contains samples to help design AI applications using data from an Azure SQL Database. It showcases technical concepts and workflows integrating Azure SQL data with popular AI components both within and outside Azure. The samples cover various AI features such as Azure Cognitive Services, Promptflow, OpenAI, Vanna.AI, Content Moderation, LangChain, and more. Additionally, there are end-to-end samples like Similar Content Finder, Session Conference Assistant, Chatbots, Vectorization, SQL Server Database Development, Redis Vector Search, and Similarity Search with FAISS.
SoM-LLaVA
SoM-LLaVA is a new data source and learning paradigm for multimodal LLMs that equips open-source multimodal LLMs with Set-of-Mark (SoM) prompting and improved visual reasoning. The repository provides a new dataset that complements existing training sources, enhancing multimodal LLMs with Set-of-Mark prompting and improved general capability. Adding 30k SoM samples to the visual instruction tuning stage of LLaVA yields 1% to 6% relative improvements on all benchmarks. Users can train SoM-LLaVA from the command line and use the implementation to annotate COCO images with SoM. The model can also be loaded with Hugging Face for further use.
qlib
Qlib is an open-source, AI-oriented quantitative investment platform that supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and reinforcement learning. It covers the entire quantitative investment chain, from alpha seeking to order execution, and helps researchers explore ideas and put them into production using AI technologies. Qlib addresses key challenges in quantitative investment by releasing state-of-the-art research in these paradigms. It provides a full ML pipeline for data processing, model training, and back-testing, enabling tasks such as forecasting market patterns, adapting to market dynamics, and modeling continuous investment decisions.
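Back-testing replays a model's historical predictions to estimate what a trading rule would have earned. The sketch below shows a simple equal-weight top-k rule over small illustrative scores and returns (made-up numbers, not real market data). It is a hypothetical illustration of the concept, not Qlib's back-testing API, and it ignores transaction costs, slippage, and position limits.

```python
import math
from typing import Dict, List


def backtest_topk(scores_by_day: Dict[str, Dict[str, float]],
                  returns_by_day: Dict[str, Dict[str, float]],
                  k: int = 2) -> List[float]:
    """Each day, hold the k highest-scored assets with equal weight.

    Scores for a day must come from data available before that day's returns
    are realized; otherwise the backtest leaks future information.
    """
    daily = []
    for day in sorted(scores_by_day):
        scores = scores_by_day[day]
        top = sorted(scores, key=scores.get, reverse=True)[:k]
        daily.append(sum(returns_by_day[day][asset] for asset in top) / len(top))
    return daily


def cumulative_return(daily: List[float]) -> float:
    """Compound daily returns into one total return."""
    return math.prod(1 + r for r in daily) - 1


scores = {"d1": {"A": 0.9, "B": 0.1, "C": 0.5},
          "d2": {"A": 0.2, "B": 0.8, "C": 0.6}}
rets = {"d1": {"A": 0.02, "B": -0.01, "C": 0.04},
        "d2": {"A": 0.01, "B": 0.03, "C": -0.01}}
daily = backtest_topk(scores, rets, k=2)
print([round(r, 4) for r in daily])        # [0.03, 0.01]
print(round(cumulative_return(daily), 4))  # 0.0403
```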
holmesgpt
HolmesGPT is an open-source DevOps assistant powered by OpenAI or any tool-calling LLM of your choice. It helps with Kubernetes troubleshooting, incident response, ticket management, automated investigation, and runbook automation, all in plain English. The tool connects to existing observability data, is compliance-friendly, provides transparent results, supports extensible data sources, and integrates with existing workflows. Users can install HolmesGPT with Brew, a prebuilt Docker container, Python Poetry, or Docker. The tool requires an API key to function and supports OpenAI, Azure AI, and self-hosted LLMs.
llm-course
The LLM course is divided into three parts:
1. 🧩 **LLM Fundamentals** covers essential knowledge about mathematics, Python, and neural networks.
2. 🧑‍🔬 **The LLM Scientist** focuses on building the best possible LLMs using the latest techniques.
3. 👷 **The LLM Engineer** focuses on creating LLM-based applications and deploying them.
For an interactive version of this course, I created two **LLM assistants** that will answer questions and test your knowledge in a personalized way:
* 🤗 **HuggingChat Assistant**: Free version using Mixtral-8x7B.
* 🤖 **ChatGPT Assistant**: Requires a premium account.
## 📝 Notebooks
A list of notebooks and articles related to large language models.
### Tools
| Notebook | Description | Colab |
|----------|-------------|-------|
| 🧐 LLM AutoEval | Automatically evaluate your LLMs using RunPod. | ![Open In Colab](img/colab.svg) |
| 🥱 LazyMergekit | Easily merge models using MergeKit in one click. | ![Open In Colab](img/colab.svg) |
| 🦎 LazyAxolotl | Fine-tune models in the cloud using Axolotl in one click. | ![Open In Colab](img/colab.svg) |
| ⚡ AutoQuant | Quantize LLMs in GGUF, GPTQ, EXL2, AWQ, and HQQ formats in one click. | ![Open In Colab](img/colab.svg) |
| 🌳 Model Family Tree | Visualize the family tree of merged models. | ![Open In Colab](img/colab.svg) |
| 🚀 ZeroSpace | Automatically create a Gradio chat interface using a free ZeroGPU. | ![Open In Colab](img/colab.svg) |
LLM-PowerHouse-A-Curated-Guide-for-Large-Language-Models-with-Custom-Training-and-Inferencing
LLM-PowerHouse is a comprehensive and curated guide designed to empower developers, researchers, and enthusiasts to harness the true capabilities of Large Language Models (LLMs) and build intelligent applications that push the boundaries of natural language understanding. This GitHub repository provides in-depth articles, codebase mastery, LLM PlayLab, and resources for cost analysis and network visualization. It covers various aspects of LLMs, including NLP, models, training, evaluation metrics, open LLMs, and more. The repository also includes a collection of code examples and tutorials to help users build and deploy LLM-based applications.
llm-datasets
LLM Datasets is a repository containing high-quality datasets, tools, and concepts for LLM fine-tuning. It provides datasets with characteristics like accuracy, diversity, and complexity to train large language models for various tasks. The repository includes datasets for general-purpose, math & logic, code, conversation & role-play, and agent & function calling domains. It also offers guidance on creating high-quality datasets through data deduplication, data quality assessment, data exploration, and data generation techniques.
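The simplest form of deduplication is exact matching after light normalization: hash each sample's normalized text and keep only the first sample for each hash. The sketch below shows that step. Catching near-duplicates (paraphrases, small edits) needs fuzzier techniques such as MinHash, which this sketch does not cover. `dedup_exact` and `normalize` are hypothetical helpers, not code from the repository.

```python
import hashlib
import re
from typing import List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def dedup_exact(samples: List[str]) -> List[str]:
    """Keep the first occurrence of each normalized text, preserving order."""
    seen = set()
    kept = []
    for sample in samples:
        digest = hashlib.sha256(normalize(sample).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(sample)
    return kept


print(dedup_exact(["Hello  World", "hello world", "Goodbye"]))
# ['Hello  World', 'Goodbye']
```

Storing fixed-size digests instead of full texts keeps the `seen` set small even for very large datasets.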
linkedin-api
The LinkedIn API for Python lets users programmatically search profiles, send messages, and find jobs using a regular LinkedIn user account. It does not require 'official' API access, just a valid LinkedIn account: users can authenticate with any LinkedIn account's credentials and access features such as profiles, profile contact info, and connections. The project also lists commercial alternatives for extracting data, scraping public profiles, and accessing a full LinkedIn API. Note that the library is not endorsed or supported by LinkedIn, that using it may violate LinkedIn's Terms of Service, and that it is intended for educational purposes and personal use only.
x-crawl
x-crawl is a flexible Node.js crawler library with AI assistance functions that aim to make crawling more efficient, intelligent, and convenient. It consists of a crawler API and various functions that also work without relying on AI. The AI component is currently based on a large AI model provided by OpenAI, which simplifies many tedious operations. The library supports crawling dynamic pages, static pages, interface data, and file data, with features such as page operation control, device fingerprinting, asynchronous and synchronous modes, interval crawling, failed-request retries, proxy rotation, a priority queue, crawl information control, and TypeScript support.
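x-crawl itself is a Node.js library; the Python sketch below illustrates only how two of the listed features, a priority queue and failed-request retries, typically fit together. The `crawl` function and its parameters are hypothetical and do not reflect x-crawl's API, and the caller supplies the `fetch` callable so the sketch needs no network access.

```python
import heapq
from typing import Callable, Dict, List, Tuple, Union


def crawl(targets: List[Tuple[int, str]],
          fetch: Callable[[str], str],
          max_retries: int = 2) -> Dict[str, Union[str, Exception]]:
    """Fetch URLs highest-priority first, re-queueing failures up to max_retries times."""
    # heapq is a min-heap, so store -priority to pop the highest priority first;
    # the sequence number breaks ties in first-in, first-out order.
    queue = [(-prio, seq, url, 0) for seq, (prio, url) in enumerate(targets)]
    heapq.heapify(queue)
    next_seq = len(queue)
    results: Dict[str, Union[str, Exception]] = {}
    while queue:
        neg_prio, _, url, attempt = heapq.heappop(queue)
        try:
            results[url] = fetch(url)
        except Exception as exc:
            if attempt < max_retries:
                # The retry keeps its priority but goes behind equal-priority work.
                heapq.heappush(queue, (neg_prio, next_seq, url, attempt + 1))
                next_seq += 1
            else:
                results[url] = exc  # give up and record the last error
    return results


calls = []
pending_failures = {"b": 1}  # "b" fails once, then succeeds


def flaky_fetch(url: str) -> str:
    calls.append(url)
    if pending_failures.get(url, 0) > 0:
        pending_failures[url] -= 1
        raise ConnectionError(f"temporary failure for {url}")
    return f"<html>{url}</html>"


print(crawl([(1, "a"), (5, "b"), (3, "c")], flaky_fetch))
print(calls)  # ['b', 'b', 'c', 'a']
```

Re-queueing a failed request, rather than retrying it in a tight loop, gives a real crawler a natural place to add a delay or switch proxies before the next attempt.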
langfuse
Langfuse is a powerful tool that helps you develop, monitor, and test your LLM applications. With Langfuse, you can:
* **Develop:** Instrument your app and start ingesting traces to Langfuse; inspect and debug complex logs; and manage, version, and deploy prompts from within Langfuse.
* **Monitor:** Track metrics (cost, latency, quality) and gain insights from dashboards and data exports; collect and calculate scores for your LLM completions; run model-based evaluations; collect user feedback; and manually score observations in Langfuse.
* **Test:** Track and test app behaviour before deploying a new version; test expected input and output pairs and benchmark performance before deploying; and track versions and releases in your application.
Langfuse is easy to get started with and offers a generous free tier. You can sign up for Langfuse Cloud or deploy Langfuse locally or on your own infrastructure. Langfuse also offers a variety of integrations that make it easy to connect to your LLM applications.
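Instrumenting an app generally means wrapping each LLM call so that its input, output, errors, and latency are recorded as a trace. The sketch below captures that idea with a plain Python decorator that appends traces to an in-memory list. It is a hypothetical stand-in, not the Langfuse SDK, which sends traces to a Langfuse server and provides its own integrations; `traced`, `TRACES`, and `answer` are illustrative names.

```python
import functools
import time
from typing import Any, Callable, Dict, List

TRACES: List[Dict[str, Any]] = []  # in-memory stand-in for a tracing backend


def traced(name: str) -> Callable:
    """Decorator that records one trace per call: input, output, error, latency."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            trace: Dict[str, Any] = {
                "name": name,
                "input": {"args": args, "kwargs": kwargs},
                "output": None,
                "error": None,
            }
            start = time.perf_counter()
            try:
                trace["output"] = fn(*args, **kwargs)
                return trace["output"]
            except Exception as exc:
                trace["error"] = repr(exc)
                raise  # record the failure but let the caller handle it
            finally:
                trace["latency_s"] = time.perf_counter() - start
                TRACES.append(trace)
        return wrapper
    return decorator


@traced("answer")
def answer(question: str) -> str:
    # Placeholder for a real LLM call.
    return f"You asked: {question}"


answer("What is a trace?")
print(TRACES[-1]["name"], TRACES[-1]["output"])  # answer You asked: What is a trace?
```

Recording the trace in `finally` ensures failed calls are captured too, which is often where debugging information matters most.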
20 - OpenAI GPTs
OpenData Explorer
I'll help you access and understand open data published by central government, local authorities and public bodies. You can ask me in your native language.
Chronic Disease Indicators Expert
This chatbot answers questions about the CDC’s Chronic Disease Indicators dataset.
ResourceFinder
Assists in identifying and utilizing APIs and files effectively to enhance user-designed GPTs.
Sommelier de dados
Hi there! Paste the text of your news story, or an excerpt from it, so I can analyze it against style guides on using data in journalistic writing.
PPT Expert
PPT Assistant for creating detailed outlines in Markdown, using Chinese by default.
AI OSINT
Your AI OSINT assistant. Our tool helps you find the data needle in the internet haystack.
Open Data Italia bot
Provides information on Italian open data legislation in a professional yet accessible tone, making it easier to request and/or demand that data be published.
BCorpGPT
Query BCorp company data. All data is publicly available. United Kingdom only (for now).