Best AI tools for: Find Data Issues
20 - AI tool Sites
Ocular
Ocular is an AI-powered search platform that allows users to search, visualize, and take action on their work and engineering tools and data on one unified platform. It is designed to help engineers work more efficiently and effectively by providing them with a single, central location to access all of their relevant information.
Imagetwin
Imagetwin is AI-based software designed to detect integrity issues in figures of scientific articles, particularly in the life sciences. It detects inappropriate manipulation, duplication, and plagiarism efficiently and accurately in many types of figures, such as western blots, microscopy images, and light photography. The software is a valuable addition to the peer-review process: it detects integrity issues automatically and lets reviewers verify the findings quickly, while keeping data private and secure.
HypeAuditor
HypeAuditor is a 100% AI-powered influencer marketing platform that offers a comprehensive suite of tools for influencer marketing campaigns. With a database of over 168.7 million influencers, HypeAuditor provides features such as influencer discovery, campaign management, market analysis, and influencer analytics. The platform helps brands and agencies grow their business by giving them more efficiency and control over their influencer marketing strategies. HypeAuditor's AI analyzes influencer audiences and performance and detects fraud, providing insights for successful influencer collaborations.
Sommify
Sommify is an AI sommelier application designed to help companies sell wine by creating memorable experiences for customers. The application addresses common problems in wine retail, such as uncertainty about customers' preferences, a lack of product information, and customers' hesitation to ask questions. Sommify uses AI and data analysis to automate wine pairing, generate insights, and help customers find the right wine. Trusted by industry leaders and backed by investors, Sommify aims to transform the wine purchasing experience through personalized recommendations and tailored solutions.
Atlassian Intelligence
Atlassian Intelligence is an AI-powered tool that boosts productivity on the Atlassian platform through AI-human collaboration. It surfaces insights from team knowledge, turns data into actionable insights, lets users find issues in Jira using natural language, offers development insights, speeds up incident detection, and assists with project management. It also helps teams respond quickly to customer requests, make decisions faster, deliver faster service through virtual agents, streamline setup, and automate tedious tasks.
Mastertech.ai
Mastertech.ai is an AI tool designed to assist with manufacturer procedures and diagnose common issues for any vehicle. By providing personalized assistance based on the user's level of experience, it offers instant answers for questions related to torque specs, fluid capacity, component locations, and more. The tool cross-references symptoms against Technical Service Bulletins to identify common known issues and provides trustworthy vehicle data. With upcoming features like integration with shop management platforms and an interactive diagnostic AI assistant, Mastertech.ai aims to enhance accuracy and efficiency in automotive repair.
unSkript
unSkript is an AI-powered infrastructure health intelligence tool for keeping application infrastructure healthy. It uses generative AI and intelligent health checks to proactively find, diagnose, and fix issues in your application infrastructure. With features such as Proactive Health Checks, generative-AI-based root cause analysis (RCA), and Continuous Learning, unSkript helps streamline processes for cloud-operations and software teams. It aims to minimize downtime, deliver real-time troubleshooting, and free users to focus on strategic tasks.
August
August is a free-to-use health AI available on WhatsApp. It provides direct answers to health questions, helps with mental health issues, creates personalized nutrition and fitness plans, and offers proactive support. August is designed to be a comprehensive health companion, available 24/7.
BrowseGPT
BrowseGPT is a free Chrome extension that uses artificial intelligence to automate your browser. You can give BrowseGPT instructions like "Find a place to stay in Seattle on February 22nd" or "buy a children's book on Amazon", and it will use OpenAI's GPT-3 model to process web pages and issue commands like CLICK, ENTER_TEXT, or NAVIGATE to complete the task for you.
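BrowseGPT's exact command syntax is not documented here. As a minimal sketch, assuming a hypothetical one-command-per-line format such as `CLICK 12` or `ENTER_TEXT 5 "Seattle, WA"`, this is how a model's reply could be parsed into structured browser actions before anything is executed. `Action` and `parse_commands` are illustrative names, not part of the extension.

```python
import shlex
from dataclasses import dataclass

# Hypothetical command vocabulary taken from the description; the real protocol may differ.
KNOWN_COMMANDS = {"CLICK", "ENTER_TEXT", "NAVIGATE"}


@dataclass
class Action:
    name: str        # command name, e.g. "CLICK"
    args: list[str]  # positional arguments, e.g. an element id or text to type


def parse_commands(reply: str) -> list[Action]:
    """Turn a model reply with one command per line into Action objects."""
    actions = []
    for line in reply.strip().splitlines():
        if not line.strip():
            continue  # ignore blank lines
        parts = shlex.split(line)  # keeps quoted text such as "Seattle, WA" together
        name, args = parts[0].upper(), parts[1:]
        if name not in KNOWN_COMMANDS:
            raise ValueError(f"unknown command: {name}")
        actions.append(Action(name, args))
    return actions


reply = """
NAVIGATE https://www.example.com
ENTER_TEXT 5 "Seattle, WA"
CLICK 12
"""
actions = parse_commands(reply)
for action in actions:
    print(action)
```

Rejecting unknown commands up front keeps a malformed model reply from triggering arbitrary browser behavior.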
Keylight AI
Keylight AI is an AI-powered solution that helps users find information within their documents efficiently. It offers fast searches, accurate results, a user-friendly interface, and customizable prompts, and it handles documents securely and confidentially. Aimed at professionals across many industries, Keylight AI speeds up document search and navigation, helping users save time and work more productively.
Deepfind
Deepfind is a privacy-first AI search engine that prioritizes user data protection. It allows users to conduct searches without the use of cookies, tracking, or storing personal information. Deepfind aims to provide a secure and efficient search experience while maintaining user privacy and data security.
Phind AI
Phind AI is a cost-effective alternative to other AI search engines, making AI search accessible to everyone, regardless of location. It offers a comprehensive search experience with a user-friendly interface and advanced features.
Qatalog
Qatalog is a business search engine that provides real-time access to data across various company systems and applications. It uses natural language processing and machine learning to understand user queries and deliver relevant results from multiple data sources. Qatalog eliminates the need to search through multiple systems and applications, saving employees time and improving productivity.
Tremello
Tremello is a market research platform that uses AI to deliver off-market data. It combines an AI engine with human experts to provide bespoke intelligence delivered directly to the user's inbox. Tremello's AI analyzes relationships, identifies patterns, and considers the broader context, adding actionable insights on top of a base layer of human research. It draws on a wide range of data sources, including public and private databases, industry reports, social media archives, company websites, and government filings, to build a comprehensive picture of the research subject.
Kira Systems
Kira Systems is machine learning software for contract search, review, and analysis that helps businesses identify, extract, and analyze content in their contracts and documents. It uses patented machine learning technology to extract concepts and data points efficiently and accurately. Kira's built-in intelligence streamlines contract review with out-of-the-box smart fields, and businesses can create their own smart fields to find specific data points using Kira's no-code machine learning tool. Kira's adaptive workflows let businesses organize, track, and export results, and its partner ecosystem helps teams transform how they work with their contracts.
Shieldbase
Shieldbase is an AI-powered enterprise search tool designed to provide secure and efficient search capabilities for businesses. It utilizes advanced artificial intelligence algorithms to index and retrieve information from various data sources within an organization, ensuring quick and accurate search results. With a focus on security, Shieldbase offers encryption and access control features to protect sensitive data. The platform is user-friendly and customizable, making it easy for businesses to implement and integrate into their existing systems. Shieldbase enhances productivity by enabling employees to quickly find the information they need, ultimately improving decision-making processes and overall operational efficiency.
Vanga AI
Vanga AI is an AI-powered upselling and cross-selling tool for Shopify stores. It helps businesses increase revenue by automatically generating and displaying upsells and cross-sells on their post-purchase and thank-you pages. Vanga AI uses data to find the products that customers are most likely to buy together, and it creates a custom upsell funnel for each product. The tool is easy to use and requires no setup or maintenance. Vanga AI offers a 14-day free trial and two paid plans, starting at $9/month.
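Vanga AI's model is not described here. One common baseline for "products customers buy together" is co-occurrence counting over past orders; the following is a minimal sketch of that baseline with made-up order data, not Vanga AI's method.

```python
from collections import Counter
from itertools import combinations


def bought_together(orders: list[set[str]]) -> Counter:
    """Count how often each unordered pair of products appears in the same order."""
    pairs = Counter()
    for order in orders:
        # Sorting makes ("a", "b") and ("b", "a") the same key.
        for a, b in combinations(sorted(order), 2):
            pairs[(a, b)] += 1
    return pairs


def suggest_upsells(product: str, pairs: Counter, k: int = 2) -> list[str]:
    """Return the k products most often co-purchased with `product`."""
    scores = Counter()
    for (a, b), count in pairs.items():
        if a == product:
            scores[b] += count
        elif b == product:
            scores[a] += count
    return [p for p, _ in scores.most_common(k)]


# Hypothetical order history: each set is one checkout.
orders = [
    {"tent", "sleeping bag", "lantern"},
    {"tent", "sleeping bag"},
    {"tent", "stove"},
    {"lantern", "batteries"},
]
pairs = bought_together(orders)
suggestions = suggest_upsells("tent", pairs)
print(suggestions)
```

Production systems usually normalize these raw counts (for example by product popularity) so that best-sellers do not dominate every suggestion.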
Jobs-Scout
Jobs-Scout is an AI-powered job search engine that helps you find your dream job. With Jobs-Scout, you can search for jobs by keyword, location, and industry. You can also filter your search results by salary, experience, and education level. Jobs-Scout also provides personalized job recommendations based on your skills and interests.
Picarta AI
Picarta AI is an image geolocalization solution that uses artificial intelligence to find where a photo has been taken in the world. By uploading a photo, users can get the GPS location, latitude, longitude, time stamp, and camera details of the image. Picarta AI also offers a map view of the image location and allows users to download the map. The company's vision is to empower individuals and businesses with the most accurate and reliable image geolocalization solution, unlocking new possibilities for exploration, research, and decision-making.
Find AI
Find AI is an AI-powered search engine that provides users with advanced search capabilities to unlock contact details and gain more accurate insights. The platform caters to individuals and companies looking to research people, companies, startups, founders, and more. Users can access email addresses and premium search features to explore a wide range of data related to various industries and sectors. Find AI offers a user-friendly interface and efficient search algorithms to deliver relevant results in a timely manner.
20 - Open Source AI Tools
cleanlab
Cleanlab helps you **clean** data and **lab**els by automatically detecting issues in an ML dataset. To facilitate **machine learning with messy, real-world data**, this data-centric AI package uses your _existing_ models to estimate dataset problems that can be fixed to train even _better_ models.
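The core idea, using a model's out-of-sample predicted probabilities to flag examples whose given label looks wrong, can be sketched without the library. The function below is a simplified version of the confident-learning threshold rule written for illustration; it is not cleanlab's implementation (in cleanlab itself, `cleanlab.filter.find_label_issues(labels=..., pred_probs=...)` fills this role). The data is made up.

```python
def flag_label_issues(labels: list[int], pred_probs: list[list[float]]) -> list[int]:
    """Return indices whose given label disagrees with the confidently predicted class.

    Simplified confident-learning rule (illustrative only):
    1. Per-class threshold t[j] = mean predicted prob of class j over examples labeled j.
    2. An example's confident class is its highest-probability class j with p[j] >= t[j].
    3. If a confident class exists and differs from the given label, flag the example.
    """
    n_classes = len(pred_probs[0])
    thresholds = []
    for j in range(n_classes):
        probs_j = [p[j] for p, y in zip(pred_probs, labels) if y == j]
        # A class with no labeled examples gets an unreachable threshold.
        thresholds.append(sum(probs_j) / len(probs_j) if probs_j else float("inf"))

    issues = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        confident = [j for j in range(n_classes) if p[j] >= thresholds[j]]
        if confident:
            best = max(confident, key=lambda j: p[j])
            if best != y:
                issues.append(i)
    return issues


# pred_probs should come from out-of-sample predictions (e.g. cross-validation).
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1], [0.8, 0.2], [0.1, 0.9],   # index 2 is labeled 0 but looks like class 1
    [0.15, 0.85], [0.3, 0.7], [0.05, 0.95],
]
issues = flag_label_issues(labels, pred_probs)
print(issues)
```

Using per-class thresholds rather than a single global cutoff is what lets this rule cope with classes the model is systematically more or less confident about.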
data-juicer
Data-Juicer is a one-stop data processing system that makes data higher-quality, juicier, and more digestible for LLMs. It is a systematic, reusable library of 80+ core OPs (operators), 20+ reusable config recipes, and 20+ feature-rich dedicated toolkits, designed to work independently of specific LLM datasets and processing pipelines. Data-Juicer supports detailed data analysis with automated report generation for a deeper understanding of your dataset. Combined with multi-dimensional automatic evaluation, it enables a timely feedback loop at multiple stages of LLM development. Data-Juicer offers dozens of pre-built data processing recipes for pre-training, fine-tuning, English (en), Chinese (zh), and other scenarios. Its processing pipeline is fast and uses less memory and CPU, optimized for maximum productivity. Data-Juicer is flexible and extensible, accommodating most data formats and allowing flexible combinations of OPs. It is designed for simplicity, with comprehensive documentation, easy-start guides and demo configs, and intuitive configuration in which OPs can simply be added to or removed from existing configs.
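The config-driven composition of OPs described above can be illustrated with a small sketch. Everything here is hypothetical: the operator names, the registry, and the list-of-dicts config stand in for Data-Juicer's real operators and YAML recipes, which are not reproduced.

```python
from typing import Callable

Sample = dict  # one record, e.g. {"text": "..."}


# Hypothetical operators: mappers rewrite a sample, filters keep or drop it.
def make_whitespace_mapper() -> Callable[[Sample], Sample]:
    return lambda s: {**s, "text": " ".join(s["text"].split())}


def make_lowercase_mapper() -> Callable[[Sample], Sample]:
    return lambda s: {**s, "text": s["text"].lower()}


def make_min_words_filter(min_words: int) -> Callable[[Sample], bool]:
    return lambda s: len(s["text"].split()) >= min_words


# Registry: OP name -> (kind, factory that builds the OP from its config args).
OPS = {
    "whitespace_mapper": ("map", make_whitespace_mapper),
    "lowercase_mapper": ("map", make_lowercase_mapper),
    "min_words_filter": ("filter", make_min_words_filter),
}


def run_pipeline(config: list[dict], samples: list[Sample]) -> list[Sample]:
    """Apply configured OPs in order; editing the config list changes the pipeline."""
    data = list(samples)
    for entry in config:
        kind, factory = OPS[entry["op"]]
        op = factory(**entry.get("args", {}))
        if kind == "map":
            data = [op(s) for s in data]
        else:
            data = [s for s in data if op(s)]
    return data


config = [
    {"op": "whitespace_mapper"},
    {"op": "lowercase_mapper"},
    {"op": "min_words_filter", "args": {"min_words": 3}},
]
samples = [{"text": "  Hello   World  "}, {"text": "Data  Juicer makes DATA   digestible"}]
cleaned = run_pipeline(config, samples)
print(cleaned)
```

Because each OP is looked up by name, adding or removing a step is a one-line config change, which is the property the description emphasizes.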
txtai
Txtai is an all-in-one embeddings database for semantic search, LLM orchestration, and language model workflows. It combines vector indexes, graph networks, and relational databases to enable vector search with SQL, topic modeling, retrieval augmented generation, and more. Txtai can stand alone or serve as a knowledge source for large language models (LLMs). Key features include vector search with SQL, object storage, topic modeling, graph analysis, multimodal indexing, embedding creation for various data types, pipelines powered by language models, workflows to connect pipelines, and support for Python, JavaScript, Java, Rust, and Go. Txtai is open-source under the Apache 2.0 license.
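txtai's own entry point is its `Embeddings` class, which this toy does not use. As a dependency-free illustration of "vector search with SQL", the sketch below registers a similarity function with SQLite so one query can combine a similarity score with ordinary SQL filters. Bag-of-words vectors stand in for real embeddings, and the table and function names are hypothetical.

```python
import math
import re
import sqlite3
from collections import Counter


def vectorize(text: str) -> Counter:
    """Toy 'embedding': a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words vectors of two strings."""
    va, vb = vectorize(a), vectorize(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


db = sqlite3.connect(":memory:")
db.create_function("similar", 2, cosine)  # expose the similarity function to SQL
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT, year INTEGER)")
db.executemany(
    "INSERT INTO docs (text, year) VALUES (?, ?)",
    [
        ("vector search finds similar documents", 2023),
        ("relational databases store rows and columns", 2021),
        ("semantic vector search", 2024),
    ],
)

query = "vector search"
# Similarity ranking and a plain SQL filter (year) in one statement.
rows = db.execute(
    "SELECT text, similar(:q, text) AS score FROM docs "
    "WHERE year >= 2022 AND similar(:q, text) > 0 "
    "ORDER BY score DESC",
    {"q": query},
).fetchall()
for text, score in rows:
    print(f"{score:.2f}  {text}")
```

A real embeddings database would use an approximate nearest-neighbor index instead of scoring every row, but the query shape is the same.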
SQLAgent
DataAgent is a multi-agent system for data analysis. It can understand data-development and data-analysis requirements as well as the underlying data, and it generates SQL and Python code for tasks such as data querying, data visualization, and machine learning.
airda
airda (Air Data Agent) is a multi-agent system for data analysis. It can understand data-development and data-analysis requirements as well as the underlying data, and it generates SQL and Python code for data querying, data visualization, machine learning, and other tasks.
qlib
Qlib is an open-source, AI-oriented quantitative investment platform that supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and reinforcement learning. It covers the entire quantitative investment chain, from alpha seeking to order execution, and helps researchers explore ideas and take them into production using AI. Qlib addresses key challenges in quantitative investment by releasing state-of-the-art research in these paradigms. It provides a full ML pipeline for data processing, model training, and back-testing, enabling users to forecast market patterns, adapt to market dynamics, and model continuous investment decisions.
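Back-testing is the last stage of the pipeline described above. The following is a hypothetical, heavily simplified back-test in plain Python (not Qlib's API): hold the asset for the next day whenever the model's forecast is positive, and compare the compounded result with buy-and-hold. It ignores transaction costs, slippage, and position sizing, and the prices and signals are made up.

```python
def backtest(prices: list[float], signals: list[float]) -> tuple[float, float]:
    """Return (strategy_return, buy_and_hold_return) as fractions.

    signals[t] is a forecast made at the close of day t; the strategy holds
    the asset from day t to day t+1 only when signals[t] > 0 (long-or-flat).
    The final signal has no next day to act on and is ignored.
    """
    if len(prices) != len(signals):
        raise ValueError("prices and signals must be the same length")
    growth = 1.0
    for t in range(len(prices) - 1):
        daily_return = prices[t + 1] / prices[t] - 1
        if signals[t] > 0:
            growth *= 1 + daily_return  # compound only the days we are invested
    buy_and_hold = prices[-1] / prices[0] - 1
    return growth - 1, buy_and_hold


prices = [100.0, 110.0, 99.0, 108.9]
signals = [1, -1, 1, 0]  # e.g. a model's next-day return forecasts
strategy_ret, hold_ret = backtest(prices, signals)
print(f"strategy {strategy_ret:.1%} vs buy-and-hold {hold_ret:.1%}")
```

Acting on `signals[t]` only from day t to t+1 avoids look-ahead bias, the most common mistake in hand-rolled back-tests.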
holmesgpt
HolmesGPT is an open-source DevOps assistant powered by OpenAI or any tool-calling LLM of your choice. It helps with Kubernetes troubleshooting, incident response, ticket management, automated investigation, and runbook automation in plain English. The tool connects to existing observability data, is compliance-friendly, provides transparent results, supports extensible data sources and runbook automation, and integrates with existing workflows. Users can install HolmesGPT with Homebrew, a prebuilt Docker container, or Python Poetry. It requires an API key to function and supports OpenAI, Azure AI, and self-hosted LLMs.
llm-course
The LLM course is divided into three parts:

1. 🧩 **LLM Fundamentals** covers essential knowledge about mathematics, Python, and neural networks.
2. 🧑‍🔬 **The LLM Scientist** focuses on building the best possible LLMs using the latest techniques.
3. 👷 **The LLM Engineer** focuses on creating LLM-based applications and deploying them.

For an interactive version of this course, I created two **LLM assistants** that will answer questions and test your knowledge in a personalized way:

* 🤗 **HuggingChat Assistant**: Free version using Mixtral-8x7B.
* 🤖 **ChatGPT Assistant**: Requires a premium account.

## 📝 Notebooks

A list of notebooks and articles related to large language models.

### Tools

| Notebook | Description | Colab |
|----------|-------------|-------|
| 🧐 LLM AutoEval | Automatically evaluate your LLMs using RunPod. | ![Open In Colab](img/colab.svg) |
| 🥱 LazyMergekit | Easily merge models using MergeKit in one click. | ![Open In Colab](img/colab.svg) |
| 🦎 LazyAxolotl | Fine-tune models in the cloud using Axolotl in one click. | ![Open In Colab](img/colab.svg) |
| ⚡ AutoQuant | Quantize LLMs in GGUF, GPTQ, EXL2, AWQ, and HQQ formats in one click. | ![Open In Colab](img/colab.svg) |
| 🌳 Model Family Tree | Visualize the family tree of merged models. | ![Open In Colab](img/colab.svg) |
| 🚀 ZeroSpace | Automatically create a Gradio chat interface using a free ZeroGPU. | ![Open In Colab](img/colab.svg) |
MiniCPM-V
MiniCPM-V is a series of end-side (on-device) multimodal LLMs designed for vision-language understanding. The models take image and text inputs and produce high-quality text outputs. The series includes MiniCPM-Llama3-V 2.5, an 8B-parameter model reported to surpass some proprietary models, and MiniCPM-V 2.0, a lighter 2B-parameter model. The models support over 30 languages, deploy efficiently on end-side devices, have strong OCR capabilities, and process high-resolution images efficiently. They achieve state-of-the-art performance on various benchmarks and are designed to reduce hallucinations in text generation.
VITA
VITA is an open-source interactive omni multimodal Large Language Model (LLM) capable of processing video, image, text, and audio inputs simultaneously. It stands out with features like Omni Multimodal Understanding, Non-awakening Interaction, and Audio Interrupt Interaction. VITA can respond to user queries without a wake-up word, track and filter external queries in real-time, and handle various query inputs effectively. The model utilizes state tokens and a duplex scheme to enhance the multimodal interactive experience.
LLM-PowerHouse-A-Curated-Guide-for-Large-Language-Models-with-Custom-Training-and-Inferencing
LLM-PowerHouse is a comprehensive and curated guide designed to empower developers, researchers, and enthusiasts to harness the true capabilities of Large Language Models (LLMs) and build intelligent applications that push the boundaries of natural language understanding. This GitHub repository provides in-depth articles, codebase mastery, LLM PlayLab, and resources for cost analysis and network visualization. It covers various aspects of LLMs, including NLP, models, training, evaluation metrics, open LLMs, and more. The repository also includes a collection of code examples and tutorials to help users build and deploy LLM-based applications.
invariant
Invariant Analyzer is an open-source scanner designed for LLM-based AI agents to find bugs, vulnerabilities, and security threats. It scans agent execution traces to identify issues like looping behavior, data leaks, prompt injections, and unsafe code execution. The tool offers a library of built-in checkers, an expressive policy language, data flow analysis, real-time monitoring, and extensible architecture for custom checkers. It helps developers debug AI agents, scan for security violations, and prevent security issues and data breaches during runtime. The analyzer leverages deep contextual understanding and a purpose-built rule matching engine for security policy enforcement.
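Invariant's checkers are written in its own policy language, which is not reproduced here. As a hypothetical sketch of one check the entry mentions, looping behavior, the function below scans an agent trace (assumed here to be a list of tool-call dicts) for runs of the same call repeated several times in a row.

```python
import json


def find_loops(trace: list[dict], max_repeats: int = 3) -> list[tuple[int, str]]:
    """Return (start_index, call_json) for each run of identical consecutive
    tool calls that is at least `max_repeats` long."""
    findings = []
    run_start, previous = 0, None
    for i, step in enumerate(trace + [None]):  # trailing sentinel flushes the last run
        # Canonicalize each call so key order inside the dict does not matter.
        key = json.dumps(step, sort_keys=True) if step is not None else None
        if key != previous:
            if previous is not None and i - run_start >= max_repeats:
                findings.append((run_start, previous))
            run_start, previous = i, key
    return findings


# Hypothetical trace: the agent repeats the same search three times.
trace = [
    {"tool": "search", "args": {"q": "weather"}},
    {"tool": "search", "args": {"q": "weather"}},
    {"tool": "search", "args": {"q": "weather"}},
    {"tool": "send_email", "args": {"to": "a@example.com"}},
]
findings = find_loops(trace)
for start, call in findings:
    print(f"possible loop starting at step {start}: {call}")
```

Real analyzers also catch loops that alternate between several calls; this sketch only detects the simplest consecutive-repeat case.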
x-crawl
x-crawl is a flexible, AI-assisted Node.js crawler library whose AI features aim to make crawling more efficient, intelligent, and convenient. It consists of a crawler API and various functions that also work without AI. The AI component is currently based on a large model provided by OpenAI, which simplifies many tedious operations. The library supports crawling dynamic pages, static pages, API data, and file data, with features such as page operation control, device fingerprinting, asynchronous and synchronous modes, interval crawling, retry on failure, proxy rotation, a priority queue, crawl information control, and TypeScript support.
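x-crawl itself is a Node.js library; the Python sketch below only illustrates three scheduling ideas the entry lists (a priority queue, an interval between requests, and retry on failure) and does not mirror x-crawl's API. `crawl` and `fake_fetch` are hypothetical, and `fake_fetch` stands in for a real HTTP request.

```python
import heapq
import time
from typing import Callable, Optional


def crawl(
    targets: list[tuple[int, str]],
    fetch: Callable[[str], str],
    max_retries: int = 2,
    interval: float = 0.0,
) -> dict[str, Optional[str]]:
    """Crawl URLs in priority order (higher number first), retrying failures.

    Returns {url: content, or None if every attempt failed}.
    """
    # heapq is a min-heap, so priorities are negated; the index breaks ties in input order.
    queue = [(-priority, i, url) for i, (priority, url) in enumerate(targets)]
    heapq.heapify(queue)
    results: dict[str, Optional[str]] = {}
    while queue:
        _, _, url = heapq.heappop(queue)
        results[url] = None
        for _ in range(max_retries + 1):  # first attempt plus retries
            try:
                results[url] = fetch(url)
                break
            except OSError:
                continue  # network-style failure: try again
        if queue:
            time.sleep(interval)  # politeness delay before the next target
    return results


attempts = []  # log of every fetch attempt, for illustration


def fake_fetch(url: str) -> str:
    """Hypothetical fetcher: a.example fails once, dead.example always fails."""
    attempts.append(url)
    if url == "https://a.example" and attempts.count(url) < 2:
        raise OSError("temporary failure")
    if url == "https://dead.example":
        raise OSError("host down")
    return f"<html>{url}</html>"


results = crawl(
    [(1, "https://low.example"), (5, "https://a.example"), (3, "https://dead.example")],
    fake_fetch,
)
print(attempts)
print(results)
```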
langfuse
Langfuse is a powerful tool that helps you develop, monitor, and test your LLM applications. With Langfuse, you can:

* **Develop:** Instrument your app and start ingesting traces to Langfuse, inspect and debug complex logs, and manage, version, and deploy prompts from within Langfuse.
* **Monitor:** Track metrics (cost, latency, quality) and gain insights from dashboards and data exports, collect and calculate scores for your LLM completions, run model-based evaluations, collect user feedback, and manually score observations in Langfuse.
* **Test:** Track and test app behaviour before deploying a new version, test expected input/output pairs and benchmark performance before deploying, and track versions and releases in your application.

Langfuse is easy to get started with and offers a generous free tier. You can sign up for Langfuse Cloud or deploy Langfuse locally or on your own infrastructure. Langfuse also offers a variety of integrations to make it easy to connect to your LLM applications.
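Langfuse's SDKs handle instrumentation for you, so the following is not Langfuse code. It is a hypothetical, in-memory sketch of what a trace of an LLM call typically records (name, input, output or error, latency), which is the kind of data the Develop and Monitor capabilities above work with. `TRACES`, `traced`, and `fake_llm` are made-up names, and `fake_llm` stands in for a real model call.

```python
import functools
import time
from typing import Any, Callable

TRACES: list[dict[str, Any]] = []  # in-memory stand-in for a tracing backend


def traced(fn: Callable) -> Callable:
    """Record name, arguments, output or error, and latency of each call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        record = {"name": fn.__name__, "input": {"args": args, "kwargs": kwargs}}
        start = time.perf_counter()
        try:
            record["output"] = fn(*args, **kwargs)
            return record["output"]
        except Exception as exc:
            record["error"] = repr(exc)  # keep failures visible in the trace
            raise
        finally:
            record["latency_s"] = time.perf_counter() - start
            TRACES.append(record)  # runs on success and on failure
    return wrapper


@traced
def fake_llm(prompt: str) -> str:
    return prompt.upper()  # placeholder for a real completion call


fake_llm("hello")
last = TRACES[-1]
print(last["name"], last["output"], f"{last['latency_s']:.6f}s")
```

Putting the bookkeeping in a `finally` block means failed calls are traced too, which is usually where debugging starts.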
LARS
LARS is an application that enables users to run Large Language Models (LLMs) locally on their devices, upload their own documents, and engage in conversations where the LLM grounds its responses with the uploaded content. The application focuses on Retrieval Augmented Generation (RAG) to increase accuracy and reduce AI-generated inaccuracies. LARS provides advanced citations, supports various file formats, allows follow-up questions, provides full chat history, and offers customization options for LLM settings. Users can force enable or disable RAG, change system prompts, and tweak advanced LLM settings. The application also supports GPU-accelerated inferencing, multiple embedding models, and text extraction methods. LARS is open-source and aims to be the ultimate RAG-centric LLM application.
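LARS uses embedding models for retrieval; the sketch below swaps in simple word overlap so it runs without dependencies. It shows the RAG flow the entry describes: split uploaded documents into chunks, retrieve the best-matching chunk for a question, and build a grounded prompt whose context carries source citations. The function names, chunk IDs, and prompt wording are hypothetical.

```python
import re


def chunk(doc_id: str, text: str, size: int = 40) -> list[tuple[str, str]]:
    """Split a document into chunks of `size` words, each tagged doc_id#n for citation."""
    words = text.split()
    return [
        (f"{doc_id}#{n}", " ".join(words[i:i + size]))
        for n, i in enumerate(range(0, len(words), size))
    ]


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Rank chunks by word overlap with the question (a stand-in for embeddings)."""
    q = tokens(question)
    scored = [(len(q & tokens(text)), cid, text) for cid, text in chunks]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(cid, text) for score, cid, text in scored[:k] if score > 0]


def build_prompt(question: str, hits: list[tuple[str, str]]) -> str:
    """Ground the model in retrieved text and ask it to cite chunk IDs."""
    context = "\n".join(f"[{cid}] {text}" for cid, text in hits)
    return (
        "Answer using only the context below and cite sources in brackets.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


docs = {
    "manual.pdf": "The pump must be serviced every 500 hours of operation.",
    "faq.txt": "Warranty claims require the original receipt.",
}
all_chunks = [c for doc_id, text in docs.items() for c in chunk(doc_id, text)]
question = "How often should the pump be serviced?"
hits = retrieve(question, all_chunks, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

Keeping a stable ID on every chunk is what makes citations possible: the model can only cite what the prompt labels.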
lobe-chat-plugins
Lobe Chat Plugins Index is a repository that serves as a collection of various plugins for Function Calling. Users can submit their plugins by following specific instructions. The repository includes a wide range of plugins for different tasks such as image generation, stock analysis, web search, NFT tracking, calendar management, and more. Each plugin is tagged with relevant keywords for easy identification and usage. The repository encourages contributions and provides guidelines for submitting new plugins. It is a valuable resource for developers looking to enhance chatbot functionalities with different plugins.
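Function Calling plugins generally work by letting the model return a function name plus JSON arguments, which the host application then dispatches to the matching plugin. A minimal, hypothetical sketch of that dispatch step; the plugin name, its placeholder data, and the shape of the model's tool call are assumptions, not Lobe Chat's plugin manifest format.

```python
import json
from typing import Any, Callable

PLUGINS: dict[str, Callable[..., Any]] = {}


def plugin(name: str):
    """Register a function under the name the model will use to call it."""
    def register(fn):
        PLUGINS[name] = fn
        return fn
    return register


@plugin("get_stock_price")
def get_stock_price(symbol: str) -> dict:
    # Placeholder data; a real plugin would call a market-data API.
    return {"symbol": symbol.upper(), "price": 123.45}


def dispatch(tool_call: str) -> str:
    """Run a model-issued call like {"name": ..., "arguments": {...}} and return JSON."""
    call = json.loads(tool_call)
    fn = PLUGINS.get(call["name"])
    if fn is None:
        # Report the problem back to the model instead of crashing the chat.
        return json.dumps({"error": f"unknown plugin {call['name']}"})
    return json.dumps(fn(**call.get("arguments", {})))


result = dispatch('{"name": "get_stock_price", "arguments": {"symbol": "abc"}}')
print(result)
```

The JSON string returned by `dispatch` is what would be sent back to the model as the function's result.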
aiid
The Artificial Intelligence Incident Database (AIID) is a collection of incidents involving the development and use of artificial intelligence (AI). The database is designed to help researchers, policymakers, and the public understand the potential risks and benefits of AI, and to inform the development of policies and practices to mitigate the risks and promote the benefits of AI. The AIID is a collaborative project involving researchers from the University of California, Berkeley, the University of Washington, and the University of Toronto.
lawyer-llama
Lawyer LLaMA is a large language model that has been specifically trained on legal data, including Chinese laws, regulations, and case documents. It has been fine-tuned on a large dataset of legal questions and answers, enabling it to understand and respond to legal inquiries in a comprehensive and informative manner. Lawyer LLaMA is designed to assist legal professionals and individuals with a variety of law-related tasks, including:

* **Legal research:** Quickly and efficiently search through vast amounts of legal information to find relevant laws, regulations, and case precedents.
* **Legal analysis:** Analyze legal issues, identify potential legal risks, and provide insights on how to proceed.
* **Document drafting:** Draft legal documents, such as contracts, pleadings, and legal opinions, with accuracy and precision.
* **Legal advice:** Provide general legal advice and guidance on a wide range of legal matters, helping users understand their rights and options.

Lawyer LLaMA is a powerful tool that can significantly enhance the efficiency and effectiveness of legal research, analysis, and decision-making. It is an invaluable resource for lawyers, paralegals, law students, and anyone else who needs to navigate the complexities of the legal system.
json_repair
This simple package can be used to fix an invalid JSON string. To see all the cases in which this package works, check out the unit tests. Inspired by https://github.com/josdejong/jsonrepair

### Motivation

Some LLMs are a bit iffy when it comes to returning well-formed JSON data: sometimes they skip a parenthesis and sometimes they add some words to it, because that's what an LLM does. Luckily, the mistakes LLMs make are simple enough to be fixed without destroying the content. I searched for a lightweight Python package that could reliably fix this problem but couldn't find any, so I wrote one.

### How to use

```python
from json_repair import repair_json

good_json_string = repair_json(bad_json_string)
# If the string was super broken, this will return an empty string
```

You can use this library to completely replace `json.loads()`:

```python
import json_repair

decoded_object = json_repair.loads(json_string)
```

or just

```python
import json_repair

decoded_object = json_repair.repair_json(json_string, return_objects=True)
```

### Read JSON from a file or file descriptor

JSON repair also provides a drop-in replacement for `json.load()`:

```python
import json_repair

try:
    file_descriptor = open(fname, 'rb')
except OSError:
    ...  # handle the error here (e.g. log and re-raise); file_descriptor is unset if open() failed

with file_descriptor:
    decoded_object = json_repair.load(file_descriptor)
```

and another method to read from a file:

```python
import json_repair

try:
    decoded_object = json_repair.from_file(json_file)
except OSError:  # in Python 3, IOError is an alias of OSError, so this clause covers both
    ...
```

Keep in mind that the library will not catch any IO-related exceptions; those need to be managed by you.

### Performance considerations

If you find this library too slow because it uses `json.loads()`, you can skip that step by passing `skip_json_loads=True` to `repair_json`, like:

```python
from json_repair import repair_json

good_json_string = repair_json(bad_json_string, skip_json_loads=True)
```

I chose not to use any fast JSON library, to avoid external dependencies, so that anybody can use it regardless of their stack.

Some rules of thumb:

- Setting `return_objects=True` will always be faster because the parser already returns an object and doesn't have to serialize that object to JSON.
- `skip_json_loads` is faster only if you know with 100% certainty that the string is not valid JSON.
- If you are having issues with escaping, pass the string as a **raw** string, like: `r"string with escaping\""`

### Adding to requirements

Please pin this library only on the major version! We use TDD and strict semantic versioning: there will be frequent updates, but no breaking changes in minor and patch versions. To ensure that you pin only the major version of this library in your `requirements.txt`, specify the package name followed by the major version and a wildcard for minor and patch versions. For example:

```
json_repair==0.*
```

In this example, any version that starts with `0.` will be acceptable, allowing for updates on minor and patch versions.

### How it works

This module will parse the JSON file following the BNF definition:
20 - OpenAI Gpts
GovChat - Government API Guide
Friendly, technical API expert offering clear guidance on government APIs.
OpenData Explorer
I'll help you access and understand open data published by central government, local authorities and public bodies. You can ask me in your native language.
Chronic Disease Indicators Expert
This chatbot answers questions about the CDC’s Chronic Disease Indicators dataset
Sommelier de dados
Hey! Paste the text of your news story, or an excerpt from it, so I can analyze it against style guides on the use of data in journalistic writing.
PPT Expert
PPT Assistant for creating detailed outlines in Markdown, using Chinese by default.
AI OSINT
Your AI OSINT assistant. Our tool helps you find the data needle in the internet haystack.
Open Data Italia bot
Provides information on Italian open-data legislation in a professional, accessible tone, to make it easier to request and/or demand the publication of open data.
BCorpGPT
Query BCorp company data. All data is publicly available. United Kingdom only (for now).
Ordinals API
Knows the docs and can query official ordinal endpoints—Sat Numbers, Inscription IDs, and more.
Graphene Explorer AI
Leading AI in graphene research, offering innovative insights and solutions, powered by OpenAI.