Best AI tools for Build Custom Evals
20 - AI tool Sites
BenchLLM
BenchLLM is a tool for AI engineers to evaluate LLM-powered apps from a CLI. It lets users build test suites, choose evaluation strategies, and generate quality reports. The tool supports OpenAI, Langchain, and other APIs out of the box, and offers automation, report visualization, and monitoring of model performance.
Langdock
Langdock is an all-in-one AI platform designed for companies to roll out AI to all employees and enable developers to build custom AI workflows. It offers features like model-agnostic AI, privacy-first approach, scalability, and measurability. The platform provides various AI assistants for different use cases, AI-powered workplace search, and tools for building, deploying, and evaluating AI workflows. Langdock focuses on enterprise-grade security, compliance, and education to help users get started with AI implementation.
LlamaIndex
LlamaIndex is a leading data framework designed for building LLM (Large Language Model) applications. It allows enterprises to turn their data into production-ready applications by providing functionalities such as loading data from various sources, indexing data, orchestrating workflows, and evaluating application performance. The platform offers extensive documentation, community-contributed resources, and integration options to support developers in creating innovative LLM applications.
Gretel.ai
Gretel.ai is a synthetic data platform purpose-built for AI applications. It allows users to generate artificial, synthetic datasets with the same characteristics as real data, enabling the improvement of AI models without compromising privacy. The platform offers features such as generating data from input prompts, creating safe synthetic versions of sensitive datasets, flexible data transformation, building data pipelines, and measuring data quality. Gretel.ai is designed to help developers unlock synthetic data and achieve more with safe access to the right data.
pymetrics
pymetrics is an AI-powered soft skills platform that revolutionizes the hiring and talent management process by leveraging data-driven behavioral insights and audited AI technology. The platform aims to create a more efficient, effective, and fair recruitment process across the talent lifecycle. pymetrics offers solutions for talent acquisition, workforce transformation, mobility, reskilling, learning and development, and soft skills assessment. It provides custom AI algorithms tailored to each company's unique needs, ensuring unbiased candidate evaluation and suggesting optimal job matches.
Spine AI
Spine AI is a reliable AI analyst tool that provides conversational analytics tailored to understand your business. It empowers decision-makers by offering customized insights, deep business intelligence, proactive notifications, and flexible dashboards. The tool is designed to help users make better decisions by leveraging a purpose-built Data Processing Unit (DPU) and a semantic layer for natural language interactions. With a focus on rigorous evaluation and security, Spine AI aims to deliver explainable and customizable AI solutions for businesses.
Chatling
Chatling is a no-code AI chatbot builder that empowers businesses to create custom chatbots without the need for coding expertise. With Chatling, businesses can train chatbots on their own data, customize them to match their brand, and add them to their website in minutes. Chatling's chatbots can answer customer queries instantly, resolve issues accurately, and provide 24/7 support, leading to increased customer satisfaction, reduced customer support workload, and cost savings.
WP Dev AI
WP Dev AI is an AI-powered tool that allows users to build custom features for their WordPress websites without having to code. With WP Dev AI, users can simply describe the feature they want to create in plain English, and the tool will generate the necessary code. WP Dev AI also provides step-by-step instructions on how to implement the code, making it easy for even non-technical users to add custom features to their websites.
Streamlit
Streamlit is an open-source Python library that makes it easy to create and share beautiful and interactive web apps for data science and machine learning.
BotGPT
BotGPT is a 24/7 custom AI chatbot assistant for websites. It offers a data-driven ChatGPT that allows users to create virtual assistants from their own data. Users can easily upload files or crawl their website to start asking questions and deploy a custom chatbot on their website within minutes. The platform provides a simple and efficient way to enhance customer engagement through AI-powered chatbots.
Whismer
Whismer is an AI application that allows users to build custom AI chatbots using their own data. The platform enables users to train their own ChatGPT by uploading documents, adding links, and writing notes. With Whismer, users can customize resources to help the AI system better adapt to specific fields or tasks, improving accuracy and efficiency. The AI proactively learns from user resources to solve various problems. Users can create a professional AI knowledge base in minutes, allowing the AI to learn and provide accurate answers. Whismer also enables users to share their customized AI projects with others, making AI accessible to more people.
AI Assistify
AI Assistify is an AI-powered virtual assistant application designed to streamline workflows for daily life and business. It offers a centralized platform to access various AI models, allowing users to build custom AI agents without any coding requirements. The application enhances productivity by providing humanlike chat experiences, document summarization, prompt libraries, and access to tools like stocks, weather, and news. AI Assistify is fully customizable, integrates with popular social networks, and ensures privacy and security with locally stored API keys.
Soca AI
Soca AI is a company that specializes in language and voice technology. They offer a variety of products and services for both consumers and enterprises, including a custom LLM for enterprise, a speech and audio API, and a voice and dubbing studio. Soca AI's mission is to democratize creativity and productivity through AI, and they are committed to developing multimodal AI systems that unleash superhuman potential.
FutureSmart AI
FutureSmart AI is a platform that provides custom Natural Language Processing (NLP) solutions. The platform focuses on integrating Mem0 with LangChain to enhance AI Assistants with Intelligent Memory. It offers tutorials, guides, and practical tips for building applications with large language models (LLMs) to create sophisticated and interactive systems. FutureSmart AI also features internship journeys and practical guides for mastering RAG with LangChain, catering to developers and enthusiasts in the realm of NLP and AI.
MakeForms
MakeForms is a secure form builder, now enhanced with AI capabilities, that lets teams create advanced, visually polished forms. With MakeForms, you can create one-at-a-time forms, step forms, or all-at-once forms through its user-friendly interface. You can also customize your forms with your own fonts and branding, and even publish them on your own domain, giving you control over your form's appearance and online presence. MakeForms also offers features to help you collect and organize your data, including a table view, summary view, and BI view.
Sheety
Sheety is a spreadsheet-like database that lets you build powerful apps without writing any code. It's perfect for teams who need to track data, manage projects, and collaborate on documents.
Chat With PDF AI Tool
The Chat With PDF AI Tool is an innovative application that allows users to interact with PDF documents using artificial intelligence technology. Users can engage in conversations with the AI tool to extract information, ask questions, and receive instant responses. The tool simplifies the process of working with PDF files by providing a conversational interface, making it user-friendly and efficient. With its advanced AI capabilities, the tool can understand natural language queries and provide accurate results, enhancing productivity and workflow efficiency.
Chat With PDF AI Tool
Chat With PDF AI Tool is an innovative online application that allows users to interact with a virtual assistant powered by artificial intelligence to convert and manipulate PDF files. The tool simplifies the process of working with PDFs by offering a conversational interface for tasks such as conversion, editing, and extraction. Users can upload PDF files, ask questions, and receive instant responses and actions from the AI assistant. With its user-friendly design and advanced AI capabilities, Chat With PDF AI Tool revolutionizes the way users handle PDF documents.
Infobip
Infobip is a leading provider of omnichannel communications solutions, enabling businesses to connect with their customers through a variety of channels, including SMS, RCS, MMS, WhatsApp, Viber, and more. Infobip's platform is used by over 70,000 businesses worldwide, and it processes over 450 billion interactions per year. Infobip's solutions are designed to help businesses improve customer engagement, increase sales, and reduce costs.
Pagecloud
Pagecloud is a website builder that uses AI to help users create and manage their websites. It offers a variety of features, including drag-and-drop editing, AI-powered content creation, e-commerce functionality, and analytics. Pagecloud is designed for users of all skill levels, from beginners to experienced web developers.
20 - Open Source AI Tools
evals
Evals provide a framework for evaluating large language models (LLMs) or systems built using LLMs. We offer an existing registry of evals to test different dimensions of OpenAI models and the ability to write your own custom evals for use cases you care about. You can also use your data to build private evals which represent the common LLM patterns in your workflow without exposing any of that data publicly.
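To make the idea of a custom eval concrete, here is a minimal sketch in plain Python. It does not use the Evals framework's own API: `Sample`, `exact_match`, `run_eval`, and `fake_model` are hypothetical names chosen for illustration, and the model is a stub standing in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    """One eval case: a prompt and the expected answer (hypothetical schema)."""
    prompt: str
    ideal: str


def exact_match(output: str, ideal: str) -> bool:
    """Count a completion as correct if it equals the ideal answer,
    ignoring case and surrounding whitespace."""
    return output.strip().lower() == ideal.strip().lower()


def run_eval(model: Callable[[str], str], samples: List[Sample]) -> float:
    """Run every sample through the model and return accuracy in [0, 1]."""
    if not samples:
        raise ValueError("eval needs at least one sample")
    correct = sum(exact_match(model(s.prompt), s.ideal) for s in samples)
    return correct / len(samples)


def fake_model(prompt: str) -> str:
    """Stub standing in for an LLM call; swap in a real client here."""
    canned = {"2 + 2 =": "4", "Capital of France?": "paris"}
    return canned.get(prompt, "")


SAMPLES = [
    Sample("2 + 2 =", "4"),
    Sample("Capital of France?", "Paris"),
    Sample("Largest planet?", "Jupiter"),
]

if __name__ == "__main__":
    print(f"accuracy: {run_eval(fake_model, SAMPLES):.2f}")
```

Keeping the samples as plain data and the metric as a separate function means a private dataset can be swapped in without touching the scoring logic.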
contoso-chat
Contoso Chat is a Python sample demonstrating how to build, evaluate, and deploy a retail copilot application with Azure AI Studio using Promptflow with Prompty assets. The sample implements a Retrieval Augmented Generation approach to answer customer queries based on the company's product catalog and customer purchase history. It utilizes Azure AI Search, Azure Cosmos DB, Azure OpenAI, text-embedding-ada-002, and GPT models for vectorizing user queries, AI-assisted evaluation, and generating chat responses. By exploring this sample, users can learn to build a retail copilot application, define prompts using Prompty, design, run, and evaluate a copilot using Promptflow, provision and deploy the solution to Azure using the Azure Developer CLI, and understand Responsible AI practices for evaluation and content safety.
AITreasureBox
AITreasureBox is a comprehensive collection of AI tools and resources designed to simplify and accelerate the development of AI projects. It provides a wide range of pre-trained models, datasets, and utilities that can be easily integrated into various AI applications. With AITreasureBox, developers can quickly prototype, test, and deploy AI solutions without having to build everything from scratch. Whether you are working on computer vision, natural language processing, or reinforcement learning projects, AITreasureBox has something to offer for everyone. The repository is regularly updated with new tools and resources to keep up with the latest advancements in the field of artificial intelligence.
Kiln
Kiln is an intuitive tool for fine-tuning LLMs, generating synthetic data, and collaborating on datasets. It offers desktop apps for Windows, macOS, and Linux, zero-code fine-tuning for various models, interactive data generation, and Git-based version control. Users can easily collaborate with QA, PM, and subject matter experts, generate auto-prompts, and work with a wide range of models and providers. The tool is open-source, privacy-first, and supports structured data tasks in JSON format. Kiln is free to use. It helps teams build high-quality AI products with datasets, facilitates collaboration between technical and non-technical teams, allows comparison of models and techniques without code, and ensures structured data integrity.
RAGHub
RAGHub is a community-driven project focused on cataloging new and emerging frameworks, projects, and resources in the Retrieval-Augmented Generation (RAG) ecosystem. It aims to help users stay ahead of changes in the field by providing a platform for the latest innovations in RAG. The repository includes information on RAG frameworks, evaluation frameworks, optimization frameworks, citation frameworks, engines, search reranker frameworks, projects, resources, and real-world use cases across industries and professions.
beyondllm
Beyond LLM offers an all-in-one toolkit for experimentation, evaluation, and deployment of Retrieval-Augmented Generation (RAG) systems. It simplifies the process with automated integration, customizable evaluation metrics, and support for various Large Language Models (LLMs) tailored to specific needs. The aim is to reduce LLM hallucination risks and enhance reliability.
uptrain
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured evaluations (covering language, code, embedding use cases), perform root cause analysis on failure cases and give insights on how to resolve them.
spiceai
Spice is a portable runtime written in Rust that offers developers a unified SQL interface to materialize, accelerate, and query data from any database, data warehouse, or data lake. It connects, fuses, and delivers data to applications, machine-learning models, and AI-backends, functioning as an application-specific, tier-optimized Database CDN. It is built with industry-leading technologies such as Apache DataFusion, Apache Arrow, Apache Arrow Flight, SQLite, and DuckDB. Spice makes it fast and easy to query data from one or more sources using SQL, co-locating a managed dataset with applications or machine learning models, and accelerating it with Arrow in-memory, SQLite/DuckDB, or attached PostgreSQL for fast, high-concurrency, low-latency queries.
helicone
Helicone is an open-source observability platform designed for Large Language Models (LLMs). It logs requests to OpenAI in a user-friendly UI, offers caching, rate limits, and retries, tracks costs and latencies, provides a playground for iterating on prompts and chat conversations, supports collaboration, and will soon have APIs for feedback and evaluation. The platform is deployed on Cloudflare and consists of services like Web (Next.js), Worker (Cloudflare Workers), Jawn (Express), Supabase, and ClickHouse. Users can run Helicone locally by setting up the required services and environment variables. The platform encourages contributions and provides resources for learning, documentation, and integrations.
nlp-llms-resources
The 'nlp-llms-resources' repository is a comprehensive resource list for Natural Language Processing (NLP) and Large Language Models (LLMs). It covers a wide range of topics including traditional NLP datasets, data acquisition, libraries for NLP, neural networks, sentiment analysis, optical character recognition, information extraction, semantics, topic modeling, multilingual NLP, domain-specific LLMs, vector databases, ethics, costing, books, courses, surveys, aggregators, newsletters, papers, conferences, and societies. The repository provides valuable information and resources for individuals interested in NLP and LLMs.
AiTreasureBox
AiTreasureBox is a versatile AI tool that provides a collection of pre-trained models and algorithms for various machine learning tasks. It simplifies the process of implementing AI solutions by offering ready-to-use components that can be easily integrated into projects. With AiTreasureBox, users can quickly prototype and deploy AI applications without the need for extensive knowledge in machine learning or deep learning. The tool covers a wide range of tasks such as image classification, text generation, sentiment analysis, object detection, and more. It is designed to be user-friendly and accessible to both beginners and experienced developers, making AI development more efficient and accessible to a wider audience.
awesome-production-llm
This repository is a curated list of open-source libraries for production large language models. It includes tools for data preprocessing, training/finetuning, evaluation/benchmarking, serving/inference, application/RAG, testing/monitoring, and guardrails/security. The repository also provides a new category called LLM Cookbook/Examples for showcasing examples and guides on using various LLM APIs.
promptfoo
Promptfoo is a tool for testing and evaluating LLM output quality. With promptfoo, you can build reliable prompts, models, and RAGs with benchmarks specific to your use-case, speed up evaluations with caching, concurrency, and live reloading, score outputs automatically by defining metrics, use as a CLI, library, or in CI/CD, and use OpenAI, Anthropic, Azure, Google, HuggingFace, open-source models like Llama, or integrate custom API providers for any LLM API.
langchainrb
Langchain.rb is a Ruby library that makes it easy to build LLM-powered applications. It provides a unified interface to a variety of LLMs, vector search databases, and other tools, making it easy to build and deploy RAG (Retrieval Augmented Generation) systems and assistants. Langchain.rb is open source and available under the MIT License.
AirspeedVelocity.jl
AirspeedVelocity.jl is a tool designed to simplify benchmarking of Julia packages over their lifetime. It provides a CLI to generate benchmarks, compare commits/tags/branches, plot benchmarks, and run benchmark comparisons for every submitted PR as a GitHub action. The tool freezes the benchmark script at a specific revision to prevent old history from affecting benchmarks. Users can configure options using CLI flags and visualize benchmark results. AirspeedVelocity.jl can be used to benchmark any Julia package and offers features like generating tables and plots of benchmark results. It also supports custom benchmarks and can be integrated into GitHub actions for automated benchmarking of PRs.
gptlint
GPTLint is a tool that utilizes Large Language Models (LLMs) to enforce higher-level best practices across a codebase. It offers features such as enforcing rules that are impossible with AST-based approaches, simple markdown format for rules, easy customization of rules, support for custom project-specific rules, content-based caching, and outputting LLM stats per run. GPTLint supports all major LLM providers and local models, augments ESLint instead of replacing it, and includes guidelines for creating custom rules. However, the MVP rules are currently limited to JS/TS only, single-file context only, and do not support autofixing.
agentops
AgentOps is a toolkit for evaluating and developing robust and reliable AI agents. It provides benchmarks, observability, and replay analytics to help developers build better agents. AgentOps is in open beta and open for sign-ups. Key features of AgentOps include:
- Session replays in 3 lines of code: Initialize the AgentOps client and automatically get analytics on every LLM call.
- Time travel debugging: (coming soon!)
- Agent Arena: (coming soon!)
- Callback handlers: AgentOps works seamlessly with applications built using Langchain and LlamaIndex.
ChainForge
ChainForge is a visual programming environment for battle-testing prompts to LLMs. It is geared towards early-stage, quick-and-dirty exploration of prompts, chat responses, and response quality that goes beyond ad-hoc chatting with individual LLMs. With ChainForge, you can:
* Query multiple LLMs at once to test prompt ideas and variations quickly and effectively.
* Compare response quality across prompt permutations, across models, and across model settings to choose the best prompt and model for your use case.
* Set up evaluation metrics (scoring functions) and immediately visualize results across prompts, prompt parameters, models, and model settings.
* Hold multiple conversations at once across template parameters and chat models. Template not just prompts, but follow-up chat messages, and inspect and evaluate outputs at each turn of a chat conversation.
ChainForge comes with a number of example evaluation flows to give you a sense of what's possible, including 188 example flows generated from benchmarks in OpenAI evals. This is an open beta of ChainForge. We support model providers OpenAI, HuggingFace, Anthropic, Google PaLM2, Azure OpenAI endpoints, and Dalai-hosted models Alpaca and Llama. You can change the exact model and individual model settings. Visualization nodes support numeric and boolean evaluation metrics. ChainForge is built on ReactFlow and Flask.
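The workflow described here (expand a prompt template over its parameters, send each prompt to several models, score the responses) can be sketched in a few lines of plain Python. This is an illustrative sketch, not ChainForge's implementation or API: `expand_template`, `compare`, `mentions_language`, and the stub models are all hypothetical, and the "models" are ordinary functions rather than real LLM calls.

```python
from itertools import product
from typing import Callable, Dict, List


def expand_template(template: str, params: Dict[str, List[str]]) -> List[str]:
    """Fill the template with every combination of parameter values."""
    keys = list(params)
    return [
        template.format(**dict(zip(keys, combo)))
        for combo in product(*(params[k] for k in keys))
    ]


def compare(
    models: Dict[str, Callable[[str], str]],
    prompts: List[str],
    score: Callable[[str], float],
) -> Dict[str, float]:
    """Return the mean score per model over all prompts."""
    return {
        name: sum(score(fn(p)) for p in prompts) / len(prompts)
        for name, fn in models.items()
    }


# Stub "models" (hypothetical): plain functions instead of real LLM calls.
MODELS = {
    "terse": lambda prompt: "ok",
    "echo": lambda prompt: prompt,
}

PROMPTS = expand_template(
    "Translate '{word}' into {language}.",
    {"word": ["cat", "dog"], "language": ["French", "German"]},
)


def mentions_language(response: str) -> float:
    """Toy boolean metric: 1.0 if the response names a target language."""
    return float(any(lang in response for lang in ("French", "German")))


if __name__ == "__main__":
    print(compare(MODELS, PROMPTS, mentions_language))
```

Because the scoring function returns a number, the per-model means can be plotted or tabulated directly, which mirrors the numeric and boolean metrics the description mentions.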
unify
The Unify Python Package provides access to the Unify REST API, allowing users to query Large Language Models (LLMs) from any Python 3.7.1+ application. It includes Synchronous and Asynchronous clients with Streaming responses support. Users can easily use any endpoint with a single key, route to the best endpoint for optimal throughput, cost, or latency, and customize prompts to interact with the models. The package also supports dynamic routing to automatically direct requests to the top-performing provider. Additionally, users can enable streaming responses and interact with the models asynchronously for handling multiple user requests simultaneously.
awesome-LLM-resourses
A comprehensive repository of resources for Chinese large language models (LLMs), including data processing tools, fine-tuning frameworks, inference libraries, evaluation platforms, RAG engines, agent frameworks, books, courses, tutorials, and tips. The repository covers a wide range of tools and resources for working with LLMs, from data labeling and processing to model fine-tuning, inference, evaluation, and application development. It also includes resources for learning about LLMs through books, courses, and tutorials, as well as insights and strategies from building with LLMs.
20 - OpenAI Gpts
Build a Brand
Unique custom images based on your input. Just type ideas and the brand image is created.
SSLLMs Advisor
Helps you build logic security into your GPTs' custom instructions. Documentation: https://github.com/infotrix/SSLLMs---Semantic-Secuirty-for-LLM-GPTs
No-code Builder by Uroboro
Helps you identify your requirements for the development of a custom no-code Operating System.
DatingCoach
Starts with a quiz to assess your personality across 10 dating-related areas, crafts a custom development road-map, and coaches you towards finding a fulfilling relationship.
React Native Engineer
Top React Native Engineer - Concise, Clear Development Solutions in React Native. Ask me for focused, brief advice, tailored to your project and skill level.
World Class React Redux Expert
Guides to optimal React, Redux, MUI solutions and avoids common pitfalls.
3D Printers
Expert guide in 3D printing for all skill levels, offering comprehensive advice.