Best AI tools for Request Evaluations
20 - AI Tool Sites
Langtrace AI
Langtrace AI is an open-source observability tool from Scale3 Labs that helps monitor, evaluate, and improve LLM (Large Language Model) applications. It collects and analyzes traces and metrics to provide insights into the ML pipeline, and it ensures security through SOC 2 Type II certification. Langtrace supports popular LLMs, frameworks, and vector databases, offering end-to-end observability and the ability to build and deploy AI applications with confidence.
RFxAI
RFxAI is a cutting-edge AI tool designed to empower intelligence for Requests for Proposals (RFPs). The platform uses automation to save cost and time, helping users generate, analyze, score, evaluate, and optimize their RFPs. RFxAI aims to transform RFP dynamics by boosting success rates by over 80%. With a focus on elevating RFx responses, RFxAI is positioned as the winning business proposal platform for B2B SaaS RFPs.
BulkGPT
BulkGPT is a no-code AI workflow automation tool that combines web scraping and content creation functionalities. It allows users to build custom workflows for mass scraping web pages, generating SEO blogs, personalized messages, and product descriptions without the need for any coding knowledge. The tool simplifies data extraction, content creation, and marketing automation tasks by leveraging AI technology. BulkGPT offers a user-friendly interface and seamless integration with Google Sheets and other tools via API.
CodeRabbit
CodeRabbit is an innovative AI code review platform that streamlines and enhances the development process. By automating reviews, it dramatically improves code quality while saving valuable time for developers. The system offers detailed, line-by-line analysis, providing actionable insights and suggestions to optimize code efficiency and reliability. Trusted by hundreds of organizations and thousands of developers daily, CodeRabbit has processed millions of pull requests. Backed by CRV, CodeRabbit continues to revolutionize the landscape of AI-assisted software development.
Single Grain
Single Grain is a full-service digital marketing agency focused on driving innovative marketing for great companies. They specialize in services such as SEO, programmatic SEO, content marketing, paid advertising, CRO, and performance creative. With a team of highly specialized marketing experts, Single Grain helps clients increase revenue, lower CAC, and achieve their business goals through data-driven strategies and constant optimization. They have a proven track record of delivering impressive results for their clients, including significant increases in organic traffic, conversion rates, revenue growth, and more.
Five9
Five9 is a leading provider of cloud contact center software, driven by a passion to transform call and contact centers into customer engagement centers of excellence. Its AI-powered solutions help businesses deliver exceptional customer experiences, improve operational efficiency, and reduce costs. With Five9, you can empower agents to deliver results anywhere, improve CX with practical AI, find efficiency with AI and automation, and scale with an AI-enabled digital workforce.
Aviso
Aviso is an End-to-End AI Revenue Platform that offers Conversational Intelligence and RevOps solutions. It provides a comprehensive platform for account research, planning, and revenue execution. Aviso leverages AI technology to predict and drive revenue, optimize sales performance, and prioritize go-to-market activities. The platform aims to consolidate underperforming sales apps into an integrated AI solution for accurate, prescriptive, and repeatable revenue generation.
Glimmer AI
Glimmer AI is a cutting-edge platform that revolutionizes the way presentations are created and delivered. Leveraging the power of GPT-3 and DALL·E 2, Glimmer AI empowers users to generate visually captivating presentations based on their text and voice commands. With its intuitive interface and seamless workflow, Glimmer AI simplifies the presentation process, enabling users to focus on delivering impactful messages.
Business Automated
Business Automated is an independent automation consultancy that offers custom automation solutions for businesses. They provide services to streamline processes and increase efficiency through the use of tools like GPT, Airtable, and more. The website also features tutorials and products related to automation and AI technology.
Harver
Harver is a talent assessment platform that helps businesses make better hiring decisions faster. It offers a suite of solutions, including assessments, video interviews, scheduling, and reference checking, that can be used to optimize the hiring process and reduce time to hire. Harver's assessments are based on data and scientific insights, and they help businesses identify the right people for the right roles. Harver also offers support for the full talent lifecycle, including talent management, mobility, and development.
Ironclad
Ironclad is a leading contract management software that provides businesses and legal teams with an easy-to-use platform with AI-powered tools to handle every aspect of the contract lifecycle. It offers a comprehensive suite of features including contract drafting, editing, negotiation, search, storage, analytics, e-signature, and more. Ironclad's AI-powered repository creates a single source of truth for contracts and contract data, enabling businesses to gain insights, improve compliance, and make better decisions.
Forecast
Forecast is an AI-powered resource and project management software that helps businesses optimize their resources, plan their projects, and manage their tasks more efficiently. It offers a range of features such as resource management, project management, financial management, artificial intelligence, business intelligence, and reporting, timesheets, and integrations. Forecast is trusted by businesses of all sizes, from small businesses to large enterprises, and has been recognized for its innovation and effectiveness by leading industry analysts.
MaestroQA
MaestroQA is a comprehensive Call Center Quality Assurance Software that offers a range of products and features to enhance QA processes. It provides customizable report builders, scorecard builders, calibration workflows, coaching workflows, automated QA workflows, screen capture, accurate transcriptions, root cause analysis, performance dashboards, AI grading assist, analytics, and integrations with various platforms. The platform caters to industries like eCommerce, financial services, gambling, insurance, B2B software, social media, and media, offering solutions for QA managers, team leaders, and executives.
Level AI
Level AI is a provider of artificial intelligence (AI)-powered solutions for call centers. Its products include GenAI-automated Quality Assurance, Contact Center and Business Analytics, GenAI-powered VoC Insights, AgentGPT Real-Time Agent Assist, Agent Coaching, Agent Screen Recording, and Artificial Intelligence Integrations. Level AI's solutions are designed to help businesses improve customer experience, increase efficiency, and reduce costs. The company's customers include some of the world's leading customer service organizations, such as Brex and ezCater.
Interactions IVA
Interactions IVA is a conversational AI solutions platform for customer experience (CX). It offers a range of features to help businesses improve their customer interactions, including intelligent virtual assistants, PCI compliance, social customer care, and more. Interactions IVA is used by businesses in a variety of industries, including communications, finance and banking, healthcare, insurance, restaurants, retail and technology, travel and hospitality, and utilities.
Lingio
Lingio is an AI-powered employee training software designed for frontline workers, offering gamified learning experiences and mobile-based training solutions. The platform combines gamification and AI to enhance course completion rates and improve learning outcomes for deskless industries such as hospitality, cleaning, transportation, elderly care, and facility management.
Mygirl
Mygirl is an AI application that simulates a virtual girlfriend for users to engage in spicy chat conversations. The platform utilizes artificial intelligence to create a personalized and interactive experience, offering users a virtual companion for casual and fun interactions. Mygirl aims to provide a unique and entertaining chat experience through AI technology, allowing users to engage in conversations with a virtual character.
Korbit
Korbit is an AI-powered code review tool that helps developers write better code, faster. It integrates directly into your GitHub PR workflow and provides instant feedback on your code, identifying issues and providing actionable recommendations. Korbit also provides valuable insights into code quality, project status, and developer performance, helping you to boost your productivity and elevate your code.
Wizenoze
Wizenoze is an AI-powered educational content delivery platform that provides educators, publishers, and platform providers with access to a vast library of curated educational content. The platform uses AI to match content to any curriculum, making it easy for educators to find and deliver the most relevant and engaging content to their students. Wizenoze also offers a range of tools and services to help educators create and share their own content, and to track student progress and engagement.
20 - Open Source AI Tools
evalverse
Evalverse is an open-source project designed to support Large Language Model (LLM) evaluation needs. It provides a standardized and user-friendly solution for processing and managing LLM evaluations, catering to AI research engineers and scientists. Evalverse supports various evaluation methods, insightful reports, and no-code evaluation processes. Users can access unified evaluation with submodules, request evaluations without code via Slack bot, and obtain comprehensive reports with scores, rankings, and visuals. The tool allows for easy comparison of scores across different models and swift addition of new evaluation tools.
uptrain
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured evaluations (covering language, code, embedding use cases), perform root cause analysis on failure cases and give insights on how to resolve them.
latitude-llm
Latitude is an open-source prompt engineering platform that helps developers and product teams build AI features with confidence. It simplifies prompt management, aids in testing AI responses, and provides detailed analytics on request performance. Latitude offers collaborative prompt management, support for advanced features, version control, API and SDKs for integration, observability, evaluations in batch or real-time, and is community-driven. It can be deployed on Latitude Cloud for a managed solution or self-hosted for control and customization.
BurstGPT
This repository provides a real-world trace dataset of LLM serving workloads for research and academic purposes. The dataset includes two files: BurstGPT.csv, with trace data for 2 months including some failures, and BurstGPT_without_fails.csv, without any failures. Users can scale the RPS in the trace, model its patterns, and leverage it for various evaluations. Future plans include updating the time range of the trace, adding request end times, updating conversation logs, and open-sourcing a benchmark suite for LLM inference. The dataset covers 61 consecutive days, contains 1.4 million lines, and is approximately 50MB in size.
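Scaling the RPS of a trace like this typically means compressing or stretching request arrival times. A minimal sketch in Python, using hypothetical column names (the actual headers in BurstGPT.csv may differ):

```python
import csv
import io

# Hypothetical sample rows in a BurstGPT.csv-like layout; the real
# dataset's column names and contents may differ.
SAMPLE = """Timestamp,Model,Request tokens,Response tokens
0,ChatGPT,120,35
10,ChatGPT,80,200
30,GPT-4,512,128
"""

def scale_rps(rows, factor):
    """Scale requests-per-second by dividing arrival times by `factor`."""
    return [dict(r, Timestamp=str(float(r["Timestamp"]) / factor)) for r in rows]

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
doubled = scale_rps(rows, 2.0)  # halve inter-arrival gaps = 2x request rate
print([r["Timestamp"] for r in doubled])  # → ['0.0', '5.0', '15.0']
```

Replaying the scaled trace against a serving system then simulates the same burst pattern at a higher load.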
LangBridge
LangBridge is a tool that bridges mT5 encoder and the target LM together using only English data. It enables models to effectively solve multilingual reasoning tasks without the need for multilingual supervision. The tool provides pretrained models like Orca 2, MetaMath, Code Llama, Llemma, and Llama 2 for various instruction-tuned and not instruction-tuned scenarios. Users can install the tool to replicate evaluations from the paper and utilize the models for multilingual reasoning tasks. LangBridge is particularly useful for low-resource languages and may lower performance in languages where the language model is already proficient.
rag-experiment-accelerator
The RAG Experiment Accelerator is a versatile tool that helps you conduct experiments and evaluations using Azure AI Search and the RAG pattern. It offers a rich set of features, including experiment setup; integration with Azure AI Search, Azure Machine Learning, MLflow, and Azure OpenAI; multiple document chunking strategies; query generation; multiple search types; sub-querying; re-ranking; metrics and evaluation; report generation; and multilingual support. The tool is designed to make it easier and faster to run experiments and evaluations of search queries and of the quality of responses from OpenAI. It is useful for researchers, data scientists, and developers who want to test the performance of different search and OpenAI-related hyperparameters, compare the effectiveness of various search strategies, fine-tune and optimize parameters, find the best combination of hyperparameters, and generate detailed reports and visualizations from experiment results.
ai-rag-chat-evaluator
This repository contains scripts and tools for evaluating a chat app that uses the RAG architecture. It provides parameters to assess the quality and style of answers generated by the chat app, including system prompt, search parameters, and GPT model parameters. The tools facilitate running evaluations, with examples of evaluations on a sample chat app. The repo also offers guidance on cost estimation, setting up the project, deploying a GPT-4 model, generating ground truth data, running evaluations, and measuring the app's ability to say 'I don't know'. Users can customize evaluations, view results, and compare runs using provided tools.
llm-jp-eval
LLM-jp-eval is a tool designed to automatically evaluate Japanese large language models across multiple datasets. It provides functionalities such as converting existing Japanese evaluation data to text generation task evaluation datasets, executing evaluations of large language models across multiple datasets, and generating instruction data (jaster) in the format of evaluation data prompts. Users can manage the evaluation settings through a config file and use Hydra to load them. The tool supports saving evaluation results and logs using wandb. Users can add new evaluation datasets by following specific steps and guidelines provided in the tool's documentation. It is important to note that using jaster for instruction tuning can lead to artificially high evaluation scores, so caution is advised when interpreting the results.
can-ai-code
Can AI Code is a self-evaluating interview tool for AI coding models. It includes interview questions written by humans and tests taken by AI, inference scripts for common API providers and CUDA-enabled quantization runtimes, a Docker-based sandbox environment for validating untrusted Python and NodeJS code, and the ability to evaluate the impact of prompting techniques and sampling parameters on large language model (LLM) coding performance. Users can also assess LLM coding performance degradation due to quantization. The tool provides test suites for evaluating LLM coding performance, a webapp for exploring results, and comparison scripts for evaluations. It supports multiple interviewers for API and CUDA runtimes, with detailed instructions on running the tool in different environments. The repository structure includes folders for interviews, prompts, parameters, evaluation scripts, comparison scripts, and more.
dash-infer
DashInfer is a C++ runtime tool designed to deliver production-level implementations highly optimized for various hardware architectures, including x86 and ARMv9. It supports Continuous Batching and NUMA-Aware capabilities for CPU, and can fully utilize modern server-grade CPUs to host large language models (LLMs) up to 14B in size. With lightweight architecture, high precision, support for mainstream open-source LLMs, post-training quantization, optimized computation kernels, NUMA-aware design, and multi-language API interfaces, DashInfer provides a versatile solution for efficient inference tasks. It supports x86 CPUs with AVX2 instruction set and ARMv9 CPUs with SVE instruction set, along with various data types like FP32, BF16, and InstantQuant. DashInfer also offers single-NUMA and multi-NUMA architectures for model inference, with detailed performance tests and inference accuracy evaluations available. The tool is supported on mainstream Linux server operating systems and provides documentation and examples for easy integration and usage.
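Post-training quantization schemes like the InstantQuant data type mentioned above generally map float weights onto low-bit integers plus a scale. A toy sketch of the general symmetric-int8 technique (this is not DashInfer's actual implementation, which is in C++ and far more sophisticated):

```python
# Symmetric int8 post-training weight quantization: one per-tensor scale,
# weights rounded to [-127, 127]. Illustrative only.

def quantize_int8(weights):
    """Map float weights to int8 values with a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(w)
restored = dequantize(q, s)
# Round-trip error per weight is bounded by about scale/2.
print(q, max(abs(a - b) for a, b in zip(w, restored)))
```

Storing int8 weights quarters memory traffic versus FP32, which is the main win for CPU inference of the kind DashInfer targets.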
matchem-llm
A public repository collecting links to state-of-the-art training sets, QA, benchmarks and other evaluations for various ML and LLM applications in materials science and chemistry. It includes datasets related to chemistry, materials, multimodal data, and knowledge graphs in the field. The repository aims to provide resources for training and evaluating machine learning models in the materials science and chemistry domains.
promptfoo
Promptfoo is a tool for testing and evaluating LLM output quality. With promptfoo, you can build reliable prompts, models, and RAGs with benchmarks specific to your use-case, speed up evaluations with caching, concurrency, and live reloading, score outputs automatically by defining metrics, use as a CLI, library, or in CI/CD, and use OpenAI, Anthropic, Azure, Google, HuggingFace, open-source models like Llama, or integrate custom API providers for any LLM API.
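The "score outputs automatically by defining metrics" idea can be sketched in a few lines of plain Python (promptfoo itself is configured via YAML and run from the CLI; the function names below are illustrative, not its API):

```python
# Assertion-style output scoring: each check is a predicate over the model
# output, and the score is the fraction of checks that pass.

def contains(expected):
    return lambda output: expected.lower() in output.lower()

def max_length(n):
    return lambda output: len(output) <= n

def score(output, assertions):
    """Return the fraction of assertions the model output passes."""
    passed = sum(1 for check in assertions if check(output))
    return passed / len(assertions)

output = "Paris is the capital of France."
checks = [contains("paris"), max_length(100), contains("france")]
print(score(output, checks))  # → 1.0
```

Running the same checks across many prompt variants and providers, with caching and concurrency, is what a harness like promptfoo adds on top.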
llm-structured-output
This repository contains a library for constraining LLM generation to structured output, enforcing a JSON schema for precise data types and property names. It includes an acceptor/state machine framework, JSON acceptor, and JSON schema acceptor for guiding decoding in LLMs. The library provides reference implementations using Apple's MLX library and examples for function calling tasks. The tool aims to improve LLM output quality by ensuring adherence to a schema, reducing unnecessary output, and enhancing performance through pre-emptive decoding. Evaluations show performance benchmarks and comparisons with and without schema constraints.
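The core filtering idea behind an acceptor/state-machine framework can be shown with a toy example: at each decoding step, only candidate tokens that keep the output a valid prefix of the target shape are allowed. Real schema acceptors (like this library's) handle full JSON grammars; the template below is a deliberately simplified stand-in:

```python
# Toy constrained decoding: '*' in the template matches any run of letters;
# everything else must match literally. Candidates that break the prefix
# are masked out before sampling.

def valid_prefix(text, template='{"name": "*"}'):
    """Check whether `text` is a valid prefix of the template's language."""
    i = 0  # position in template
    for ch in text:
        if i < len(template) and template[i] == "*":
            if ch.isalpha():
                continue          # stay on the wildcard
            i += 1                # wildcard finished, match literally
        if i >= len(template) or template[i] != ch:
            return False
        i += 1
    return True

prefix = '{"name": "Al'
candidates = ["a", "}", '"', "3"]
allowed = [t for t in candidates if valid_prefix(prefix + t)]
print(allowed)  # → ['a', '"']
```

In a real decoder the disallowed tokens get their logits set to -inf, so the model can only emit schema-conforming output.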
Recommendation-Systems-without-Explicit-ID-Features-A-Literature-Review
This repository is a collection of papers and resources related to recommendation systems, focusing on foundation models, transferable recommender systems, large language models, and multimodal recommender systems. It explores questions such as the necessity of ID embeddings, the shift from matching to generating paradigms, and the future of multimodal recommender systems. The papers cover various aspects of recommendation systems, including pretraining, user representation, dataset benchmarks, and evaluation methods. The repository aims to provide insights and advancements in the field of recommendation systems through literature reviews, surveys, and empirical studies.
ragas
Ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. RAG denotes a class of LLM applications that use external data to augment the LLM’s context. There are existing tools and frameworks that help you build these pipelines but evaluating it and quantifying your pipeline performance can be hard. This is where Ragas (RAG Assessment) comes in. Ragas provides you with the tools based on the latest research for evaluating LLM-generated text to give you insights about your RAG pipeline. Ragas can be integrated with your CI/CD to provide continuous checks to ensure performance.
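To make the idea of a RAG metric concrete, here is a deliberately naive faithfulness-style score: the fraction of answer statements that appear verbatim in the retrieved context. Ragas' actual metrics use LLM judges and are far more nuanced; this only illustrates the shape of the computation:

```python
# Naive faithfulness: what fraction of the answer's claims can be found
# (as substrings) in the retrieved context?

def faithfulness(answer_statements, context):
    context = context.lower()
    supported = sum(1 for s in answer_statements if s.lower() in context)
    return supported / len(answer_statements)

context = "The Eiffel Tower is in Paris. It was completed in 1889."
statements = ["completed in 1889", "is in Paris", "is 400 metres tall"]
print(faithfulness(statements, context))  # 2 of 3 statements supported
```

Wiring a score like this into CI/CD, as Ragas supports, turns pipeline quality into a regression test.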
Awesome-LLM4RS-Papers
This paper list covers Large Language Model-enhanced recommender systems, along with some related work. Keywords: recommendation system, large language models
Azure-OpenAI-demos
Azure OpenAI demos is a repository showcasing various demos and use cases of Azure OpenAI services. It includes demos for tasks such as image comparisons, car damage copilot, video to checklist generation, automatic data visualization, text analytics, and more. The repository provides a wide range of examples on how to leverage Azure OpenAI for different applications and industries.
evals
Evals provide a framework for evaluating large language models (LLMs) or systems built using LLMs. We offer an existing registry of evals to test different dimensions of OpenAI models and the ability to write your own custom evals for use cases you care about. You can also use your data to build private evals which represent the common LLMs patterns in your workflow without exposing any of that data publicly.
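A registry-of-evals design can be sketched in plain Python: named evals, each grading (input, ideal) samples against a model function. The names below are illustrative, not the evals framework's API:

```python
# Minimal eval registry: evals register under a name and return a score
# for a model over a set of (question, ideal answer) samples.

REGISTRY = {}

def register(name):
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@register("exact-match")
def exact_match(model, samples):
    hits = sum(1 for q, ideal in samples if model(q).strip() == ideal)
    return hits / len(samples)

# A stand-in "model" for demonstration: a lookup table.
toy_model = {"2+2": "4", "capital of France": "Paris"}.get
samples = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9")]
print(REGISTRY["exact-match"](lambda q: toy_model(q, ""), samples))  # 2 of 3
```

Private evals fit the same shape: the samples stay in your own data files while the grading logic is shared.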
leapfrogai
LeapfrogAI is a self-hosted AI platform designed to be deployed in air-gapped, resource-constrained environments. It brings sophisticated AI solutions to these environments by hosting all the necessary components of an AI stack, including vector databases, model backends, API, and UI. LeapfrogAI's API closely matches that of OpenAI, allowing tools built for OpenAI/ChatGPT to function seamlessly with a LeapfrogAI backend. It provides several backends for various use cases, including llama-cpp-python, whisper, text-embeddings, and vllm. LeapfrogAI leverages Chainguard's apko to harden base Python images, ensuring the latest supported Python versions are used by the other components of the stack. The LeapfrogAI SDK provides a standard set of protobufs and Python utilities for implementing backends and gRPC. LeapfrogAI offers UI options for common use cases like chat, summarization, and transcription. It can be deployed and run locally via UDS and Kubernetes, built out using Zarf packages. LeapfrogAI is supported by a community of users and contributors, including Defense Unicorns, Beast Code, Chainguard, Exovera, Hypergiant, Pulze, SOSi, United States Navy, United States Air Force, and United States Space Force.
tonic_validate
Tonic Validate is a framework for the evaluation of LLM outputs, such as Retrieval Augmented Generation (RAG) pipelines. Validate makes it easy to evaluate, track, and monitor your LLM and RAG applications. Validate allows you to evaluate your LLM outputs through the use of our provided metrics which measure everything from answer correctness to LLM hallucination. Additionally, Validate has an optional UI to visualize your evaluation results for easy tracking and monitoring.
20 - OpenAI GPTs
FOIA GPT
Freedom of Information Act request strategist to "arm the rebels" for truth and transparency in the fight against corruption
Consistent Image Generator
Generate an image ➡ Request modifications. This GPT supports generating consistent and continuous images with DALL·E. It also offers the ability to restore or integrate photos you upload. ✔️Where to use: WordPress blog posts, YouTube thumbnails, AI profiles, Facebook, X, Threads feeds, Instagram Reels
Just the Recipe
This application finds recipes on the web based on a request and then removes all the SEO, leaving you with just a recipe.
Swift Lyric Matchmaker
I match your day with a Taylor Swift lyric and create custom ones on request.
Table to JSON
We often read REST API reference documentation in which request/response parameters are presented as tables, and developers have to convert them into JSON structures by hand, which is a small hassle. With this GPT, simply upload a screenshot and it will automatically generate a JSON example and the corresponding JSON Schema.
Janitor Bot Creator
This bot will create a template for a janitor.ai bot based on your request. Your request could either be a character or a scenario.
Gov Advisor
I'm a multilingual Government Agent - I'm here to assist you with any public service request