Best AI Tools for Assessing Performance
20 - AI Tool Sites
H2O.ai
H2O.ai is an AI platform that brings together predictive and generative AI solutions. It provides an end-to-end GenAI platform for air-gapped, on-premises, or cloud VPC deployments, allowing users to own every part of the stack. With components such as h2oGPTe, h2oGPT, H2O Danube3, H2O Eval Studio, and the GenAI App Store, H2O.ai lets users customize and deploy AI models, assess performance, develop safe applications, and more. The platform is known for democratizing AI through automated machine learning and open-source distributed machine learning.
Simpleem
Simpleem is an Artificial Emotional Intelligence (AEI) tool that helps users uncover intentions, predict success, and leverage behavior for successful interactions. By measuring all interactions and correlating them with concrete outcomes, Simpleem provides insights into verbal, para-verbal, and non-verbal cues to enhance customer relationships, track customer rapport, and assess team performance. The tool aims to identify win/lose patterns in behavior, guide users on boosting performance, and prevent burnout by promptly identifying red flags. Simpleem uses proprietary AI models to analyze real-world data and translate behavioral insights into concrete business metrics, achieving a high accuracy rate of 94% in success prediction.
Karbon
Karbon is an AI-powered practice management software designed for accounting firms to increase visibility, control, automation, efficiency, collaboration, and connectivity. It offers features such as team collaboration, workflow automation, project management, time & budgets tracking, billing & payments, reporting & analysis, artificial intelligence integration, email management, shared inbox, calendar integration, client management, client portal, eSignatures, document management, and enterprise-grade security. Karbon enables firms to automate tasks, work faster, strengthen connections, and drive productivity. It provides services like group onboarding, guided implementation, and enterprise resources including articles, ebooks, and videos for accounting firms. Karbon also offers live training, customer support, and a practice excellence scorecard for firms to assess their performance. The software is known for its AI and GPT integration, helping users save time and improve efficiency.
Graphio
Graphio is an AI-driven employee scoring and scenario builder that uses continuous, real-time scoring with AI agents to assess potential, predict flight risk, and identify future leaders. It replaces subjective evaluations with real-time, data-driven insights, aiming to reduce bias in talent-management decisions such as promotions, layoffs, and succession planning. Graphio offers compliance features and user-controlled rules to keep assessments accurate, secure, and aligned with legal and regulatory requirements. The platform also emphasizes security, privacy, and personalized coaching to improve employee engagement and reduce turnover.
Vervoe
Vervoe is an AI-powered recruitment platform and hiring solution that revolutionizes the hiring process by offering skills-based screening through AI job simulations and assessments. It streamlines interviews, provides standardized templates, and facilitates team collaboration. Vervoe enables data-backed decisions by ranking applicants based on performance and offering detailed reports. The platform focuses on task-based evaluations of job-specific skills, enhancing the accuracy of hiring decisions. Employers can create customized tests or choose from a library of scientifically mapped assessments. Vervoe uses AI for recruiting, grading, and ranking candidates efficiently. The platform enhances employer branding, offers candidate feedback, and ensures a seamless candidate experience. Vervoe caters to various industries and company types, making it a versatile tool for modern recruitment processes.
FairPlay
FairPlay is a Fairness-as-a-Service solution designed for financial institutions, offering AI-powered tools to assess automated decisioning models quickly. It helps in increasing fairness and profits by optimizing marketing, underwriting, and pricing strategies. The application provides features such as Fairness Optimizer, Second Look, Customer Composition, Redline Status, and Proxy Detection. FairPlay enables users to identify and overcome tradeoffs between performance and disparity, assess geographic fairness, de-bias proxies for protected classes, and tune models to reduce disparities without increasing risk. It offers advantages like increased compliance, speed, and readiness through automation, higher approval rates with no increase in risk, and rigorous Fair Lending analysis for sponsor banks and regulators. However, some disadvantages include the need for data integration, potential bias in AI algorithms, and the requirement for technical expertise to interpret results.
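FairPlay's disparity analysis is proprietary, but the underlying idea of comparing decision outcomes across groups can be illustrated with a standard fair-lending metric. The sketch below computes an adverse impact ratio (the approval rate of a protected group divided by that of a reference group, with the common "four-fifths rule" as a flag threshold); it is a generic illustration, not FairPlay's methodology, and the group data and threshold are assumptions.

```python
# Illustrative only: a standard adverse impact ratio (AIR) check,
# not FairPlay's proprietary methodology.

def approval_rate(decisions):
    """decisions: list of booleans, True = approved."""
    return sum(decisions) / len(decisions) if decisions else 0.0

def adverse_impact_ratio(protected, reference):
    """Ratio of approval rates; values below ~0.8 (the 'four-fifths rule')
    are commonly treated as a flag for potential disparity."""
    ref_rate = approval_rate(reference)
    return approval_rate(protected) / ref_rate if ref_rate else float("nan")

# Hypothetical decision outcomes for two applicant groups.
protected_group = [True, False, False, True, False]   # 40% approved
reference_group = [True, True, False, True, True]     # 80% approved

air = adverse_impact_ratio(protected_group, reference_group)
print(f"Adverse impact ratio: {air:.2f}")  # 0.50 -> below 0.8, worth reviewing
```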
SmallTalk2Me
SmallTalk2Me is an AI-powered simulator designed to help users improve their spoken English. It offers a range of features, including mock job interviews, IELTS speaking test simulations, and daily stories and courses. The platform uses AI to provide users with instant feedback on their performance, helping them to identify areas for improvement and track their progress over time.
Underwrite.ai
Underwrite.ai is a platform that leverages advances in artificial intelligence and machine learning to provide lenders with nonlinear, dynamic models of credit risk. By analyzing thousands of data points from credit bureau sources, the application accurately models credit risk for consumers and small businesses, outperforming traditional approaches. Underwrite.ai offers a unique underwriting methodology that focuses on outcomes such as profitability and customer lifetime value, allowing organizations to enhance their lending performance without the need for capital investment or lengthy build times. The platform's models are continuously learning and adapting to market changes in real-time, providing explainable decisions in milliseconds.
Cognii
Cognii is an AI-based educational technology provider that offers solutions for the K-12, higher education, and corporate training markets. Its award-winning EdTech product enables personalized learning, intelligent tutoring, open-response assessments, and rich analytics. Cognii's Virtual Learning Assistant engages students in chatbot-style conversations, providing instant feedback and personalized hints and guiding them toward mastery. The platform aims to deliver 21st-century online education with superior learning outcomes and cost efficiency.
iCAD
iCAD is an AI-powered application designed for cancer detection, specifically focusing on breast cancer. The platform offers a suite of solutions including Detection, Density Assessment, and Risk Evaluation, all backed by science, clinical evidence, and proven patient outcomes. iCAD's AI-powered solutions aim to expose the hiding place of cancer, providing certainty and peace of mind, ultimately improving patient outcomes and saving more lives.
Yogger
Yogger is a video analysis app and AI movement screening tool that enables users to analyze movement anytime, anywhere. The technology allows for motion capture on mobile devices, making it easy to improve performance, prevent injuries, and achieve personal bests effortlessly. With Yogger, users can perform multiple movements, gather information instantly, and receive detailed reports on movement screenings. It is a motivational tool for clients looking to improve their assessment scores and a convenient way for trainers and coaches to assess clients and communicate ways to enhance performance.
Clarity AI
Clarity AI is an AI-powered technology platform that offers a Sustainability Tech Kit for sustainable investing, shopping, reporting, and benchmarking. The platform provides built-in sustainability technology with customizable solutions for various needs related to data, methodologies, and tools. It seamlessly integrates into workflows, offering scalable and flexible end-to-end SaaS tools to address sustainability use cases. Clarity AI leverages powerful AI and machine learning to analyze vast amounts of data points, ensuring reliable and transparent data coverage. The platform is designed to empower users to assess, analyze, and report on sustainability aspects efficiently and confidently.
Waggle AI
Waggle AI is an AI-powered coaching tool designed to help leaders and managers improve their skills in real-time. It provides personalized feedback, skill insights, and meeting analytics to enhance leadership effectiveness. Waggle AI integrates seamlessly with existing meeting tools and calendars, offering features such as calendar management, meeting preparation, AI note-taking, meeting metrics, and skills assessment. The application aims to optimize leadership development by nudging users with best practices, providing data-driven insights, and fostering continuous improvement.
Pitch N Hire
Pitch N Hire is an AI-powered Applicant Tracking & Assessment Software designed to assist recruiters in enhancing their talent decisions. The platform offers a robust data-driven approach with descriptive, predictive, and prescriptive analytics to address talent acquisition challenges. It provides insights into candidate behavior, automated processes, and a vast network of career sites. With advanced AI data models, the software forecasts on-the-job performance, streamlines talent pipelines, and offers personalized branded experiences for candidates.
Gradescope
Gradescope is an online assessment platform that helps educators deliver and grade assessments seamlessly. It supports various assignment types, including variable-length assignments, fixed-template assignments, paper-based assignments, and programming projects. Gradescope enables educators to provide detailed feedback, maintain consistency with flexible rubrics, and send grades to students with a click. It also offers valuable insights through per-question and per-rubric statistics, helping educators understand student performance and adjust their teaching strategies. Additionally, Gradescope incorporates AI-assisted grading features, such as answer grouping, to streamline the grading process.
ISMS Copilot
ISMS Copilot is an AI-powered assistant designed to simplify ISO 27001 preparation for both experts and beginners. It offers various features such as ISMS scope definition, risk assessment and treatment, compliance navigation, incident management, business continuity planning, performance tracking, and more. The tool aims to save time, provide precise guidance, and ensure ISO 27001 compliance. With a focus on security and confidentiality, ISMS Copilot is a valuable resource for small businesses and information security professionals.
K2 AI
K2 AI is an AI consulting company that offers a range of services from ideation to impact, focusing on AI strategy, implementation, operation, and research. They support and invest in emerging start-ups and push knowledge boundaries in AI. The company helps executives assess organizational strengths, prioritize AI use cases, develop sustainable AI strategies, and continuously monitor and improve AI solutions. K2 AI also provides executive briefings, model development, and deployment services to catalyze AI initiatives. The company aims to deliver business value through rapid, user-centric, and data-driven AI development.
GreetAI
GreetAI is an AI-powered platform that revolutionizes the hiring process by conducting AI video interviews to evaluate applicants efficiently. The platform provides insightful reports, customizable interview questions, and highlights key points to help recruiters make informed decisions. GreetAI offers features such as interview simulations, job post generation, AI video screenings, and detailed candidate performance metrics.
Jungle AI
Jungle AI is an AI application that provides solutions to improve machine performance and uptime in industries such as wind, solar, manufacturing, and maritime. Their AI solutions offer real-time insights into assets' performance, increase production, prevent downtime, and simplify operations. With features like Canopy and Toucan, Jungle AI helps users prioritize issues, detect abnormal behaviors, and avoid costly downtime. The application is trusted by global teams and has been battle-tested on challenging datasets. Jungle AI's customers benefit from proactive capabilities, context-sensitive alarms, and industry-specific solutions.
Assets Scout
Assets Scout is a website that provides users with valuable information and tools for managing their assets effectively. The platform offers a range of features to help users track, analyze, and optimize their assets, including real-time monitoring, customizable reports, and predictive analytics. With Assets Scout, users can make informed decisions to maximize the value of their assets and achieve their financial goals.
20 - Open Source AI Tools
rageval
Rageval is an evaluation tool for Retrieval-Augmented Generation (RAG) methods. It evaluates RAG systems across sub-tasks such as query rewriting, document ranking, information compression, evidence verification, answer generation, and result validation. The tool provides metrics for answer correctness and answer groundedness, along with benchmark results for the ASQA and ALCE datasets. Users can install Rageval to assess the performance of RAG models on question-answering tasks.
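Rageval's own metric classes and dataset format are best taken from its documentation; as a conceptual illustration of what an answer-correctness score measures, the sketch below computes a simple token-overlap F1 between a generated answer and a reference answer. This is a generic metric, not Rageval's implementation.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1, a simple stand-in for an answer-correctness metric."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(overlap.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("Paris is the capital of France", "The capital of France is Paris"))
```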
llmperf
LLMPerf is a tool designed for evaluating the performance of Language Model APIs. It provides functionalities for conducting load tests to measure inter-token latency and generation throughput, as well as correctness tests to verify the responses. The tool supports various LLM APIs including OpenAI, Anthropic, TogetherAI, Hugging Face, LiteLLM, Vertex AI, and SageMaker. Users can set different parameters for the tests and analyze the results to assess the performance of the LLM APIs. LLMPerf aims to standardize prompts across different APIs and provide consistent evaluation metrics for comparison.
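LLMPerf runs against a target API with its own benchmark scripts; the sketch below only illustrates the two headline metrics it reports (inter-token latency and generation throughput) by timing a streamed response from an OpenAI-compatible endpoint. The model name is a placeholder, streamed chunks are treated as roughly token-sized for simplicity, and this is not LLMPerf code.

```python
import time
from openai import OpenAI  # assumes the `openai` client library is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment
prompt = "Summarize retrieval-augmented generation in two sentences."

start = time.perf_counter()
arrival_times = []
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        arrival_times.append(time.perf_counter())

ttft = arrival_times[0] - start                      # time to first streamed chunk
gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
inter_token_latency = sum(gaps) / len(gaps) if gaps else 0.0
throughput = len(arrival_times) / (arrival_times[-1] - start)

print(f"TTFT: {ttft:.3f}s  ITL: {inter_token_latency * 1000:.1f}ms  "
      f"throughput: {throughput:.1f} chunks/s")
```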
FlagPerf
FlagPerf is an integrated AI hardware evaluation engine jointly built by the Institute of Intelligence and AI hardware manufacturers. It aims to establish an industry-oriented metric system that evaluates the actual capabilities of AI hardware under full software stack combinations (model + framework + compiler). FlagPerf features a multidimensional evaluation metric system that goes beyond measuring whether a chip can support a specific model's training. It covers scenarios and tasks across computer vision, natural language processing, speech, and multimodal workloads, with support for multiple training frameworks and inference engines that connect AI hardware with software ecosystems. It also supports various testing environments to comprehensively assess the performance of domestic AI chips in different scenarios.
MathEval
MathEval is a benchmark designed for evaluating the mathematical capabilities of large models. It includes over 20 evaluation datasets covering various mathematical domains with more than 30,000 math problems. The goal is to assess the performance of large models across different difficulty levels and mathematical subfields. MathEval serves as a reliable reference for comparing mathematical abilities among large models and offers guidance on enhancing their mathematical capabilities in the future.
TrustLLM
TrustLLM is a comprehensive study of trustworthiness in LLMs, covering principles for different dimensions of trustworthiness, an established benchmark, an evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions. The project first proposes a set of principles for trustworthy LLMs spanning eight dimensions. Based on these principles, it establishes a benchmark across six dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. It then presents a study evaluating 16 mainstream LLMs on a benchmark consisting of over 30 datasets. The documentation explains how to use the trustllm Python package to assess the trustworthiness of your LLM more quickly. For more details about TrustLLM, refer to the project website.
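The exact interface of the trustllm package should be checked against its documentation; the snippet below is only an assumed sketch of its documented pattern (collect model responses for a benchmark dimension, then score them with the matching evaluator). The module paths, class names, and file names here are assumptions, not verified API.

```python
# Assumed usage pattern for the trustllm package -- check the project docs,
# as these module paths, class names, and file names are not verified.
from trustllm.task import safety          # assumed module path
from trustllm.utils import file_process   # assumed module path

evaluator = safety.SafetyEval()           # assumed evaluator class

# Model outputs for the jailbreak subtask, saved as JSON beforehand (assumed file name).
jailbreak_outputs = file_process.load_json("jailbreak_responses.json")

# Score the responses; a higher refusal rate on jailbreak prompts indicates a safer model.
print(evaluator.jailbreak_eval(jailbreak_outputs, eval_type="total"))
```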
zshot
Zshot is a highly customizable framework for zero- and few-shot named entity and relation recognition. It can be used for mentions extraction, wikification, zero- and few-shot named entity recognition, zero- and few-shot relation recognition, and visualization of zero-shot NER and RE extraction. The framework consists of two main components, the mentions extractor and the linker, with multiple implementations of each available for different purposes. Zshot also includes a relations extractor and a knowledge extractor for extracting relations among entities and performing entity classification. The tool requires Python 3.6+ and dependencies such as spacy, torch, transformers, evaluate, and datasets for evaluation over datasets like OntoNotes. Optional dependencies include flair and blink for additional functionality. Zshot provides examples, tutorials, and evaluation methods to assess the performance of its components.
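Zshot plugs into a spaCy pipeline as a component configured with a mentions extractor, a linker, and the target entities. The sketch below follows that documented pattern, but the specific import paths and class names (PipelineConfig, the spaCy-based mentions extractor, the REGEN linker) are assumptions to be checked against the project's own examples.

```python
# Assumed zshot usage following its spaCy-pipeline pattern; verify import
# paths and class names against the project's examples.
import spacy
from zshot import PipelineConfig                              # assumed
from zshot.mentions_extractor import MentionsExtractorSpacy   # assumed
from zshot.linker import LinkerRegen                          # assumed
from zshot.utils.data_models import Entity                    # assumed

nlp = spacy.load("en_core_web_sm")
config = PipelineConfig(
    mentions_extractor=MentionsExtractorSpacy(),
    linker=LinkerRegen(),
    entities=[
        Entity(name="company", description="A commercial organization"),
        Entity(name="location", description="A country, city, or other place"),
    ],
)
nlp.add_pipe("zshot", config=config, last=True)

doc = nlp("IBM opened a new research lab in Zurich.")
print([(ent.text, ent.label_) for ent in doc.ents])
```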
LLMEvaluation
The LLMEvaluation repository is a comprehensive compendium of evaluation methods for Large Language Models (LLMs) and LLM-based systems. It aims to assist academics and industry professionals in creating effective evaluation suites tailored to their specific needs by reviewing industry practices for assessing LLMs and their applications. The repository covers a wide range of evaluation techniques, benchmarks, and studies related to LLMs, including areas such as embeddings, question answering, multi-turn dialogues, reasoning, multi-lingual tasks, ethical AI, biases, safe AI, code generation, summarization, software performance, agent LLM architectures, long text generation, graph understanding, and various unclassified tasks. It also includes evaluations for LLM systems in conversational systems, copilots, search and recommendation engines, task utility, and verticals like healthcare, law, science, financial, and others. The repository provides a wealth of resources for evaluating and understanding the capabilities of LLMs in different domains.
premsql
PremSQL is an open-source library designed to help developers create secure, fully local Text-to-SQL solutions using small language models. It provides essential tools for building and deploying end-to-end Text-to-SQL pipelines with customizable components, making it well suited for secure, autonomous AI-powered data analysis. The library offers a local-first approach, customizable datasets, robust executors and evaluators, advanced generators, error handling and self-correction, fine-tuning support, and end-to-end pipelines. Users can fine-tune models, generate SQL queries from natural-language inputs, handle errors, and evaluate model performance against predefined metrics. PremSQL is extensible for customization and private data usage.
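PremSQL's generator and evaluator classes have their own APIs; as a generic illustration of the execution-based evaluation its executors and evaluators perform, the sketch below runs a predicted and a gold SQL query against a SQLite database with the standard sqlite3 module and compares their result sets. The generate_sql function and database file are placeholders, and nothing here is PremSQL's actual API.

```python
import sqlite3

def generate_sql(question: str) -> str:
    """Placeholder for a Text-to-SQL model call."""
    return "SELECT name FROM employees WHERE salary > 50000"

def execution_match(db_path: str, predicted_sql: str, gold_sql: str) -> bool:
    """Execution accuracy: do the two queries return the same rows?"""
    conn = sqlite3.connect(db_path)
    try:
        predicted = set(conn.execute(predicted_sql).fetchall())
        gold = set(conn.execute(gold_sql).fetchall())
    finally:
        conn.close()
    return predicted == gold

gold_sql = "SELECT name FROM employees WHERE salary > 50000"
predicted_sql = generate_sql("Which employees earn more than 50k?")
print(execution_match("company.db", predicted_sql, gold_sql))  # hypothetical DB file
```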
AI_Hospital
AI Hospital is a research repository focusing on the interactive evaluation and collaboration of Large Language Models (LLMs) as intern doctors for clinical diagnosis. The repository includes a simulation module tailored for various medical roles, introduces the Multi-View Medical Evaluation (MVME) Benchmark, provides dialog history documents of LLMs, replication instructions, performance evaluation, and guidance for creating intern doctor agents. The collaborative diagnosis with LLMs emphasizes dispute resolution. The study was authored by Zhihao Fan, Jialong Tang, Wei Chen, Siyuan Wang, Zhongyu Wei, Jun Xie, Fei Huang, and Jingren Zhou.
create-million-parameter-llm-from-scratch
The 'create-million-parameter-llm-from-scratch' repository provides a detailed guide on creating a Large Language Model (LLM) with 2.3 million parameters from scratch. The blog replicates the LLaMA approach, incorporating concepts like RMSNorm for pre-normalization, SwiGLU activation function, and Rotary Embeddings. The model is trained on a basic dataset to demonstrate the ease of creating a million-parameter LLM without the need for a high-end GPU.
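Of the building blocks mentioned, RMSNorm is the simplest to show concretely; the snippet below is a standard minimal PyTorch implementation of RMSNorm pre-normalization, written as a generic reference rather than the repository's exact code.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm, used for pre-normalization in LLaMA-style models."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the RMS of the features, then apply a learned per-dimension scale.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

x = torch.randn(2, 8, 64)        # (batch, sequence, hidden)
print(RMSNorm(64)(x).shape)      # torch.Size([2, 8, 64])
```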
starwhale
Starwhale is an MLOps/LLMOps platform that brings efficiency and standardization to machine learning operations. It streamlines the model development lifecycle, enabling teams to optimize workflows around key areas like model building, evaluation, release, and fine-tuning. Starwhale abstracts Model, Runtime, and Dataset as first-class citizens, providing tailored capabilities for common workflow scenarios including Models Evaluation, Live Demo, and LLM Fine-tuning. It is an open-source platform designed for clarity and ease of use, empowering developers to build customized MLOps features tailored to their needs.
SuperKnowa
SuperKnowa is a fast framework for building Enterprise RAG (Retriever-Augmented Generation) pipelines at scale, powered by watsonx. It accelerates Enterprise Generative AI applications, helping teams get production-ready solutions quickly on private data. The framework provides pluggable components for tackling various Generative AI use cases with Large Language Models (LLMs), allowing users to assemble building blocks that address challenges in AI-driven text generation. SuperKnowa has been battle-tested on private knowledge bases ranging from 1M to 200M in size and scaled to billions of retriever tokens.
AlignBench
AlignBench is the first comprehensive evaluation benchmark for assessing the alignment level of Chinese large models across multiple dimensions. It includes introduction information, data, and code related to AlignBench. The benchmark aims to evaluate the alignment performance of Chinese large language models through a multi-dimensional and rule-calibrated evaluation method, enhancing reliability and interpretability.
LLM-RGB
LLM-RGB is a repository containing a collection of detailed test cases designed to evaluate the reasoning and generation capabilities of Large Language Models (LLMs) in complex scenarios. The benchmark assesses LLMs' performance in understanding context, complying with instructions, and handling challenges like long context lengths, multi-step reasoning, and specific response formats. Each test case evaluates an LLM's output based on context length difficulty, reasoning depth difficulty, and instruction compliance difficulty, with a final score calculated for each test case. The repository provides a score table, evaluation details, and a quick-start guide for running evaluations using promptfoo testing tools.
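LLM-RGB's exact weighting is defined in the repository itself; the sketch below only illustrates the general idea of combining the three per-test-case difficulty ratings into a single difficulty-weighted score, and the weighting scheme shown is an assumption for illustration, not the repository's formula.

```python
def weighted_score(passed: float, difficulties: dict) -> float:
    """Scale a 0-1 test result by the test case's total difficulty.

    `difficulties` holds the three ratings named by LLM-RGB (context length,
    reasoning depth, instruction compliance); summing them as the weight is
    an illustrative assumption, not the repository's exact formula.
    """
    total_difficulty = sum(difficulties.values())
    return passed * total_difficulty

case = {"context_length": 3, "reasoning_depth": 2, "instruction_compliance": 1}
print(weighted_score(passed=0.8, difficulties=case))  # 0.8 * 6 = 4.8
```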
can-ai-code
Can AI Code is a self-evaluating interview tool for AI coding models. It includes interview questions written by humans and tests taken by AI, inference scripts for common API providers and CUDA-enabled quantization runtimes, a Docker-based sandbox environment for validating untrusted Python and NodeJS code, and the ability to evaluate the impact of prompting techniques and sampling parameters on large language model (LLM) coding performance. Users can also assess LLM coding performance degradation due to quantization. The tool provides test suites for evaluating LLM coding performance, a webapp for exploring results, and comparison scripts for evaluations. It supports multiple interviewers for API and CUDA runtimes, with detailed instructions on running the tool in different environments. The repository structure includes folders for interviews, prompts, parameters, evaluation scripts, comparison scripts, and more.
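Can AI Code ships its own Docker-based sandbox; as a generic illustration of the approach (run untrusted, model-generated code in a locked-down container), the sketch below shells out to Docker with networking disabled and memory and time caps. The image name and limits are arbitrary choices, and this is not the project's actual harness.

```python
import subprocess

untrusted_code = "print(sum(range(10)))"  # model-generated code to validate

# Run in a throwaway container: no network, capped memory, hard timeout.
result = subprocess.run(
    ["docker", "run", "--rm", "--network", "none", "--memory", "256m",
     "python:3.11-slim", "python", "-c", untrusted_code],
    capture_output=True, text=True, timeout=15,
)
print(result.stdout.strip())   # expected output: 45
```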
20 - OpenAI GPTs
Leadership Development Advisor
Guides leadership growth to enhance organizational performance.
I4T Assessor - UNESCO Tech Platform Trust Helper
Helps you evaluate whether or not tech platforms match UNESCO's Internet for Trust Guidelines for the Governance of Digital Platforms
Evaluation Criteria Creator
Simply write any topic (superheroes, vacuums, Pokémon, diamonds…) and I'll provide the evaluation criteria you can use.
IQ Test
IQ Test is designed to simulate an IQ testing environment. It provides a formal and objective experience, delivering questions and processing answers in a straightforward manner.
Safaricom Financial Analyst
Analyzes Safaricom's HY and FY financials, with detailed insights on different years.
Biz Problem Solver
Revolutionize Problem-Solving: AI-Enhanced, Expert-Driven Business Solutions Like First Principles, A3, 8D, McKinsey 7S, 4S, DMAIC, Kaizen, Lean Six Sigma, 40 TRIZ Principles of Innovation
Digital Assets @ FS
Consultant on digital assets in financial services, using a pricing study for insights.
UNICORN Binance Suite Assistant
Elegant assistance and expertise for integrating the Unicorn Binance Suite.
HomeScore
Assess a potential home's quality using your own photos and property inspection reports
Ready for Transformation
Assess your company's real appetite for new technologies or new ways of working
TRL Explorer
Assess the TRL of your projects, get ideas for specific TRLs, learn how to advance from one TRL to the next
🎯 CulturePulse Pro Advisor 🌐
Empowers leaders to gauge and enhance company culture. Use advanced analytics to assess, report, and develop a thriving workplace culture. 🚀💼📊
香港地盤安全佬 HK Construction Site Safety Advisor
Upload a site photo to assess potential hazards and seek advice from an experienced AI Safety Officer