Best AI tools for <Instruction Following Evaluation>
20 - AI tool Sites
MagicForm AI
MagicForm AI is an AI-powered lead generation tool that supercharges the top of the sales funnel by qualifying, converting, and following up with leads. It offers easy installation, training, instruction, trust, deployment, follow-up, and observation features. With different pricing plans catering to solopreneurs, small businesses, and companies/agencies, MagicForm AI has received positive client testimonials praising its effectiveness and ease of use.
Video2Recipe
Video2Recipe is an AI tool that allows users to convert their favorite cooking videos into recipes effortlessly. By simply pasting the video URL or ID, the AI generates step-by-step instructions and ingredient lists. The platform aims to simplify the process of following cooking tutorials and make it more accessible for users to recreate dishes they love.
Buddy.ai
Buddy.ai is an AI-powered early learning platform designed to teach English to children aged 3-7 in a playful and interactive way. The platform offers 1:1 voice-based learning games and lessons to help children develop essential skills for school success. With a focus on fun and personalized teaching, Buddy.ai provides a safe learning space free from ads and extra charges. The platform covers a wide range of subjects, including language, literacy, math, science, art, music, and more, following the U.S. educational system. Buddy.ai uses advanced voice recognition and AI technology to engage children in interactive lessons and games, promoting learning through storytelling, spaced repetition, and total physical response.
Tutor AI
Tutor AI is an English-speaking practice application that lets individuals practice their spoken English with an AI chatbot. The app offers a safe, judgment-free environment for free-flowing, natural conversations with diverse AI characters. It provides real-time feedback, suggests better ways to express oneself, and offers adjustable features to enhance the learning experience. Tutor AI aims to help users improve their spoken English confidently and effectively through personalized lessons and interactive learning.
Eduaide.Ai
Eduaide.Ai is an AI-driven platform designed to assist educators in creating lesson plans, teaching resources, and assessments. It offers features such as AI-assisted lesson planning, teaching resource generation, a feedback bot, personalization tools, and an assessment builder. The platform aims to streamline administrative tasks, provide personalized learning experiences, and enhance teaching efficiency through AI technology.
MailMaestro
MailMaestro is an AI email assistant from Maestro Labs that helps users write better emails faster, manage their inbox efficiently, and improve communication. Following Maestro Labs' acquisition of Flowrite, MailMaestro offers enhanced AI email capabilities to users who previously used Flowrite. The tool focuses on email management and productivity, providing AI features for email writing, response, and management.
Canopy Directory
Canopy Directory is an AI tool designed specifically for educators. It provides a comprehensive directory of AI tools that can be used in educational settings. The platform aims to streamline the process of finding and utilizing AI tools for teaching and learning purposes. Educators can explore a wide range of tools categorized based on their functionalities and applications, making it easier to integrate AI technology into their teaching practices. Canopy Directory serves as a valuable resource for educators looking to enhance their teaching methods through the use of AI tools.
Google Gemma
Google Gemma is a family of lightweight, state-of-the-art open large language models (LLMs) developed by Google, built from the same research used to create Google's Gemini models. Gemma models come in two sizes, 2B and 7B parameters, each with a base (pre-trained) variant and an instruction-tuned variant. They are designed to be cross-device compatible and are optimized for Google Cloud and NVIDIA GPUs. They are also accessible through Kaggle, Hugging Face, and Google Cloud with Vertex AI or GKE. Gemma models can be used for a variety of applications, including text generation, summarization, and RAG, and are available for both commercial and research use.
Whisk
Whisk is the ultimate recipe app for all your cooking needs. It offers a vast collection of recipes, meal planning tools, grocery list creation, and personalized recipe suggestions. With AI-powered features, Whisk allows users to create custom recipes, collaborate on meal planning, and generate dynamic grocery lists. The app is designed to simplify the cooking experience and help users explore new dishes effortlessly.
DoubleO
DoubleO is an AI automation tool designed for non-developers to easily create powerful AI automations. Users give simple instructions, connect tools, and let a team of highly-trained DoubleO AI agents automate complex tasks. It offers pre-built and custom workflows for various teams, such as Sales, Marketing, Product, and Operations. The tool integrates with popular tools like Intercom, Slack, Salesforce, and more, and protects data security and privacy with end-to-end encryption and compliance with data security standards. Users can benefit from features like automating pre-call prep, analyzing customer feedback, creating launch plans, and maintaining roadmaps.
OdiaGenAI
OdiaGenAI is a collaborative initiative focused on research into Generative AI and Large Language Models (LLMs) for the Odia language. The project aims to develop Generative AI and LLM-based solutions for the overall development of Odisha and the Odia language through collaboration among Odia technologists. The initiative offers pre-trained models, code, and datasets for non-commercial and research purposes, with a focus on building language models for Indic languages like Odia and Bengali.
Diffit
Diffit is an AI-powered educational tool designed to provide learning resources for teachers and students. It helps teachers create customized, grade-level content by generating standards-aligned resources from scratch. With features like text re-leveling, vocabulary customization, and question addition, Diffit aims to make instructional materials accessible to all students. The application offers a library of high-quality, student-ready exports to facilitate the teaching process. Testimonials from educators highlight the tool's effectiveness in differentiating instruction and engaging students across various subjects.
Grow with Google
Grow with Google is an AI tool designed to provide training and resources to help individuals boost their productivity and skills in various fields such as cybersecurity, data analytics, digital marketing, IT support, project management, UX design, and AI essentials. The platform offers online courses, tools, and professional certificates to help users develop ideas, make informed decisions, and enhance their daily work tasks using generative AI tools. With a focus on career growth and business development, Grow with Google aims to empower individuals with essential AI skills to succeed in today's competitive job market.
LessonPlans.ai
LessonPlans.ai is an AI-powered lesson plan generator that helps teachers create high-quality lesson plans in seconds. With this tool, teachers can easily generate detailed, personalized lesson plans that are tailored to their students' needs. LessonPlans.ai also includes a step-by-step guide for each lesson, making it easy for teachers to follow and implement the plan in their classrooms.
Cognii
Cognii is an AI-based educational technology provider that offers solutions for K-12, higher education, and corporate training markets. Their award-winning EdTech product enables personalized learning, intelligent tutoring, open response assessments, and rich analytics. Cognii's Virtual Learning Assistant engages students in chatbot-style conversations, providing instant feedback, personalized hints, and guiding towards mastery. The platform aims to deliver 21st-century online education with superior learning outcomes and cost efficiency.
Breakout Learning
Breakout Learning is an AI-powered educational platform that transforms traditional case studies into engaging, multifaceted experiences. It empowers professors with AI insights into small-group discussions, enabling them to customize lectures and foster deeper student comprehension. Students benefit from rich content, peer-led discussions, and AI assessment that provides personalized feedback and tracks their progress.
Undress App
Undress App is an AI tool that allows users to nudify any person in a photo using AI technology. The application provides a 3-step instruction on how to achieve this, emphasizing the importance of photo quality and proper positioning of the person in the image. It discusses the ethical implications of using AI to undress individuals and highlights the creative and potentially harmful uses of such technology. Undress App aims to offer a platform for creative self-expression and entertainment while acknowledging the potential misuse of the tool.
Cerebras API
The Cerebras API is a high-speed AI inference solution powered by Cerebras Wafer-Scale Engines and CS-3 systems. It offers developers access to two models, Meta's Llama 3.1 8B and 70B, which are instruction-tuned and suitable for conversational applications. The API provides low-latency inference and invites developers to explore new possibilities in AI development.
Class Companion
Class Companion is an AI teaching assistant tool designed to provide instant coaching and AI feedback to students on their assignments. It helps improve student engagement and outcomes by offering multiple attempts, targeted help, and personalized feedback. The tool supports various subjects and grades, allowing teachers to save time on manual feedback and focus on lesson planning and individual instruction. With features like AI-generated content, in-depth reporting, and customizable rubrics, Class Companion aims to enhance student learning and comprehension.
DapperGPT
DapperGPT is a user interface (UI) for ChatGPT that provides a better user experience and additional features. It offers an intuitive interface, AI-powered notes, a Chrome extension, smart search, the ability to pin favorites, image generation, character instruction prompts, and code generation. DapperGPT is free to use, but requires a valid OpenAI API key. Premium features are also available for purchase, which include additional customization options and cloud sync.
20 - Open Source AI Tools
evalchemy
Evalchemy is a unified and easy-to-use toolkit for evaluating language models, focusing on post-trained models. It integrates multiple existing benchmarks such as RepoBench, AlpacaEval, and ZeroEval. Key features include unified installation, parallel evaluation, simplified usage, and results management. Users can run various benchmarks with a consistent command-line interface and track results locally or integrate with a database for systematic tracking and leaderboard submission.
LLMEvaluation
The LLMEvaluation repository is a comprehensive compendium of evaluation methods for Large Language Models (LLMs) and LLM-based systems. It aims to assist academics and industry professionals in creating effective evaluation suites tailored to their specific needs by reviewing industry practices for assessing LLMs and their applications. The repository covers a wide range of evaluation techniques, benchmarks, and studies related to LLMs, including areas such as embeddings, question answering, multi-turn dialogues, reasoning, multi-lingual tasks, ethical AI, biases, safe AI, code generation, summarization, software performance, agent LLM architectures, long text generation, graph understanding, and various unclassified tasks. It also includes evaluations for LLM systems in conversational systems, copilots, search and recommendation engines, task utility, and verticals like healthcare, law, science, financial, and others. The repository provides a wealth of resources for evaluating and understanding the capabilities of LLMs in different domains.
Groma
Groma is a grounded multimodal assistant that excels in region understanding and visual grounding. It can process user-defined region inputs and generate contextually grounded long-form responses. The tool presents a unique paradigm for multimodal large language models, focusing on visual tokenization for localization. Groma achieves state-of-the-art performance in referring expression comprehension benchmarks. The tool provides pretrained model weights and instructions for data preparation, training, inference, and evaluation. Users can customize training by starting from intermediate checkpoints. Groma is designed to handle tasks related to detection pretraining, alignment pretraining, instruction finetuning, instruction following, and more.
Graph-Reasoning-LLM
This repository, GraphWiz, focuses on developing an instruction-following Large Language Model (LLM) for solving graph problems. It includes the GraphWiz LLMs, which have strong graph problem-solving abilities; the GraphInstruct dataset, with over 72.5k training samples across nine graph problem tasks; and comparisons against models like GPT-4 and Mistral-7B. The project aims to map textual descriptions of graphs and structures to explicit natural-language solutions for various graph problems.
Cherry_LLM
The Cherry Data Selection project introduces a self-guided methodology that lets LLMs autonomously discern and select "cherry" samples from open-source datasets, minimizing manual curation and cost for instruction tuning. The project focuses on selecting impactful training samples ("cherry data") to enhance LLM instruction tuning by estimating instruction-following difficulty. The method proceeds in three phases, "Learning from Brief Experience", "Evaluating Based on Experience", and "Retraining from Self-Guided Experience", to improve LLM performance.
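The instruction-following difficulty estimate can be illustrated with a minimal sketch. It assumes the score is the ratio of the model's average loss on an answer when conditioned on the instruction to its average loss on the answer alone, which is how the project's paper describes its IFD score. The function names and the toy log-probabilities are hypothetical; a real pipeline would obtain per-token log-probabilities from a language model.

```python
from typing import Sequence


def mean_nll(token_logprobs: Sequence[float]) -> float:
    """Average negative log-likelihood over the answer tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def ifd_score(cond_logprobs: Sequence[float],
              direct_logprobs: Sequence[float]) -> float:
    """Ratio of conditioned answer loss to direct answer loss.

    cond_logprobs:   log p(answer token | instruction, previous answer tokens)
    direct_logprobs: log p(answer token | previous answer tokens)
    Both inputs are hypothetical and would come from the model being tuned.
    """
    direct = mean_nll(direct_logprobs)
    if direct == 0.0:
        raise ValueError("direct answer loss is zero; ratio is undefined")
    return mean_nll(cond_logprobs) / direct


# Toy values: the instruction halves the loss on the answer.
score = ifd_score([-0.5, -0.5], [-1.0, -1.0])
print(score)
```

Under this reading, a ratio close to 1 means the instruction barely helps the model produce the answer, so the sample is treated as harder and potentially more valuable for tuning.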
InternLM
InternLM is a series of powerful language models. Its features include a 200K context window for long-context tasks; strong overall performance in reasoning, math, code, chat experience, instruction following, and creative writing; code interpreter and data analysis capabilities; and stronger tool utilization. It offers models in sizes of 7B and 20B, suitable for research and complex scenarios. The models are recommended for various applications and exhibit better performance than previous generations. According to the project, InternLM models may match or surpass ChatGPT. The models have been evaluated on various datasets and have shown strong performance in multiple tasks. Usage requires Python >= 3.8, PyTorch >= 1.12.0, and Transformers >= 4.34. InternLM can be used for tasks like chat, agent applications, fine-tuning, deployment, and long-context inference.
SeaLLMs
SeaLLMs are a family of language models optimized for Southeast Asian (SEA) languages. They were pre-trained from Llama-2 on a tailored, publicly available dataset comprising texts in Vietnamese 🇻🇳, Indonesian 🇮🇩, Thai 🇹🇭, Malay 🇲🇾, Khmer 🇰🇭, Lao 🇱🇦, Tagalog 🇵🇭, and Burmese 🇲🇲. The SeaLLM-chat models underwent supervised finetuning (SFT) and specialized self-preferencing DPO using a mix of public instruction data and a small number of queries written by native speakers of SEA languages in natural settings, so that the models adapt to the local cultural norms, customs, styles, and laws in these areas. SeaLLM-13b models exhibit superior performance across a wide spectrum of linguistic tasks and assistant-style instruction-following capabilities relative to comparable open-source models. Moreover, they outperform ChatGPT-3.5 in non-Latin languages, such as Thai, Khmer, Lao, and Burmese.
Awesome-Papers-Autonomous-Agent
Awesome-Papers-Autonomous-Agent is a curated collection of recent papers on autonomous agents, with a particular focus on RL-based agents and LLM-based agents. The repository aims to provide a comprehensive resource for researchers and practitioners interested in intelligent agents that can achieve goals, acquire knowledge, and continually improve. The collection includes papers on various topics such as instruction following, building agents based on world models, using language as knowledge, leveraging LLMs as a tool, generalization across tasks, continual learning, combining RL and LLMs, transformer-based policies, trajectory to language, trajectory prediction, multimodal agents, training LLMs for generalization and adaptation, task-specific designing, multi-agent systems, experimental analysis, benchmarking, applications, and algorithm design.
LLM-Synthetic-Data
LLM-Synthetic-Data is a repository focused on real-time, fine-grained LLM-Synthetic-Data generation. It includes methods, surveys, and application areas related to synthetic data for language models. The repository covers topics like pre-training, instruction tuning, model collapse, LLM benchmarking, evaluation, and distillation. It also explores application areas such as mathematical reasoning, code generation, text-to-SQL, alignment, reward modeling, long context, weak-to-strong generalization, agent and tool use, vision and language, factuality, federated learning, generative design, and safety.
EasyInstruct
EasyInstruct is a Python package that serves as an easy-to-use instruction processing framework for Large Language Models (LLMs) such as GPT-4, LLaMA, and ChatGLM in research experiments. EasyInstruct modularizes instruction generation, selection, and prompting, while also considering their combination and interaction.
Awesome-LLM-Eval
Awesome-LLM-Eval is a curated list of tools, benchmarks, demos, and papers for evaluating Large Language Models (such as ChatGPT, LLaMA, GLM, and Baichuan) on language capabilities, knowledge, reasoning, fairness, and safety.
evalverse
Evalverse is an open-source project designed to support Large Language Model (LLM) evaluation needs. It provides a standardized and user-friendly solution for processing and managing LLM evaluations, catering to AI research engineers and scientists. Evalverse supports various evaluation methods, insightful reports, and no-code evaluation processes. Users can access unified evaluation with submodules, request evaluations without code via Slack bot, and obtain comprehensive reports with scores, rankings, and visuals. The tool allows for easy comparison of scores across different models and swift addition of new evaluation tools.
eureka-ml-insights
The Eureka ML Insights Framework is a repository containing code designed to help researchers and practitioners run reproducible evaluations of generative models efficiently. Users can define custom pipelines for data processing, inference, and evaluation, as well as utilize pre-defined evaluation pipelines for key benchmarks. The framework provides a structured approach to conducting experiments and analyzing model performance across various tasks and modalities.
Reflection_Tuning
Reflection-Tuning is a project focused on improving the quality of instruction-tuning data through a reflection-based method. It introduces Selective Reflection-Tuning, where the student model can decide whether to accept the improvements made by the teacher model. The project aims to generate high-quality instruction-response pairs by defining specific criteria for the oracle model to follow and respond to. It also evaluates the efficacy and relevance of instruction-response pairs using the r-IFD metric. The project provides code for reflection and selection processes, along with data and model weights for both V1 and V2 methods.
opencompass
OpenCompass is a one-stop platform for large model evaluation, aiming to provide a fair, open, and reproducible benchmark. Its main features include:
* Comprehensive support for models and datasets: Pre-support for 20+ HuggingFace and API models, and a model evaluation scheme of 70+ datasets with about 400,000 questions, comprehensively evaluating the capabilities of the models in five dimensions.
* Efficient distributed evaluation: A one-line command implements task division and distributed evaluation, completing the full evaluation of billion-scale models in just a few hours.
* Diversified evaluation paradigms: Support for zero-shot, few-shot, and chain-of-thought evaluations, combined with standard or dialogue-type prompt templates, to easily elicit the maximum performance of various models.
* Modular design with high extensibility: New models or datasets, a customized advanced task division strategy, or even support for a new cluster management system can all be added; everything about OpenCompass can be easily expanded.
* Experiment management and reporting mechanism: Config files fully record each experiment, and results can be reported in real time.
hallucination-leaderboard
This leaderboard evaluates the hallucination rate of various Large Language Models (LLMs) when summarizing documents. It uses a model trained by Vectara to detect hallucinations in LLM outputs. The leaderboard includes models from OpenAI, Anthropic, Google, Microsoft, Amazon, and others. The evaluation is based on 831 documents that were summarized by all the models. The leaderboard shows the hallucination rate, factual consistency rate, answer rate, and average summary length for each model.
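The four reported columns can be illustrated with a minimal sketch. It assumes each summary receives a consistency score in [0, 1] from a detector, that a score at or above a threshold counts as factually consistent, and that the hallucination rate is the complement of the factual consistency rate over answered documents. The `SummaryResult` type, the `leaderboard_row` function, the 0.5 threshold, and the word-based length are all hypothetical choices for illustration, not the leaderboard's actual code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class SummaryResult:
    summary: Optional[str]          # None when the model declined to summarize
    consistency_score: float = 0.0  # hypothetical detector score in [0, 1]


def leaderboard_row(results: Sequence[SummaryResult],
                    threshold: float = 0.5) -> Dict[str, float]:
    """Aggregate per-document results into the four leaderboard columns."""
    answered = [r for r in results if r.summary]
    if not answered:
        raise ValueError("no answered documents to aggregate")
    consistent = sum(r.consistency_score >= threshold for r in answered)
    consistency_rate = 100.0 * consistent / len(answered)
    return {
        "hallucination_rate": 100.0 - consistency_rate,
        "factual_consistency_rate": consistency_rate,
        "answer_rate": 100.0 * len(answered) / len(results),
        # Length measured in whitespace-separated words (an assumption).
        "avg_summary_length": sum(len(r.summary.split()) for r in answered)
        / len(answered),
    }


row = leaderboard_row([
    SummaryResult("a b c", 0.9),
    SummaryResult("a b", 0.2),
    SummaryResult("a", 0.8),
    SummaryResult(None),
])
print(row)
```

Computing the rates only over answered documents keeps a model that refuses often from looking artificially accurate; the answer rate column exposes those refusals separately.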
Recommendation-Systems-without-Explicit-ID-Features-A-Literature-Review
This repository is a collection of papers and resources related to recommendation systems, focusing on foundation models, transferable recommender systems, large language models, and multimodal recommender systems. It explores questions such as the necessity of ID embeddings, the shift from matching to generating paradigms, and the future of multimodal recommender systems. The papers cover various aspects of recommendation systems, including pretraining, user representation, dataset benchmarks, and evaluation methods. The repository aims to provide insights and advancements in the field of recommendation systems through literature reviews, surveys, and empirical studies.
20 - OpenAI Gpts
Research Paper GPT
Drafts detailed research papers with web-sourced citations, following user-specific instructions.
Instruction Assistant Operating Director
Full step by step guidance and copy & paste text for developing assistants with specific use cases.
GPT Instruction Builder
Write your GPT instructions, context, persona, constraints. The more detailed the better.
Custom Instruction Creator
Write your role and get a tailored persona for your ChatGPT custom instructions.
Origami Instruction Companion
Teaches origami with step-by-step visual instructions and provides templates for various skill levels.
invideoAI instruction support bot
Send keywords and an overview of the video you want to make, and this bot will create invideoAI (AI Video Creator) instructions for you!
LDS Church Instruction
A GPT of the General Handbook of Instructions for the Church of Jesus Christ of Latter-day Saints.
EL Advisor
Differentiation advice for English Learners / Developing Bilinguals. For K-12 Teachers. EL, ESL, ELL, Bilingual, Dual Language instruction.
Rosenshine GPT
Give me a lesson and I can give you feedback based on Rosenshine's "Principles of Instruction"
Korean for Beginners
I'm a Language Tutor Bot for beginner Korean learners, offering personalized, engaging instruction.
Ask Cris about File Maker
An experiment in personal FileMaker guidance from the collective works of lifetime award-winning FileMaker trainer, Cris Ippolite. Not just links to resources, but direct access to 20+ years of custom training curriculum combined with expert AI instruction without the noise of external web links.
! Art Mentor !
Virtual tattoo instructor, friendly and professional, focused on personalized learning.
EduStandard Catalog
Provides comprehensive guidance on educational standards and assessments.
Yoga Guru
Yoga Guru with 500-hour RYT certification knowledge, creating custom yoga programs.