![lumigator](/statics/github-mark.png)
lumigator
Source code for Mozilla.ai's Lumigator platform
Stars: 129
![screenshot](/screenshots_githubs/mozilla-ai-lumigator.jpg)
Lumigator is an open-source platform developed by Mozilla.ai to help users select the most suitable language model for their specific needs. It supports the evaluation of summarization tasks using sequence-to-sequence models such as BART, as well as causal models like GPT and Mistral. The platform aims to make model selection transparent, efficient, and empowering by providing a framework for comparing LLMs using task-specific metrics to evaluate how well a model fits a project's needs. Lumigator is in the early stages of development and plans to expand support to additional machine learning tasks and use cases in the future.
README:
Lumigator is an open-source platform developed by Mozilla.ai to help users select the most suitable language model for their specific needs. Currently, Lumigator supports the evaluation of summarization tasks using sequence-to-sequence models such as BART, as well as causal models like GPT and Mistral. We plan to expand support to additional machine learning tasks and use cases in the future.
To learn more about Lumigator's features and capabilities, see the documentation, or get started with the example notebook for a platform API walkthrough.
[!NOTE] Lumigator is in the early stages of development. It is missing important features and documentation. You should expect breaking changes in the core interfaces and configuration structures as development continues.
As more organizations turn to AI for solutions, they face the challenge of selecting the best model from an ever-growing list of options. The AI landscape is evolving rapidly, with twice as many new models released in 2023 as in the previous year. However, despite existing benchmarks and leaderboards for some scenarios, it can be challenging to compare models for a specific domain and use case.
The 2024 AI Index Report highlighted that AI evaluation tools aren’t (yet) keeping up with the pace of development, making it harder for developers and businesses to make informed choices. Without a clear method for comparing models, many teams end up using suboptimal solutions, or just choosing models based on hype, slowing down product progress and innovation.
With Lumigator MVP, Mozilla.ai aims to make model selection transparent, efficient, and empowering. Lumigator provides a framework for comparing LLMs, using task-specific metrics to evaluate how well a model fits your project’s needs. With Lumigator, we want to ensure that you’re not just picking a model—you’re picking the right model for your use case.
The simplest way to set up Lumigator is to deploy it locally using Docker Compose. To this end, you need to have the following prerequisites installed on your machine:
- A working installation of Docker:
  - On a Mac, you need Docker Desktop 4.3 or later and docker-compose 1.28 or later.
  - On Linux, you need to follow the post-installation steps.
- The system Python (version managers such as uv should be deactivated).
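You can quickly confirm the Docker prerequisites from a terminal:

```bash
docker --version
docker compose version   # or `docker-compose --version` on older setups
```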
You can run and develop Lumigator locally using Docker Compose. This creates four container services networked together to make up all the components of the Lumigator application:
- `minio`: Local storage for datasets that mimics S3-API compatible functionality.
- `backend`: Lumigator's FastAPI REST API.
- `ray`: A Ray cluster for submitting several types of jobs.
- `frontend`: Lumigator's Web UI.
[!NOTE] Lumigator requires an SQL database to hold metadata for datasets and jobs. The local deployment uses SQLite for this purpose.
[!NOTE] If you want to evaluate against LLM APIs like OpenAI and Mistral, you need to set the appropriate environment variables: `OPENAI_API_KEY` or `MISTRAL_API_KEY`. Refer to the troubleshooting section in our documentation for more details.
To start Lumigator locally, follow these steps:
- Clone the Lumigator repository:

  ```bash
  git clone git@github.com:mozilla-ai/lumigator.git
  ```

- Navigate to the repository root directory:

  ```bash
  cd lumigator
  ```

- If your system has an NVIDIA GPU, there is an additional prerequisite: install the NVIDIA Container Toolkit following their instructions. After that, open a terminal and run:

  ```bash
  export RAY_WORKER_GPUS=1
  export RAY_WORKER_GPUS_FRACTION=1.0
  export GPU_COUNT=1
  ```

  Important: Continue the next steps in this same terminal.

- If you intend to use the Mistral API or OpenAI API, use that same terminal and run:

  ```bash
  export MISTRAL_API_KEY=your_mistral_api_key
  export OPENAI_API_KEY=your_openai_api_key
  ```

  Important: Continue the next steps in this same terminal.

- From that same terminal, start Lumigator with:

  ```bash
  make start-lumigator
  ```

  This last command uses Docker Compose to launch all the necessary containers for you.
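If you prefer a terminal check, you can also list the containers with Docker Compose (run from the repository root, where the Compose file lives):

```bash
docker compose ps   # all four services should report a running state
```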
To verify that Lumigator is running, open a web browser and navigate to `http://localhost`: you should see Lumigator's UI.
Now that Lumigator is running, you can start using it. The platform provides a REST API that allows you to interact with the system. Run the example notebook for a quick walkthrough.
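If you just want to poke the API from the command line first, here is a minimal sketch; the route below is an assumption based on a typical FastAPI layout, so check the running service's interactive API docs for the real paths:

```bash
# Hypothetical health-check route; the actual path may differ.
curl -s http://localhost/api/v1/health
```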
Although this is a local setup, it lends itself to more distributed scenarios. For instance, one could provide different `AWS_*` environment variables to the backend container to connect to any provider's S3-compatible service, instead of minio. Similarly, one could provide a different `RAY_HEAD_NODE_HOST` to move compute to a remote Ray cluster, and so on. See the operational guides in the documentation for more deployment options.
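As an illustration, such overrides might look like the following; the values are placeholders, and apart from `RAY_HEAD_NODE_HOST` (named above) the variable names are assumptions drawn from standard AWS SDK conventions, so verify them against the operational guides:

```bash
export AWS_ACCESS_KEY_ID=placeholder_key_id       # standard AWS SDK variable
export AWS_SECRET_ACCESS_KEY=placeholder_secret   # standard AWS SDK variable
export AWS_ENDPOINT_URL=https://s3.example.com    # assumed: custom S3-compatible endpoint
export RAY_HEAD_NODE_HOST=ray.example.com         # point compute at a remote Ray cluster
```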
If you want to permanently set any of the environment variables above, you can add them to your rc file (e.g. `~/.bashrc`, `~/.zshrc`) or directly to the `.env` file that is automatically created after the first execution of Lumigator.
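For example, the API keys from the steps above could live in `.env` as plain key-value pairs (values are placeholders):

```bash
MISTRAL_API_KEY=your_mistral_api_key
OPENAI_API_KEY=your_openai_api_key
```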
Alternatively, you can use the UI to interact with Lumigator. Once a Lumigator session is up and running, the UI can be accessed by visiting `http://localhost`. On the Datasets tab, first upload a CSV file with an `examples` column and, optionally, a `ground_truth` column. Next, the dataset can be used to run an evaluation using the Experiments tab.
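A minimal illustration of the expected CSV layout (the rows are invented; only the column names come from the instructions above):

```csv
examples,ground_truth
"Mozilla.ai announced Lumigator, an open-source platform for comparing language models on summarization tasks.","Mozilla.ai announced an open-source platform for model selection."
```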
To stop the containers you started using Docker Compose, simply run the following command:
make stop-lumigator
For the complete Lumigator documentation, visit the docs page.
For contribution guidelines, see the CONTRIBUTING.md file.
To report a bug or request a feature, please open a GitHub issue. Be sure to check if someone else has already created an issue for the same topic.
Alternative AI tools for lumigator
Similar Open Source Tools
![airbroke Screenshot](/screenshots_githubs/icoretech-airbroke.jpg)
airbroke
Airbroke is an open-source error catcher tool designed for modern web applications. It provides a PostgreSQL-based backend with an Airbrake-compatible HTTP collector endpoint and a React-based frontend for error management. The tool focuses on simplicity, maintaining a small database footprint even under heavy data ingestion. Users can ask AI about issues, replay HTTP exceptions, and save/manage bookmarks for important occurrences. Airbroke supports multiple OAuth providers for secure user authentication and offers occurrence charts for better insights into error occurrences. The tool can be deployed in various ways, including building from source, using Docker images, deploying on Vercel, Render.com, Kubernetes with Helm, or Docker Compose. It requires Node.js, PostgreSQL, and specific system resources for deployment.
![eureka-ml-insights Screenshot](/screenshots_githubs/microsoft-eureka-ml-insights.jpg)
eureka-ml-insights
The Eureka ML Insights Framework is a repository containing code designed to help researchers and practitioners run reproducible evaluations of generative models efficiently. Users can define custom pipelines for data processing, inference, and evaluation, as well as utilize pre-defined evaluation pipelines for key benchmarks. The framework provides a structured approach to conducting experiments and analyzing model performance across various tasks and modalities.
![tau Screenshot](/screenshots_githubs/taubyte-tau.jpg)
tau
Tau is a framework for building low maintenance & highly scalable cloud computing platforms that software developers will love. It aims to solve the high cost and time required to build, deploy, and scale software by providing a developer-friendly platform that offers autonomy and flexibility. Tau simplifies the process of building and maintaining a cloud computing platform, enabling developers to achieve 'Local Coding Equals Global Production' effortlessly. With features like auto-discovery, content-addressing, and support for WebAssembly, Tau empowers users to create serverless computing environments, host frontends, manage databases, and more. The platform also supports E2E testing and can be extended using a plugin system called orbit.
![atomic_agents Screenshot](/screenshots_githubs/KennyVaneetvelde-atomic_agents.jpg)
atomic_agents
Atomic Agents is a modular and extensible framework designed for creating powerful applications. It follows the principles of Atomic Design, emphasizing small and single-purpose components. Leveraging Pydantic for data validation and serialization, the framework offers a set of tools and agents that can be combined to build AI applications. It depends on the Instructor package and supports various APIs like OpenAI, Cohere, Anthropic, and Gemini. Atomic Agents is suitable for developers looking to create AI agents with a focus on modularity and flexibility.
![cookbook Screenshot](/screenshots_githubs/huggingface-cookbook.jpg)
cookbook
This repository contains community-driven practical examples of building AI applications and solving various tasks with AI using open-source tools and models. Everyone is welcome to contribute, and we value everybody's contribution! There are several ways you can contribute to the Open-Source AI Cookbook: Submit an idea for a desired example/guide via GitHub Issues. Contribute a new notebook with a practical example. Improve existing examples by fixing issues/typos. Before contributing, check currently open issues and pull requests to avoid working on something that someone else is already working on.
![mahilo Screenshot](/screenshots_githubs/wjayesh-mahilo.jpg)
mahilo
Mahilo is a flexible framework for creating multi-agent systems that can interact with humans while sharing context internally. It allows developers to set up complex agent networks for various applications, from customer service to emergency response simulations. Agents can communicate with each other and with humans, making the system efficient by handling context from multiple agents and helping humans stay focused on specific problems. The system supports Realtime API for voice interactions, WebSocket-based communication, flexible communication patterns, session management, and easy agent definition.
![AppAgent Screenshot](/screenshots_githubs/mnotgod96-AppAgent.jpg)
AppAgent
AppAgent is a novel LLM-based multimodal agent framework designed to operate smartphone applications. Our framework enables the agent to operate smartphone applications through a simplified action space, mimicking human-like interactions such as tapping and swiping. This novel approach bypasses the need for system back-end access, thereby broadening its applicability across diverse apps. Central to our agent's functionality is its innovative learning method. The agent learns to navigate and use new apps either through autonomous exploration or by observing human demonstrations. This process generates a knowledge base that the agent refers to for executing complex tasks across different applications.
![godot_rl_agents Screenshot](/screenshots_githubs/edbeeching-godot_rl_agents.jpg)
godot_rl_agents
Godot RL Agents is an open-source package that facilitates the integration of Machine Learning algorithms with games created in the Godot Engine. It provides interfaces for popular RL frameworks, support for memory-based agents, 2D and 3D games, AI sensors, and is licensed under MIT. Users can train agents in the Godot editor, create custom environments, export trained agents in ONNX format, and utilize advanced features like different RL training frameworks.
![chronon Screenshot](/screenshots_githubs/airbnb-chronon.jpg)
chronon
Chronon is a platform that simplifies and improves ML workflows by providing a central place to define features, ensuring point-in-time correctness for backfills, simplifying orchestration for batch and streaming pipelines, offering easy endpoints for feature fetching, and guaranteeing and measuring consistency. It offers benefits over other approaches by enabling the use of a broad set of data for training, handling large aggregations and other computationally intensive transformations, and abstracting away the infrastructure complexity of data plumbing.
![boxcars Screenshot](/screenshots_githubs/BoxcarsAI-boxcars.jpg)
boxcars
Boxcars is a Ruby gem that enables users to create new systems with AI composability, incorporating concepts such as LLMs, Search, SQL, Rails Active Record, Vector Search, and more. It allows users to work with Boxcars, Trains, Prompts, Engines, and VectorStores to solve problems and generate text results. The gem is designed to be user-friendly for beginners and can be extended with custom concepts. Boxcars is actively seeking ways to enhance security measures to prevent malicious actions. Users can use Boxcars for tasks like running calculations, performing searches, generating Ruby code for math operations, and interacting with APIs like OpenAI, Anthropic, and Google SERP.
![ollama-autocoder Screenshot](/screenshots_githubs/10Nates-ollama-autocoder.jpg)
ollama-autocoder
Ollama Autocoder is a simple-to-use autocompletion engine that integrates with Ollama AI. It provides options for streaming functionality and requires specific settings for optimal performance. Users can easily generate text completions by pressing a key or using a command palette. The tool is designed to work with the Ollama API and a specified model, offering real-time generation of text suggestions.
![azure-search-openai-demo Screenshot](/screenshots_githubs/Azure-Samples-azure-search-openai-demo.jpg)
azure-search-openai-demo
This sample demonstrates a few approaches for creating ChatGPT-like experiences over your own data using the Retrieval Augmented Generation pattern. It uses Azure OpenAI Service to access a GPT model (gpt-35-turbo), and Azure AI Search for data indexing and retrieval. The repo includes sample data so it's ready to try end to end. In this sample application we use a fictitious company called Contoso Electronics, and the experience allows its employees to ask questions about the benefits, internal policies, as well as job descriptions and roles.
![GlaDOS Screenshot](/screenshots_githubs/dnhkng-GlaDOS.jpg)
GlaDOS
This project aims to create a real-life version of GLaDOS, an aware, interactive, and embodied AI entity. It involves training a voice generator, developing a 'Personality Core,' implementing a memory system, providing vision capabilities, creating 3D-printable parts, and designing an animatronics system. The software architecture focuses on low-latency voice interactions, utilizing a circular buffer for data recording, text streaming for quick transcription, and a text-to-speech system. The project also emphasizes minimal dependencies for running on constrained hardware. The hardware system includes servo- and stepper-motors, 3D-printable parts for GLaDOS's body, animations for expression, and a vision system for tracking and interaction. Installation instructions cover setting up the TTS engine, required Python packages, compiling llama.cpp, installing an inference backend, and voice recognition setup. GLaDOS can be run using 'python glados.py' and tested using 'demo.ipynb'.
![reverse-engineering-assistant Screenshot](/screenshots_githubs/cyberkaida-reverse-engineering-assistant.jpg)
reverse-engineering-assistant
ReVA (Reverse Engineering Assistant) is a project aimed at building a disassembler agnostic AI assistant for reverse engineering tasks. It utilizes a tool-driven approach, providing small tools to the user to empower them in completing complex tasks. The assistant is designed to accept various inputs, guide the user in correcting mistakes, and provide additional context to encourage exploration. Users can ask questions, perform tasks like decompilation, class diagram generation, variable renaming, and more. ReVA supports different language models for online and local inference, with easy configuration options. The workflow involves opening the RE tool and program, then starting a chat session to interact with the assistant. Installation includes setting up the Python component, running the chat tool, and configuring the Ghidra extension for seamless integration. ReVA aims to enhance the reverse engineering process by breaking down actions into small parts, including the user's thoughts in the output, and providing support for monitoring and adjusting prompts.
![trinityX Screenshot](/screenshots_githubs/clustervision-trinityX.jpg)
trinityX
TrinityX is an open-source HPC, AI, and cloud platform designed to provide all services required in a modern system, with full customization options. It includes default services like Luna node provisioner, OpenLDAP, SLURM or OpenPBS, Prometheus, Grafana, OpenOndemand, and more. TrinityX also sets up NFS-shared directories, OpenHPC applications, environment modules, HA, and more. Users can install TrinityX on Enterprise Linux, configure network interfaces, set up passwordless authentication, and customize the installation using Ansible playbooks. The platform supports HA, OpenHPC integration, and provides detailed documentation for users to contribute to the project.
For similar tasks
![rlhf_trojan_competition Screenshot](/screenshots_githubs/ethz-spylab-rlhf_trojan_competition.jpg)
rlhf_trojan_competition
This competition is organized by Javier Rando and Florian Tramèr from the ETH AI Center and SPY Lab at ETH Zurich. The goal of the competition is to create a method that can detect universal backdoors in aligned language models. A universal backdoor is a secret suffix that, when appended to any prompt, enables the model to answer harmful instructions. The competition provides a set of poisoned generation models, a reward model that measures how safe a completion is, and a dataset with prompts to run experiments. Participants are encouraged to use novel methods for red-teaming, automated approaches with low human oversight, and interpretability tools to find the trojans. The best submissions will be offered the chance to present their work at an event during the SaTML 2024 conference and may be invited to co-author a publication summarizing the competition results.
![onnxruntime-server Screenshot](/screenshots_githubs/kibae-onnxruntime-server.jpg)
onnxruntime-server
ONNX Runtime Server is a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference. It aims to offer simple, high-performance ML inference and a good developer experience. Users can provide inference APIs for ONNX models without writing additional code by placing the models in the directory structure. Each session can choose between CPU or CUDA, analyze input/output, and provide Swagger API documentation for easy testing. Ready-to-run Docker images are available, making it convenient to deploy the server.
![hallucination-index Screenshot](/screenshots_githubs/rungalileo-hallucination-index.jpg)
hallucination-index
LLM Hallucination Index - RAG Special is a comprehensive evaluation of large language models (LLMs) focusing on context length and open vs. closed-source attributes. The index explores the impact of context length on model performance and tests the assumption that closed-source LLMs outperform open-source ones. It also investigates the effectiveness of prompting techniques like Chain-of-Note across different context lengths. The evaluation includes 22 models from various brands, analyzing major trends and declaring overall winners based on short, medium, and long context insights. Methodologies involve rigorous testing with different context lengths and prompting techniques to assess models' abilities in handling extensive texts and detecting hallucinations.
![A-Survey-on-Mixture-of-Experts-in-LLMs Screenshot](/screenshots_githubs/withinmiaov-A-Survey-on-Mixture-of-Experts-in-LLMs.jpg)
A-Survey-on-Mixture-of-Experts-in-LLMs
A curated collection of papers and resources on Mixture of Experts in Large Language Models. The repository provides a chronological overview of several representative Mixture-of-Experts (MoE) models in recent years, structured according to release dates. It covers MoE models from various domains like Natural Language Processing (NLP), Computer Vision, Multimodal, and Recommender Systems. The repository aims to offer insights into Inference Optimization Techniques, Sparsity exploration, Attention mechanisms, and safety enhancements in MoE models.
![PyRIT Screenshot](/screenshots_githubs/Azure-PyRIT.jpg)
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
![fasttrackml Screenshot](/screenshots_githubs/G-Research-fasttrackml.jpg)
fasttrackml
FastTrackML is an experiment tracking server focused on speed and scalability, fully compatible with MLflow. It provides a user-friendly interface to track and visualize your machine learning experiments, making it easy to compare different models and identify the best performing ones. FastTrackML is open source and can be easily installed and run with pip or Docker. It is also compatible with the MLflow Python package, making it easy to integrate with your existing MLflow workflows.
![ScandEval Screenshot](/screenshots_githubs/ScandEval-ScandEval.jpg)
ScandEval
ScandEval is a framework for evaluating pretrained language models on mono- or multilingual language tasks. It provides a unified interface for benchmarking models on a variety of tasks, including sentiment analysis, question answering, and machine translation. ScandEval is designed to be easy to use and extensible, making it a valuable tool for researchers and practitioners alike.
For similar jobs
![weave Screenshot](/screenshots_githubs/wandb-weave.jpg)
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
![LLMStack Screenshot](/screenshots_githubs/trypromptly-LLMStack.jpg)
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
![VisionCraft Screenshot](/screenshots_githubs/VisionCraft-org-VisionCraft.jpg)
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models, from images to sound.
![kaito Screenshot](/screenshots_githubs/Azure-kaito.jpg)
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
![PyRIT Screenshot](/screenshots_githubs/Azure-PyRIT.jpg)
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
![tabby Screenshot](/screenshots_githubs/TabbyML-tabby.jpg)
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: it is self-contained, with no need for a DBMS or cloud service; it exposes an OpenAPI interface that is easy to integrate with existing infrastructure (e.g., a Cloud IDE); and it supports consumer-grade GPUs.
![spear Screenshot](/screenshots_githubs/isl-org-spear.jpg)
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
![Magick Screenshot](/screenshots_githubs/Oneirocom-Magick.jpg)
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.