MAVIS
Math Visual Intelligent System (Strongest calculator in the world)
Stars: 67
MAVIS (Math Visual Intelligent System) is an AI-driven application that allows users to analyze visual data such as images and generate interactive answers based on them. It can perform complex mathematical calculations, solve programming tasks, and create professional graphics. MAVIS supports Python for coding and frameworks like Matplotlib, Plotly, Seaborn, Altair, NumPy, Math, SymPy, and Pandas. It is designed to make projects more efficient and professional.
README:
Math Visual Intelligent System (Strongest calculator in the world)
MAVIS: In the darkest times lies the power to create something great – a work only you can understand. It is your light, your strength, your vision.
⚠️ Still in progress
⚠️ Important warning: Beware of fake accounts!
There are indications that fake accounts may try to misrepresent this project. Please do not share personal information with anyone you do not know, and rely only on content that comes directly from this repository. Report suspicious activity or accounts to GitHub or the project team immediately.
English: MAVIS is an AI-driven application that allows you to analyze visual data such as images (formats: PNG, JPG, JPEG and GIF) and generate interactive answers based on them. With MAVIS you can perform complex mathematical calculations, solve programming tasks and create professional graphics.
To achieve optimal results, please note the following:
Always display formulas in LaTeX to ensure precise and attractive formatting.
Ask MAVIS to always write code in Python, as this is the only language supported by the user interface.
Ask MAVIS to create graphics using Matplotlib, as the user interface only supports HTML, LaTeX, and Python (version 3.13 with the frameworks Matplotlib, Plotly, Seaborn, Altair, NumPy, Math, SymPy, and Pandas; planned additions: PyTorch, TensorFlow, Keras, Scikit-Learn, Hugging Face Transformers, and possibly JAX).
Use the powerful features of MAVIS to make your projects more efficient and professional.
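For illustration, here is a minimal sketch of the kind of Python/Matplotlib output these tips aim at (this is not code from the MAVIS repository; the plotted function and labels are made up purely as an example):

```python
# Minimal example: a Matplotlib plot whose labels are LaTeX-formatted,
# matching the recommendation to display formulas in LaTeX.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 400)
y = np.sin(x) * np.exp(-x / 5)            # example function, chosen arbitrarily

plt.plot(x, y)
plt.title(r"$f(x) = \sin(x)\,e^{-x/5}$")  # mathtext renders the LaTeX formula
plt.xlabel(r"$x$")
plt.ylabel(r"$f(x)$")
plt.grid(True)
plt.show()
```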
- [09.11.2024] Start ;-)
- [10.11.2024] Available with Llama3.2 + Demo with Xc++ 2
- [13.11.2024] Demo UI
- [21.11.2024] MAVIS with Python (Version 3.11/12/13 with the frameworks Matplotlib, NumPy, Math, SymPy and Pandas) Demo
- [28.11.2024] Available with Plotly, but there are still bugs to resolve
- [30.11.2024] MAVIS can write PyTorch, TensorFlow, Keras, Scikit-Learn and Hugging Face Transformers (maybe JAX) code and run it side-by-side without the need for an IDE. But it is only intended for experimentation.
- [01.12.2024] MAVIS EAP release
- [03.12.2024] Available with Altair
- [24.12.2024] MAVIS 1.3 EAP release: new Plotly function demo + stronger adaptability through Transformers (Hugging Face) + bigger input box
Release timeline: 01.12.2024, 12.12.2024, 24.12.2024 – more MAVIS versions soon.
Planned: 01.02.2025, 01.03.2025, 01.04.2025, 01.05.2025, 01.06.2025, 01.07.2025, 01.08.2025, and later 01.01.2026, 01.03.2026, 01.05.2026, 01.07.2026 – and more.
⚠️ Still in progress
Model | Description | Parameters |
---|---|---|
Mavis 1.2 main | With Xc++ 2 11B or Llama3.2 11B +16GB RAM +23GB storage (Works with one CPU) | 22B |
Mavis 1.2 math | With Xc++ 2 11B or Llama3.2 11B + Qwen 2.5 14B +16GB RAM +23GB storage (Works with one CPU) | 27B |
Mavis 1.2 code | With Xc++ 2 11B or Llama3.2 11B + Qwen 2.5 Coder 14B +16GB RAM +23GB storage (Works with one CPU) | 27B |
Mavis 1.2 math pro | With Xc++ 2 90B or Llama3.2 90B + QwQ +64GB RAM +53GB storage (Works with one CPU) | 122B |
Mavis 1.2 code pro | With Xc++ 2 90B or Llama3.2 90B + Qwen 2.5 Coder 32B +64GB RAM +53GB storage (Works with one CPU) | 122B |
Mavis 1.2 mini | With Xc++ 2 11B or Llama3.2 11B + Qwen 2.5 0.5B +16GB RAM +13GB storage (Works with one CPU) | 11.5B |
Mavis 1.2 mini mini | With Xc++ 2 11B or Llama3.2 11B + smollm:135m +16GB RAM +33GB storage (Works with one CPU) | 11.0135B |
Mavis 1.2.2 main | With Xc++ 2 11B or Llama3.2 11B + Phi4 +16GB RAM +23GB storage (Works with one CPU) | 27B |
Mavis 1.3 main | With Xc++ 2 11B or Qwen2 VL 7B + Llama 3.3 +64GB RAM +53GB storage (Works with one CPU) | 77B |
Mavis 1.3 math | With Xc++ 2 11B or Qwen2 VL 7B + Qwen 2.5 14B +16GB RAM +33GB storage (Works with one CPU) | 21B |
Mavis 1.3 code | With Xc++ 2 11B or Qwen2 VL 7B + Qwen 2.5 Coder 14B +16GB RAM +33GB storage (Works with one CPU) | 21B |
Mavis 1.3 math pro | With Xc++ 2 90B or Qwen2 VL 72B + QwQ +64GB RAM +53GB storage (Works with one CPU) | 104B |
Mavis 1.3 code pro | With Xc++ 2 90B or Qwen2 VL 72B + Qwen 2.5 Coder 32B +64GB RAM +53GB storage (Works with one CPU) | 104B |
Mavis 1.4 math | With Xc++ 2 90B or QvQ + QwQ +64GB RAM +53GB storage (Works with one CPU) | 104B |
⚠️ Still in progress
Benchmark | Xc++ 2 (01.12.2024 – goal: 01.06.2025) | QwQ 32B-preview | OpenAI o1-preview | OpenAI o1-mini | OpenAI o1 | GPT-4o | Claude 3.5 Sonnet |
---|---|---|---|---|---|---|---|
GPQA | 65.2 (goal 80.0) | 65.2 | 72.3 | 60.0 | 77.3 | 53.6 | 49.0 |
AIME | 50.0 (goal 59.0) | 50.0 | 44.6 | 56.7 | ... | 59.4 | 53.6 |
MATH-500 | 90.6 (goal 96.0) | 90.6 | 85.5 | 90.0 | 94.8 | 76.6 | 82.6 |
LiveCodeBench | 50.0 (goal 61.0) | 50.0 | 53.6 | 58.0 | ... | 33.4 | 30.4 |
HumanEval | 92.7 (goal 95.0) | ... | 92.4 | 92.4 | ... | 90.2 | 92.0 |
⚠️ Still in progress
Installation
MAVIS (aka Xc++ 2) is currently under development and is not yet fully available in this repository. Cloning the repository only gives you the README file, some images, and the code released so far, including the user interface (UI), which is compatible with Qwen 2.5 Coder and Llama 3.2 Vision.
Note: The manual installation is not much more complicated, but it is significantly more stable and secure, so we recommend it.
The automatic installation installs Git, Python, and Ollama, creates a folder, and sets up a virtual Python environment. It also installs the required Python frameworks and AI models; this step also runs automatically during the manual installation.
Automatic Installation (experimental)
If you still prefer the automatic installation, please follow the instructions below:
- Download the C++ file mavis_installer.cpp.
- If issues occur, uninstall all changes made by the automatic installation and try the manual installation instead.
Automatic Installation on Windows
- Install a C++ compiler (e.g., Visual Studio or MinGW).
- Compile the code with the following command:
g++ -std=c++17 -o mavis_installer mavis_installer.cpp
- Run the .exe file: mavis_installer.exe
- Follow the installation instructions for your operating system.
Start the UI
You can start the UI in two ways:
- run-mavis-all.bat (experimental)
- With a batch file for MAVIS 1.2:
  - run-mavis-1-2-main.bat (recommended)
  - run-mavis-1-2-code.bat (recommended)
  - run-mavis-1-2-code-pro.bat (recommended)
  - run-mavis-1-2-math.bat (recommended)
  - run-mavis-1-2-math-pro.bat (recommended)
  - run-mavis-1-2-mini.bat (recommended)
  - run-mavis-1-2-mini-mini.bat (recommended)
  - run-mavis-1-2-3-main.bat (recommended)
- For MAVIS 1.3 EAP:
  - run-mavis-1-3-main.bat (experimental)
  - run-mavis-1-3-code.bat (experimental)
  - run-mavis-1-3-code-pro.bat (experimental)
  - run-mavis-1-3-math.bat (experimental)
  - run-mavis-1-3-math-pro.bat (experimental)
- For MAVIS 1.4 EAP:
  - run-mavis-1-4-math.bat (experimental)
Automatic Installation on macOS/Linux
- Make sure a C++ compiler like g++ is installed (on macOS, this is typically part of the Xcode Command Line Tools).
- Compile the code with the following command:
g++ -std=c++17 -o mavis_installer mavis_installer.cpp
- Make the file executable (if needed) and run it:
chmod +x mavis_installer
./mavis_installer
- Follow the installation instructions for your operating system.
Start the UI
You can start the UI in two ways:
- run-mavis-all.sh (experimental)
- With a shell file for MAVIS 1.2:
  - run-mavis-1-2-main.sh (recommended)
  - run-mavis-1-2-code.sh (recommended)
  - run-mavis-1-2-code-pro.sh (recommended)
  - run-mavis-1-2-math.sh (recommended)
  - run-mavis-1-2-math-pro.sh (recommended)
  - run-mavis-1-2-mini.sh (recommended)
  - run-mavis-1-2-mini-mini.sh (recommended)
  - run-mavis-1-2-3-main.sh (recommended)
- For MAVIS 1.3 EAP:
  - run-mavis-1-3-main.sh (experimental)
  - run-mavis-1-3-code.sh (experimental)
  - run-mavis-1-3-code-pro.sh (experimental)
  - run-mavis-1-3-math.sh (experimental)
  - run-mavis-1-3-math-pro.sh (experimental)
- For MAVIS 1.4 EAP:
  - run-mavis-1-4-math.sh (experimental)
Manual Installation (recommended)
To successfully install MAVIS, you need the following programs:
- Git: Download Git from the official website: https://git-scm.com/downloads
- Python: Python 3.12 is recommended (3.11 is also supported; Python 3.13 is not supported yet). Download Python from https://www.python.org/downloads/ or from the Microsoft Store.
- Ollama: Ollama is a tool required by MAVIS. Install it from the official website: https://ollama.com/download
The necessary Python frameworks and AI models are automatically installed during manual installation.
Windows
- Create folder
  Create a folder named PycharmProjects (C:\Users\YourUsername\PycharmProjects) if it doesn't already exist. The location and method vary depending on your operating system:
  - Option 1: Via File Explorer
    - Open File Explorer.
    - Navigate to C:\Users\YourUsername\.
    - Create a folder there called PycharmProjects.
  - Option 2: Via the Command Prompt
    Open the Command Prompt and run the following commands:
    mkdir C:\Users\%USERNAME%\PycharmProjects
    cd C:\Users\%USERNAME%\PycharmProjects
- Clone repository
  Clone the repository to a local directory:
  git clone https://github.com/Peharge/MAVIS
- Change directory
  Navigate to the project directory:
  cd MAVIS
- Create Python virtual environment
  Set up a virtual environment to install dependencies in isolation:
  python -m venv env
  (Do not replace env with another name!)
Start the UI
You can start the UI in two ways:
- run-mavis-all.bat (experimental)
- With a batch file for MAVIS 1.2:
  - run-mavis-1-2-main.bat (recommended)
  - run-mavis-1-2-code.bat (recommended)
  - run-mavis-1-2-code-pro.bat (recommended)
  - run-mavis-1-2-math.bat (recommended)
  - run-mavis-1-2-math-pro.bat (recommended)
  - run-mavis-1-2-mini.bat (recommended)
  - run-mavis-1-2-mini-mini.bat (recommended)
  - run-mavis-1-2-3-main.bat (recommended)
- For MAVIS 1.3 EAP:
  - run-mavis-1-3-main.bat (experimental)
  - run-mavis-1-3-code.bat (experimental)
  - run-mavis-1-3-code-pro.bat (experimental)
  - run-mavis-1-3-math.bat (experimental)
  - run-mavis-1-3-math-pro.bat (experimental)
- For MAVIS 1.4 EAP:
  - run-mavis-1-4-math.bat (experimental)
macOS/Linux
- Create a folder
  Create a folder called PycharmProjects (~/PycharmProjects) if it doesn't already exist. The location and method vary depending on your operating system:
  - Option 1: Via the File Manager
    - Open the File Manager.
    - Navigate to your home directory (~/).
    - Create a folder called PycharmProjects there.
  - Option 2: Via Terminal
    Open a terminal and run the following commands:
    mkdir -p ~/PycharmProjects
    cd ~/PycharmProjects
- Clone repository
  Clone the repository to a local directory:
  git clone https://github.com/Peharge/MAVIS
- Change directory
  Navigate to the project directory:
  cd MAVIS
- Create Python virtual environment
  Set up a virtual environment to install dependencies in isolation:
  python -m venv env
  (You cannot replace env with any other name!)
Start the UI
You can start the UI in two ways:
- run-mavis-all.sh (experimental)
- With a shell file for MAVIS 1.2:
  - run-mavis-1-2-main.sh (recommended)
  - run-mavis-1-2-code.sh (recommended)
  - run-mavis-1-2-code-pro.sh (recommended)
  - run-mavis-1-2-math.sh (recommended)
  - run-mavis-1-2-math-pro.sh (recommended)
  - run-mavis-1-2-mini.sh (recommended)
  - run-mavis-1-2-mini-mini.sh (recommended)
  - run-mavis-1-2-3-main.sh (recommended)
- For MAVIS 1.3 EAP:
  - run-mavis-1-3-main.sh (experimental)
  - run-mavis-1-3-code.sh (experimental)
  - run-mavis-1-3-code-pro.sh (experimental)
  - run-mavis-1-3-math.sh (experimental)
  - run-mavis-1-3-math-pro.sh (experimental)
- For MAVIS 1.4 EAP:
  - run-mavis-1-4-math.sh (experimental)
⚠️ Still in progress (01.02.2024)
Usable from: 01.02.2025
⚠️ Still in progress
Go to: GitHub/Xc++-II
- Make sure MAVIS is always told to use fig.update_layout(mapbox_style="open-street-map") in Python code!
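As a minimal illustration of this requirement (not code from the MAVIS repository; the coordinates and zoom level are made-up example values):

```python
# Sketch: a Plotly map figure that applies the required
# fig.update_layout(mapbox_style="open-street-map") setting.
import plotly.express as px

fig = px.scatter_mapbox(
    lat=[48.137, 52.520],   # example latitudes (made up for illustration)
    lon=[11.575, 13.405],   # example longitudes (made up for illustration)
    zoom=4,
)
fig.update_layout(mapbox_style="open-street-map")  # the setting MAVIS should always use
fig.show()
```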
More demos
Task: You are a professional thermodynamics teacher. Summarize the overview in a table (formula collection) and give it your best!
Graphic: https://www.ulrich-rapp.de/stoff/thermodynamik/Gasgesetz_AB.pdf
Task: In this task, students are to create a phase diagram for a binary mixed system that shows the phase transitions between the liquid and vapor phases. They use thermodynamic models and learn how to visualize complex phase diagrams using Python and the Matplotlib and Seaborn libraries.
Use the Antoine equation to calculate the vapor pressure for each of the two components at a given temperature. The vapor pressure formula is:
$$ P_A = P_A^0 \cdot x_A \quad \text{and} \quad P_B = P_B^0 \cdot x_B $$
Where $P_A^0$ and $P_B^0$ are the vapor pressures of the pure components at a given temperature $T$, and $x_A$ and $x_B$ are the mole fractions of the two components in the liquid phase.
Given data: The vapor pressure parameters for the two components A and B at different temperatures are described by the Antoine equation. The corresponding constants for each component are:
For component A:
- $A_A = 8.07131$
- $B_A = 1730.63$
- $C_A = 233.426$
For component B:
- $A_B = 8.14019$
- $B_B = 1810.94$
- $C_B = 244.485$
Good luck!
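A minimal sketch of how this task could be approached with NumPy and Matplotlib (this is not an official solution; the temperature of 78 °C and the mmHg pressure unit are assumptions made only for illustration):

```python
# Sketch: P-x-y phase diagram for an ideal binary mixture, using the
# Antoine constants given in the task and Raoult's law for the partial pressures.
import numpy as np
import matplotlib.pyplot as plt

def antoine_pressure(A, B, C, T):
    """Pure-component vapor pressure from the Antoine equation (T in degC, P in mmHg)."""
    return 10 ** (A - B / (C + T))

# Antoine constants from the task
A_A, B_A, C_A = 8.07131, 1730.63, 233.426   # component A
A_B, B_B, C_B = 8.14019, 1810.94, 244.485   # component B

T = 78.0                                    # assumed temperature in degC
P_A0 = antoine_pressure(A_A, B_A, C_A, T)   # vapor pressure of pure A
P_B0 = antoine_pressure(A_B, B_B, C_B, T)   # vapor pressure of pure B

x_A = np.linspace(0.0, 1.0, 101)            # liquid mole fraction of A
P_bubble = x_A * P_A0 + (1.0 - x_A) * P_B0  # Raoult's law: P = P_A + P_B
y_A = x_A * P_A0 / P_bubble                 # vapor mole fraction of A

plt.plot(x_A, P_bubble, label="bubble-point curve (liquid)")
plt.plot(y_A, P_bubble, label="dew-point curve (vapor)")
plt.xlabel(r"$x_A,\ y_A$")
plt.ylabel(r"pressure $P$ (mmHg)")
plt.title(f"P-x-y diagram of an ideal binary mixture at T = {T} °C")
plt.legend()
plt.show()
```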
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)
Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee ...
Link: arXiv:1810.04805v2
Abstract: We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
Evaluating Large Language Models Trained on Code (2021)
Authors: Mark Chen, Jerry Tworek, Heewoo Jun ...
Link: arXiv:2107.03374
Abstract: We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
Training Language Models to Follow Instructions with Human Feedback (2022)
Authors: Long Ouyang, Jeff Wu, Xu Jiang ...
Link: arXiv:2203.02155
Abstract: Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.
More papers
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books (2015)
Authors: Yukun Zhu, Ryan Kiros, Richard Zemel ...
Link: arXiv:1506.06724
Abstract: This work presents a Neural Architecture Search (NAS) method using reinforcement learning to automatically generate neural network architectures. NAS demonstrates the ability to discover novel architectures that outperform human-designed models on standard benchmarks.
Language Models are Few-Shot Learners (2020)
Authors: Tom B. Brown, Benjamin Mann, Nick Ryder ...
Link: arXiv:2005.14165v4
Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models (2023)
Authors: Chenfei Wu, Shengming Yin, Weizhen Qi ...
Link: arXiv:2303.04671
Abstract: ChatGPT is attracting a cross-field interest as it provides a language interface with remarkable conversational competency and reasoning capabilities across many domains. However, since ChatGPT is trained with languages, it is currently not capable of processing or generating images from the visual world. At the same time, Visual Foundation Models, such as Visual Transformers or Stable Diffusion, although showing great visual understanding and generation capabilities, they are only experts on specific tasks with one-round fixed inputs and outputs. To this end, We build a system called \textbf{Visual ChatGPT}, incorporating different Visual Foundation Models, to enable the user to interact with ChatGPT by 1) sending and receiving not only languages but also images 2) providing complex visual questions or visual editing instructions that require the collaboration of multiple AI models with multi-steps. 3) providing feedback and asking for corrected results. We design a series of prompts to inject the visual model information into ChatGPT, considering models of multiple inputs/outputs and models that require visual feedback. Experiments show that Visual ChatGPT opens the door to investigating the visual roles of ChatGPT with the help of Visual Foundation Models.
On the Opportunities and Risks of Foundation Models (2021)
Authors: Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman ...
Link: arXiv:2108.07258
Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles(e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities,and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.
BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining (2022)
Authors: Renqian Luo, Liai Sun, Yingce Xia ...
Link: arXiv:2210.10341
Abstract: Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain. Among the two main branches of pre-trained language models in the general language domain, i.e., BERT (and its variants) and GPT (and its variants), the first one has been extensively studied in the biomedical domain, such as BioBERT and PubMedBERT. While they have achieved great success on a variety of discriminative downstream biomedical tasks, the lack of generation ability constrains their application scope. In this paper, we propose BioGPT, a domain-specific generative Transformer language model pre-trained on large scale biomedical literature. We evaluate BioGPT on six biomedical NLP tasks and demonstrate that our model outperforms previous models on most tasks. Especially, we get 44.98%, 38.42% and 40.76% F1 score on BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks respectively, and 78.2% accuracy on PubMedQA, creating a new record. Our case study on text generation further demonstrates the advantage of BioGPT on biomedical literature to generate fluent descriptions for biomedical terms.
Release Strategies and the Social Impacts of Language Models (2019)
Authors: Irene Solaiman, Miles Brundage, Jack Clark ...
Link: arXiv:1908.09203
Abstract: Large language models have a range of beneficial uses: they can assist in prose, poetry, and programming; analyze dataset biases; and more. However, their flexibility and generative capabilities also raise misuse concerns. This report discusses OpenAI's work related to the release of its GPT-2 language model. It discusses staged release, which allows time between model releases to conduct risk and benefit analyses as model sizes increased. It also discusses ongoing partnership-based research and provides recommendations for better coordination and responsible publication in AI.
WebGPT: Browser-assisted Question-answering with Human Feedback (2021)
Authors: Reiichiro Nakano, Jacob Hilton, Suchir Balaji ...
Link: arXiv:2112.09332
Abstract: We fine-tune GPT-3 to answer long-form questions using a text-based web-browsing environment, which allows the model to search and navigate the web. By setting up the task so that it can be performed by humans, we are able to train models on the task using imitation learning, and then optimize answer quality with human feedback. To make human evaluation of factual accuracy easier, models must collect references while browsing in support of their answers. We train and evaluate our models on ELI5, a dataset of questions asked by Reddit users. Our best model is obtained by fine-tuning GPT-3 using behavior cloning, and then performing rejection sampling against a reward model trained to predict human preferences. This model's answers are preferred by humans 56% of the time to those of our human demonstrators, and 69% of the time to the highest-voted answer from Reddit.
GPT-4 Technical Report (2023)
Authors: OpenAI, Josh Achiam, Steven Adler ...
Link: arXiv:2303.08774
Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.
Sparks of Artificial General Intelligence: Early Experiments with GPT-4 (2023)
Authors: Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan ...
Link: arXiv:2303.12712
Abstract: Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.
More coming soon...
This project is licensed under the MIT license – see the LICENSE file for details.
Alternative AI tools for MAVIS
Similar Open Source Tools
MAVIS
MAVIS (Math Visual Intelligent System) is an AI-driven application that allows users to analyze visual data such as images and generate interactive answers based on them. It can perform complex mathematical calculations, solve programming tasks, and create professional graphics. MAVIS supports Python for coding and frameworks like Matplotlib, Plotly, Seaborn, Altair, NumPy, Math, SymPy, and Pandas. It is designed to make projects more efficient and professional.
pgvecto.rs
pgvecto.rs is a Postgres extension written in Rust that provides vector similarity search functions. It offers ultra-low-latency, high-precision vector search capabilities, including sparse vector search and full-text search. With complete SQL support, async indexing, and easy data management, it simplifies data handling. The extension supports various data types like FP16/INT8, binary vectors, and Matryoshka embeddings. It ensures system performance with production-ready features, high availability, and resource efficiency. Security and permissions are managed through easy access control. The tool allows users to create tables with vector columns, insert vector data, and calculate distances between vectors using different operators. It also supports half-precision floating-point numbers for better performance and memory usage optimization.
NeuroSandboxWebUI
A simple and convenient interface for using various neural network models. Users can interact with LLM using text, voice, and image input to generate images, videos, 3D objects, music, and audio. The tool supports a wide range of models for different tasks such as image generation, video generation, audio file separation, voice conversion, and more. Users can also view files from the outputs directory in a gallery, download models, change application settings, and check system sensors. The goal of the project is to create an easy-to-use application for utilizing neural network models.
rag-chat
The `@upstash/rag-chat` package simplifies the development of retrieval-augmented generation (RAG) chat applications by providing Next.js compatibility with streaming support, built-in vector store, optional Redis compatibility for fast chat history management, rate limiting, and disableRag option. Users can easily set up the environment variables and initialize RAGChat to interact with AI models, manage knowledge base, chat history, and enable debugging features. Advanced configuration options allow customization of RAGChat instance with built-in rate limiting, observability via Helicone, and integration with Next.js route handlers and Vercel AI SDK. The package supports OpenAI models, Upstash-hosted models, and custom providers like TogetherAi and Replicate.
obsei
Obsei is an open-source, low-code, AI powered automation tool that consists of an Observer to collect unstructured data from various sources, an Analyzer to analyze the collected data with various AI tasks, and an Informer to send analyzed data to various destinations. The tool is suitable for scheduled jobs or serverless applications as all Observers can store their state in databases. Obsei is still in alpha stage, so caution is advised when using it in production. The tool can be used for social listening, alerting/notification, automatic customer issue creation, extraction of deeper insights from feedbacks, market research, dataset creation for various AI tasks, and more based on creativity.
spandrel
Spandrel is a library for loading and running pre-trained PyTorch models. It automatically detects the model architecture and hyperparameters from model files, and provides a unified interface for running models.
DB-GPT
DB-GPT is a personal database administrator that can solve database problems by reading documents, using various tools, and writing analysis reports. It is currently undergoing an upgrade. Features:
- Online demo: import documents into the knowledge base; use the knowledge base for well-founded Q&A and diagnosis analysis of abnormal alarms; send feedback to refine intermediate diagnosis results; edit the diagnosis result; browse all historical diagnosis results, used metrics, and detailed diagnosis processes.
- Language support: English (default) and Chinese (add "language: zh" in config.yaml).
- New frontend: knowledge base + chat Q&A + diagnosis + report replay.
- Extreme-speed version for localized LLMs: 4-bit quantized LLM (reducing inference time by 1/3), vllm for fast inference (qwen), tiny LLM.
- Multi-path extraction of document knowledge: vector database (ChromaDB) and RESTful search engine (Elasticsearch).
- Expert prompt generation using document knowledge.
- Upgraded LLM-based diagnosis mechanism: task dispatching -> concurrent diagnosis -> cross review -> report generation, with a synchronous concurrency mechanism during LLM inference.
- Monitoring and optimization tools at multiple levels: monitoring metrics (Prometheus), code-level flame graphs, diagnosis knowledge retrieval (dbmind), logical query transformations (Calcite), index optimization algorithms (for PostgreSQL), physical operator hints (for PostgreSQL), and backup and point-in-time recovery (Pigsty).
- Continuously updated papers and experimental reports.
This project is constantly evolving with new features. Don't forget to star ⭐ and watch 👀 to stay up to date.
educhain
Educhain is a powerful Python package that leverages Generative AI to create engaging and personalized educational content. It enables users to generate multiple-choice questions, create lesson plans, and support various LLM models. Users can export questions to JSON, PDF, and CSV formats, customize prompt templates, and generate questions from text, PDF, URL files, youtube videos, and images. Educhain outperforms traditional methods in content generation speed and quality. It offers advanced configuration options and has a roadmap for future enhancements, including integration with popular Learning Management Systems and a mobile app for content generation on-the-go.
Notate
Notate is a powerful desktop research assistant that combines AI-driven analysis with advanced vector search technology. It streamlines research workflow by processing, organizing, and retrieving information from documents, audio, and text. Notate offers flexible AI capabilities with support for various LLM providers and local models, ensuring data privacy. Built for researchers, academics, and knowledge workers, it features real-time collaboration, accessible UI, and cross-platform compatibility.
ollama
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications. Ollama is designed to be easy to use and accessible to developers of all levels. It is open source and available for free on GitHub.
BrowserAI
BrowserAI is a tool that allows users to run large language models (LLMs) directly in the browser, providing a simple, fast, and open-source solution. It prioritizes privacy by processing data locally, is cost-effective with no server costs, works offline after initial download, and offers WebGPU acceleration for high performance. It is developer-friendly with a simple API, supports multiple engines, and comes with pre-configured models for easy use. Ideal for web developers, companies needing privacy-conscious AI solutions, researchers experimenting with browser-based AI, and hobbyists exploring AI without infrastructure overhead.
starwhale
Starwhale is an MLOps/LLMOps platform that brings efficiency and standardization to machine learning operations. It streamlines the model development lifecycle, enabling teams to optimize workflows around key areas like model building, evaluation, release, and fine-tuning. Starwhale abstracts Model, Runtime, and Dataset as first-class citizens, providing tailored capabilities for common workflow scenarios including Models Evaluation, Live Demo, and LLM Fine-tuning. It is an open-source platform designed for clarity and ease of use, empowering developers to build customized MLOps features tailored to their needs.
auto-subs
Auto-subs is a tool designed to automatically transcribe editing timelines using OpenAI Whisper and Stable-TS for extreme accuracy. It generates subtitles in a custom style, is completely free, and runs locally within Davinci Resolve. It works on Mac, Linux, and Windows, supporting both Free and Studio versions of Resolve. Users can jump to positions on the timeline using the Subtitle Navigator and translate from any language to English. The tool provides a user-friendly interface for creating and customizing subtitles for video content.
MarkLLM
MarkLLM is an open-source toolkit designed for watermarking technologies within large language models (LLMs). It simplifies access, understanding, and assessment of watermarking technologies, supporting various algorithms, visualization tools, and evaluation modules. The toolkit aids researchers and the community in ensuring the authenticity and origin of machine-generated text.
llm-interface
LLM Interface is an npm module that streamlines interactions with various Large Language Model (LLM) providers in Node.js applications. It offers a unified interface for switching between providers and models, supporting 36 providers and hundreds of models. Features include chat completion, streaming, error handling, extensibility, response caching, retries, JSON output, and repair. The package relies on npm packages like axios, @google/generative-ai, dotenv, jsonrepair, and loglevel. Installation is done via npm, and usage involves sending prompts to LLM providers. Tests can be run using npm test. Contributions are welcome under the MIT License.
Learn_Prompting
Learn Prompting is a platform offering free resources, courses, and webinars to master prompt engineering and generative AI. It provides a Prompt Engineering Guide, courses on Generative AI, workshops, and the HackAPrompt competition. The platform also offers AI Red Teaming and AI Safety courses, research reports on prompting techniques, and welcomes contributions in various forms such as content suggestions, translations, artwork, and typo fixes. Users can locally develop the website using Visual Studio Code, Git, and Node.js, and run it in development mode to preview changes.
For similar tasks
HPT
Hyper-Pretrained Transformers (HPT) is a novel multimodal LLM framework from HyperGAI, trained for vision-language models capable of understanding both textual and visual inputs. The repository contains the open-source implementation of inference code to reproduce the evaluation results of HPT Air on different benchmarks. HPT has achieved competitive results with state-of-the-art models on various multimodal LLM benchmarks. It offers models like HPT 1.5 Air and HPT 1.0 Air, providing efficient solutions for vision-and-language tasks.
learnopencv
LearnOpenCV is a repository containing code for Computer Vision, Deep learning, and AI research articles shared on the blog LearnOpenCV.com. It serves as a resource for individuals looking to enhance their expertise in AI through various courses offered by OpenCV. The repository includes a wide range of topics such as image inpainting, instance segmentation, robotics, deep learning models, and more, providing practical implementations and code examples for readers to explore and learn from.
spark-free-api
Spark AI Free service provides high-speed streaming output, multi-turn dialogue support, AI drawing support, long document interpretation, and image parsing. It offers zero-configuration deployment, multi-token support, and automatic session trace cleaning. It is fully compatible with the ChatGPT interface. The repository includes multiple free-api projects for various AI services. Users can access the API for tasks such as chat completions, AI drawing, document interpretation, image analysis, and ssoSessionId live checking. The project also provides guidelines for deployment using Docker, Docker-compose, Render, Vercel, and native deployment methods. It recommends using custom clients for faster and simpler access to the free-api series projects.
mlx-vlm
MLX-VLM is a package designed for running Vision LLMs on Mac systems using MLX. It provides a convenient way to install and utilize the package for processing large language models related to vision tasks. The tool simplifies the process of running LLMs on Mac computers, offering a seamless experience for users interested in leveraging MLX for vision-related projects.
clarifai-python-grpc
This is the official Clarifai gRPC Python client for interacting with their recognition API. Clarifai offers a platform for data scientists, developers, researchers, and enterprises to utilize artificial intelligence for image, video, and text analysis through computer vision and natural language processing. The client allows users to authenticate, predict concepts in images, and access various functionalities provided by the Clarifai API. It follows a versioning scheme that aligns with the backend API updates and includes specific instructions for installation and troubleshooting. Users can explore the Clarifai demo, sign up for an account, and refer to the documentation for detailed information.
horde-worker-reGen
This repository provides the latest implementation for the AI Horde Worker, allowing users to utilize their graphics card(s) to generate, post-process, or analyze images for others. It offers a platform where users can create images and earn 'kudos' in return, granting priority for their own image generations. The repository includes important details for setup, recommendations for system configurations, instructions for installation on Windows and Linux, basic usage guidelines, and information on updating the AI Horde Worker. Users can also run the worker with multiple GPUs and receive notifications for updates through Discord. Additionally, the repository contains models that are licensed under the CreativeML OpenRAIL License.
geospy
Geospy is a Python tool that utilizes Graylark's AI-powered geolocation service to determine the location where photos were taken. It allows users to analyze images and retrieve information such as country, city, explanation, coordinates, and Google Maps links. The tool provides a seamless way to integrate geolocation services into various projects and applications.
Awesome-Colorful-LLM
Awesome-Colorful-LLM is a meticulously assembled anthology of vibrant multimodal research focusing on advancements propelled by large language models (LLMs) in domains such as Vision, Audio, Agent, Robotics, and Fundamental Sciences like Mathematics. The repository contains curated collections of works, datasets, benchmarks, projects, and tools related to LLMs and multimodal learning. It serves as a comprehensive resource for researchers and practitioners interested in exploring the intersection of language models and various modalities for tasks like image understanding, video pretraining, 3D modeling, document understanding, audio analysis, agent learning, robotic applications, and mathematical research.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:
- Self-contained, with no need for a DBMS or cloud service.
- OpenAPI interface, easy to integrate with existing infrastructure (e.g., Cloud IDE).
- Supports consumer-grade GPUs.
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.