Ollama-Colab-Integration
Jupyter Notebooks for Ollama integration
Stars: 93
Ollama Colab Integration V4 is a tool designed to enhance the interaction and management of large language models. It allows users to quantize models within their notebook environment, access a variety of models through a user-friendly interface, and manage public endpoints efficiently. The tool also provides features like LiteLLM proxy control, model insights, and customizable model file templating. Users can troubleshoot model loading issues, CPU fallback strategies, and manage VRAM and RAM effectively. Additionally, the tool offers functionalities for downloading model files from Hugging Face, model conversion with high precision, model quantization using Q and Kquants, and securely uploading converted models to Hugging Face.
README:
An update is coming in the next few days. There is a mismatch with the original versions, so the next release will bring a lot of performance gains and will be able to run within 2 minutes.
Dive into the world of large language models with Ollama Colab Integration V4. This update brings an exciting feature: the ability to quantize models right within your notebook, coupled with the streamlined Ollama Companion, now powered by a Streamlit-based WebUI.
- Run Notebook Cells: Simply run the cells in the provided notebook to set up all dependencies automatically. It's designed for a hassle-free setup experience, perfect for both beginners and seasoned users.
- Get Public URL: Upon loading, you'll receive a public URL. This URL grants you access to the Ollama-Companion, where you can interact with various language models and leverage the tool's full potential.
- Seamless Quantization: Perform model quantization directly in your notebook environment.
- Integrated Streamlit UI: Experience an intuitive interaction with models through the Streamlit-based Ollama Companion.
- Secure Cloudflared Tunneling: Create endpoints independently and securely.
- Accessible Model Library: Easily access a wide range of models via a user-friendly interface.
- Customizable ModelFile Templater: Tailor model parameters to your requirements.
- In-depth Model Insights: Obtain detailed information about model specifications and licensing.
- Efficient Public Endpoint Management: Manage your public endpoints for both original and OpenAI models with ease.
- LiteLLM Proxy Control: Directly manage LiteLLM proxy and its automated polling.
- Utility Tools: Additional features include CURL command creation and manual model setup.
- Model Loading Issues: Tips for handling GPU crashes with large models.
- CPU Fallback Strategy: Guidelines for reverting to CPU inference after a GPU crash (see the sketch after this list).
- VRAM and RAM Management: Best practices for managing VRAM and RAM limitations.
- Kaggle for Enhanced Performance: Using Kaggle for better VRAM and RAM capabilities.
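As a rough illustration of the CPU fallback strategy, the sketch below forces CPU-only inference by setting `num_gpu` to 0 in the options of an Ollama generate request. This is a minimal sketch assuming a local Ollama server on its default port 11434; the model name is a placeholder.

```python
import requests

# Ask Ollama to run a model entirely on the CPU by disabling GPU layers.
# Assumes a local Ollama server on the default port; "llama2" is a placeholder model name.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": "Hello from the CPU!",
        "stream": False,
        "options": {"num_gpu": 0},  # 0 GPU layers = pure CPU inference
    },
    timeout=600,
)
print(response.json()["response"])
```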
Contributions to Ollama Colab Integration V4 are always welcome. Enhance, suggest, and report to help us improve.
This notebook git clones the Colab-installer branch of https://github.com/Luxadevi/Ollama-Companion for its optimized installation file.
Want to run Ollama-Companion on your Mac, Windows, or Linux machine? Download it from the Ollama-Companion GitHub repository.
Ollama-Companion is developed to enhance the interaction and management of Ollama and other large language model (LLM) applications. It aims to support all Ollama API endpoints, facilitate model conversion, and ensure seamless connectivity, even in environments behind NAT. This tool is crafted to construct a versatile and user-friendly LLM software stack, meeting a diverse range of user requirements.
Transitioning from Gradio to Streamlit necessitated the development of new tunneling methods to maintain compatibility with Jupyter Notebooks, like Google Colab.
Explore our Colab Integration to set up the companion within minutes and obtain a public-facing URL.
Interact with the Ollama API without typing commands, using an interface to manage your models. Run Ollama locally or connect to a remote client and use this WebUI to manage it.
Visit the Ollama-Companion GitHub page for more details and repository access.
Develop your own Streamlit components and integrate them into Ollama-Companion. See examples using LangChain and other software stacks within Streamlit. You can also manage a remote Ollama instance by setting the Ollama endpoint in the UI, as shown in the sketch below.
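The same Ollama HTTP API that the companion drives can also be queried directly. A minimal sketch, assuming a hypothetical remote endpoint address, that lists the models available on that instance:

```python
import requests

# Hypothetical remote Ollama endpoint; replace with the address you set in the UI.
OLLAMA_ENDPOINT = "http://192.168.1.50:11434"

# List the models available on that instance (GET /api/tags).
models = requests.get(f"{OLLAMA_ENDPOINT}/api/tags", timeout=30).json()
for model in models.get("models", []):
    print(model["name"])
```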
This section allows you to manage and interact with the LiteLLM Proxy, which translates calls to over 100 LLM providers into the OpenAI API format.
Check out LiteLLM at the LiteLLM proxy page.
- Start LiteLLM Proxy: Click this button to start the LiteLLM Proxy. The proxy will run in the background and facilitate the conversion process.
- Read LiteLLM Log: Use this button to read the LiteLLM Proxy log, which contains relevant information about its operation.
- Start Polling: Click to initiate polling. Polling checks the Ollama API for updates and adds any new models to the configuration.
- Stop Polling: Use this button to stop polling for updates.
- Kill Existing LiteLLM Processes: If there are existing LiteLLM processes running, this button will terminate them.
- Free Up Port 8000: Click this button to free up port 8000 if it's currently in use.
Please note that starting the LiteLLM Proxy and performing other actions may take some time, so be patient and wait for the respective success messages.
The "Log Output" section will display relevant information from the LiteLLM Proxy log, providing insights into its operation and status.
To download model files from Hugging Face, follow these steps:
- Visit the Model Page: Go to the Hugging Face model page you wish to download. For example: Mistralai/Mistral-7B-Instruct-v0.2.
- Copy Username/RepositoryName: On the model page, locate the icon next to the username of the model's author (usually a clipboard or copy symbol). Click to copy the Username/RepositoryName, e.g., mistralai/Mistral-7B-Instruct-v0.2.
- Paste in the Input Field: Paste the copied Username/RepositoryName directly into the designated input field in your application.
- Get File List: Click the "Get file list" button to retrieve a list of available files in this repository.
- Review File List: Ensure the list contains the correct model files you wish to download.
- Download Model: Click the "Download Model" button to start the download process for the selected model files.
- File Storage: The model files will be saved in the llama.cpp/models directory on your device.
By following these steps, you have successfully downloaded the model files from Hugging Face, and they are now stored in the llama.cpp/models directory for your use.
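Under the hood this amounts to fetching the repository files into `llama.cpp/models`. A minimal sketch of the equivalent step with the `huggingface_hub` package, where the target path is an assumption about your working directory:

```python
from huggingface_hub import snapshot_download

# Download the repository files into llama.cpp/models/<repo-name>, mirroring
# what the "Download Model" button does. The local path is an assumption.
local_path = snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    local_dir="llama.cpp/models/Mistral-7B-Instruct-v0.2",
)
print(f"Files saved to {local_path}")
```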
- Select a Model Folder: Choose a folder within llama.cpp/models that contains the model you wish to convert.
- Set Conversion Options: Select your desired conversion options from the provided checkboxes: F32, F16, or Q8_0.
- Docker Container Option: Optionally, use a Docker container for added flexibility and compatibility.
- Execute Conversion: Click the "Run Commands" button to start the conversion process.
- Output Location: Converted models will be saved in the High-Precision-Quantization subfolder within the selected model folder.
Utilize this process to efficiently convert models while maintaining high precision and compatibility with llama.cpp.
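The conversion itself is performed by llama.cpp's converter script. A minimal sketch of the equivalent command for an F16 conversion; the script name and flags vary between llama.cpp versions, so treat the paths and names as assumptions:

```python
import subprocess

# Convert a downloaded Hugging Face model folder to GGUF at F16 precision.
# Script name, flags, and paths differ across llama.cpp versions; adjust to your checkout.
subprocess.run(
    [
        "python", "llama.cpp/convert.py",
        "llama.cpp/models/Mistral-7B-Instruct-v0.2",
        "--outtype", "f16",
        "--outfile", "llama.cpp/models/Mistral-7B-Instruct-v0.2/High-Precision-Quantization/model-f16.gguf",
    ],
    check=True,
)
```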
- Select GGUF File: Choose the GGUF file you wish to quantize from the dropdown list.
- Quantization Options: Check the boxes next to the quantization options you want to apply (Q and K-quants).
- Execution Environment: Choose to use either the native llama.cpp or a Docker container for compatibility.
- Run Quantization: Click the "Run Selected Commands" button to schedule and execute the quantization tasks.
- Save Location: The quantized models will be saved in the /modelname/Medium-Precision-Quantization folder.
Follow these steps to perform model quantization using Q and K-quants, saving the quantized models in the specified directory. You can schedule multiple options in a row; they will be queued and run one after another.
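The quantized files are produced by llama.cpp's quantize tool. A minimal sketch of one scheduled task, assuming a built binary (called `quantize` in older llama.cpp builds, `llama-quantize` in newer ones) and placeholder paths:

```python
import subprocess

# Quantize an F16 GGUF file down to Q4_K_M, writing the result into the
# Medium-Precision-Quantization subfolder. Binary name and paths are assumptions.
subprocess.run(
    [
        "llama.cpp/quantize",
        "llama.cpp/models/Mistral-7B-Instruct-v0.2/High-Precision-Quantization/model-f16.gguf",
        "llama.cpp/models/Mistral-7B-Instruct-v0.2/Medium-Precision-Quantization/model-Q4_K_M.gguf",
        "Q4_K_M",
    ],
    check=True,
)
```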
Use this section to securely upload your converted models to Hugging Face.
- Select a Model: Choose a model from the dropdown list. These models are located in the llama.cpp/models directory.
- Enter Repository Name: Specify a name for the new Hugging Face repository where your model will be uploaded.
- Choose Files for Upload: Select the files you wish to upload from the subfolders of the chosen model.
- Add README Content: Optionally, write content for the README.md file of your new repository.
- For enhanced security, use an encrypted token. Encrypt your Hugging Face token on the Token Encrypt page and enter it in the "Enter Encrypted Token" field.
- Alternatively, enter an unencrypted Hugging Face token directly.
- Upload Files: Click the "Upload Selected Files" button to initiate the upload to Hugging Face.
After completing these steps, your uploaded models will be accessible at https://huggingface.co/your-username/your-repo-name.
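The upload step maps onto the Hugging Face Hub API. A minimal sketch with `huggingface_hub`, where the token, username, repository name, and file path are placeholders for your own values:

```python
from huggingface_hub import HfApi

# Create the target repository (if it does not exist) and push one quantized file.
# Token, username, repository name, and file path are placeholders.
api = HfApi(token="hf_your_token_here")
api.create_repo(repo_id="your-username/your-repo-name", exist_ok=True)
api.upload_file(
    path_or_fileobj="llama.cpp/models/Mistral-7B-Instruct-v0.2/Medium-Precision-Quantization/model-Q4_K_M.gguf",
    path_in_repo="model-Q4_K_M.gguf",
    repo_id="your-username/your-repo-name",
)
print("Uploaded to https://huggingface.co/your-username/your-repo-name")
```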
- Intuitive and Responsive UI
- Advanced Modelfile Management
- Dynamic UI Building Blocks
- Download and Convert PyTorch Models from Huggingface
- Multiple Format Conversion Options
- Easy API Connectivity via Secure Tunnels
- Options for Sharing and Cloud Testing
- Accessible from Any Network Setup
- Easy Model Upload to Huggingface
- Capability to Queue Multiple Workloads
- Integrated LLAVA Image Analysis
- Configurable Security Features
- Advanced Token Encryption
We are dedicated to the continuous enhancement of Ollama-Companion, with a focus on user experience and expanded functionality.
Check the docs for more information
Licensed under the Apache License.
Alternative AI tools for Ollama-Colab-Integration
Similar Open Source Tools
extensionOS
Extension | OS is an open-source browser extension that brings AI directly to users' web browsers, allowing them to access powerful models like LLMs seamlessly. Users can create prompts, fix grammar, and access intelligent assistance without switching tabs. The extension aims to revolutionize online information interaction by integrating AI into everyday browsing experiences. It offers features like Prompt Factory for tailored prompts, seamless LLM model access, secure API key storage, and a Mixture of Agents feature. The extension was developed to empower users to unleash their creativity with custom prompts and enhance their browsing experience with intelligent assistance.
Simplifine
Simplifine is an open-source library designed for easy LLM finetuning, enabling users to perform tasks such as supervised fine tuning, question-answer finetuning, contrastive loss for embedding tasks, multi-label classification finetuning, and more. It provides features like WandB logging, in-built evaluation tools, automated finetuning parameters, and state-of-the-art optimization techniques. The library offers bug fixes, new features, and documentation updates in its latest version. Users can install Simplifine via pip or directly from GitHub. The project welcomes contributors and provides comprehensive documentation and support for users.
llm-twin-course
The LLM Twin Course is a free, end-to-end framework for building production-ready LLM systems. It teaches you how to design, train, and deploy a production-ready LLM twin of yourself powered by LLMs, vector DBs, and LLMOps good practices. The course is split into 11 hands-on written lessons and the open-source code you can access on GitHub. You can read everything and try out the code at your own pace.
LLMstudio
LLMstudio by TensorOps is a platform that offers prompt engineering tools for accessing models from providers like OpenAI, VertexAI, and Bedrock. It provides features such as Python Client Gateway, Prompt Editing UI, History Management, and Context Limit Adaptability. Users can track past runs, log costs and latency, and export history to CSV. The tool also supports automatic switching to larger-context models when needed. Coming soon features include side-by-side comparison of LLMs, automated testing, API key administration, project organization, and resilience against rate limits. LLMstudio aims to streamline prompt engineering, provide execution history tracking, and enable effortless data export, offering an evolving environment for teams to experiment with advanced language models.
gemini-android
Gemini Android is a repository showcasing Google's Generative AI on Android using Stream Chat SDK for Compose. It demonstrates the Gemini API for Android, implements UI elements with Jetpack Compose, utilizes Android architecture components like Hilt and AppStartup, performs background tasks with Kotlin Coroutines, and integrates chat systems with Stream Chat Compose SDK for real-time event handling. The project also provides technical content, instructions on building the project, tech stack details, architecture overview, modularization strategies, and a contribution guideline. It follows Google's official architecture guidance and offers a real-world example of app architecture implementation.
miniperplx
MiniPerplx is a minimalistic AI-powered search engine designed to help users find information on the internet. It utilizes AI technologies from providers like OpenAI, Anthropic, and Tavily to deliver accurate and relevant search results. Users can deploy their own instance of MiniPerplx by obtaining API keys, setting up environment variables, and running the development server. The tool aims to streamline the process of information retrieval by leveraging advanced AI capabilities in a user-friendly interface.
ai-driven-dev-community
AI Driven Dev Community is a repository aimed at helping developers become more efficient by utilizing AI tools in their daily coding tasks. It provides a collection of tools, prompts, snippets, and agents for developers to integrate AI into their workflow. The repository is regularly updated with new resources and focuses on best practices for using AI in development work. Users can find tools like Espanso, ChatGPT, GitHub Copilot, and VSCode recommended for enhancing their coding experience. Additionally, the repository offers guidance on customizing AI for developers, installing AI toolbox for software engineers, and contributing to the community through easy steps.
llm-answer-engine
This repository contains the code and instructions needed to build a sophisticated answer engine that leverages the capabilities of Groq, Mistral AI's Mixtral, Langchain.JS, Brave Search, Serper API, and OpenAI. Designed to efficiently return sources, answers, images, videos, and follow-up questions based on user queries, this project is an ideal starting point for developers interested in natural language processing and search technologies.
AgentForge
AgentForge is a low-code framework tailored for the rapid development, testing, and iteration of AI-powered autonomous agents and Cognitive Architectures. It is compatible with a range of LLM models and offers flexibility to run different models for different agents based on specific needs. The framework is designed for seamless extensibility and database-flexibility, making it an ideal playground for various AI projects. AgentForge is a beta-testing ground and future-proof hub for crafting intelligent, model-agnostic autonomous agents.
Instrukt
Instrukt is a terminal-based AI integrated environment that allows users to create and instruct modular AI agents, generate document indexes for question-answering, and attach tools to any agent. It provides a platform for users to interact with AI agents in natural language and run them inside secure containers for performing tasks. The tool supports custom AI agents, chat with code and documents, tools customization, prompt console for quick interaction, LangChain ecosystem integration, secure containers for agent execution, and developer console for debugging and introspection. Instrukt aims to make AI accessible to everyone by providing tools that empower users without relying on external APIs and services.
ROSGPT_Vision
ROSGPT_Vision is a new robotic framework designed to command robots using only two prompts: a Visual Prompt for visual semantic features and an LLM Prompt to regulate robotic reactions. It is based on the Prompting Robotic Modalities (PRM) design pattern and is used to develop CarMate, a robotic application for monitoring driver distractions and providing real-time vocal notifications. The framework leverages state-of-the-art language models to facilitate advanced reasoning about image data and offers a unified platform for robots to perceive, interpret, and interact with visual data through natural language. LangChain is used for easy customization of prompts, and the implementation includes the CarMate application for driver monitoring and assistance.
JamAIBase
JamAI Base is an open-source platform integrating SQLite and LanceDB databases with managed memory and RAG capabilities. It offers built-in LLM, vector embeddings, and reranker orchestration accessible through a spreadsheet-like UI and REST API. Users can transform static tables into dynamic entities, facilitate real-time interactions, manage structured data, and simplify chatbot development. The tool focuses on ease of use, scalability, flexibility, declarative paradigm, and innovative RAG techniques, making complex data operations accessible to users with varying technical expertise.
Vodalus-Expert-LLM-Forge
Vodalus Expert LLM Forge is a tool designed for crafting datasets and efficiently fine-tuning models using free open-source tools. It includes components for data generation, LLM interaction, RAG engine integration, model training, fine-tuning, and quantization. The tool is suitable for users at all levels and is accompanied by comprehensive documentation. Users can generate synthetic data, interact with LLMs, train models, and optimize performance for local execution. The tool provides detailed guides and instructions for setup, usage, and customization.
yn
Yank Note is a highly extensible Markdown editor designed for productivity. It offers features like easy-to-use interface, powerful support for version control and various embedded content, high compatibility with local Markdown files, plug-in extension support, and encryption for saving private files. Users can write their own plug-ins to expand the editor's functionality. However, for more extendability, security protection is sacrificed. The tool supports sync scrolling, outline navigation, version control, encryption, auto-save, editing assistance, image pasting, attachment embedding, code running, to-do list management, quick file opening, integrated terminal, Katex expression, GitHub-style Markdown, multiple data locations, external link conversion, HTML resolving, multiple formats export, TOC generation, table cell editing, title link copying, embedded applets, various graphics embedding, mind map display, custom container support, macro replacement, image hosting service, OpenAI auto completion, and custom plug-ins development.
clearml-server
ClearML Server is a backend service infrastructure for ClearML, facilitating collaboration and experiment management. It includes a web app, RESTful API, and file server for storing images and models. Users can deploy ClearML Server using Docker, AWS EC2 AMI, or Kubernetes. The system design supports single IP or sub-domain configurations with specific open ports. ClearML-Agent Services container allows launching long-lasting jobs and various use cases like auto-scaler service, controllers, optimizer, and applications. Advanced functionality includes web login authentication and non-responsive experiments watchdog. Upgrading ClearML Server involves stopping containers, backing up data, downloading the latest docker-compose.yml file, configuring ClearML-Agent Services, and spinning up docker containers. Community support is available through ClearML FAQ, Stack Overflow, GitHub issues, and email contact.
For similar tasks
rknn-llm
RKLLM software stack is a toolkit designed to help users quickly deploy AI models to Rockchip chips. It consists of RKLLM-Toolkit for model conversion and quantization, RKLLM Runtime for deploying models on Rockchip NPU platform, and RKNPU kernel driver for hardware interaction. The toolkit supports RK3588 and RK3576 series chips and various models like TinyLLAMA, Qwen, Phi, ChatGLM3, Gemma, InternLM2, and MiniCPM. Users can download packages, docker images, examples, and docs from RKLLM_SDK. Additionally, RKNN-Toolkit2 SDK is available for deploying additional AI models.
LLMinator
LLMinator is a Gradio-based tool with an integrated chatbot designed to locally run and test Language Model Models (LLMs) directly from HuggingFace. It provides an easy-to-use interface made with Gradio, LangChain, and Torch, offering features such as context-aware streaming chatbot, inbuilt code syntax highlighting, loading any LLM repo from HuggingFace, support for both CPU and CUDA modes, enabling LLM inference with llama.cpp, and model conversion capabilities.
xFasterTransformer
xFasterTransformer is an optimized solution for Large Language Models (LLMs) on the X86 platform, providing high performance and scalability for inference on mainstream LLM models. It offers C++ and Python APIs for easy integration, along with example codes and benchmark scripts. Users can prepare models in a different format, convert them, and use the APIs for tasks like encoding input prompts, generating token ids, and serving inference requests. The tool supports various data types and models, and can run in single or multi-rank modes using MPI. A web demo based on Gradio is available for popular LLM models like ChatGLM and Llama2. Benchmark scripts help evaluate model inference performance quickly, and MLServer enables serving with REST and gRPC interfaces.
ai-edge-torch
AI Edge Torch is a Python library that supports converting PyTorch models into a .tflite format for on-device applications on Android, iOS, and IoT devices. It offers broad CPU coverage with initial GPU and NPU support, closely integrating with PyTorch and providing good coverage of Core ATen operators. The library includes a PyTorch converter for model conversion and a Generative API for authoring mobile-optimized PyTorch Transformer models, enabling easy deployment of Large Language Models (LLMs) on mobile devices.
BodhiApp
Bodhi App runs Open Source Large Language Models locally, exposing LLM inference capabilities as OpenAI API compatible REST APIs. It leverages llama.cpp for GGUF format models and huggingface.co ecosystem for model downloads. Users can run fine-tuned models for chat completions, create custom aliases, and convert Huggingface models to GGUF format. The CLI offers commands for environment configuration, model management, pulling files, serving API, and more.
lm.rs
lm.rs is a tool that allows users to run inference on Language Models locally on the CPU using Rust. It supports LLama3.2 1B and 3B models, with a WebUI also available. The tool provides benchmarks and download links for models and tokenizers, with recommendations for quantization options. Users can convert models from Google/Meta on huggingface using provided scripts. The tool can be compiled with cargo and run with various arguments for model weights, tokenizer, temperature, and more. Additionally, a backend for the WebUI can be compiled and run to connect via the web interface.
mlflow
MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow offers a set of lightweight APIs that can be used with any existing machine learning application or library (TensorFlow, PyTorch, XGBoost, etc), wherever you currently run ML code (e.g. in notebooks, standalone applications or the cloud). MLflow's current components are:
- MLflow Tracking
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:
- Self-contained, with no need for a DBMS or cloud service.
- OpenAPI interface, easy to integrate with existing infrastructure (e.g. Cloud IDE).
- Supports consumer-grade GPUs.
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.