
deeppowers
An open-source, high-performance inference accelerator engine supporting various large language models such as DeepSeek, GPT, Gemini, and Claude.
Stars: 183

Deeppowers is an open-source, high-performance inference acceleration engine for large language models, built around a memory-optimized tokenizer supporting WordPiece and BPE. Its C++ core, exposed through Python bindings as well as REST and gRPC services, provides request queuing, dynamic batching, quantization (INT8, INT4, and mixed precision), and a hardware abstraction layer, making it suitable for serving models such as DeepSeek efficiently across different hardware backends.
README:
DeepPowers is a high-performance tokenizer implementation with memory optimization and parallel processing capabilities. It provides efficient text tokenization for large language models with features like WordPiece and BPE algorithms, memory pooling, and batch processing. Key components and features include:
- Hardware Abstraction Layer (HAL)
- CUDA device management
- Basic tensor operations
- Kernel management system
- Request queue management
- Batch processing system
- Priority scheduling
- Error handling mechanism
- INT8 quantization support
- INT4 quantization support
- Mixed-precision quantization
- Calibration data management
- C++ API infrastructure
- Python bindings
- REST API infrastructure
- gRPC service infrastructure
- Authentication middleware
- Rate limiting middleware
- Logging middleware
- Monitoring middleware
- Error handling middleware
- Model operator fusion
- Weight pruning techniques
- KV-cache optimization
- Automatic optimization selection
- Performance profiling and benchmarking
The architecture follows a pipeline-based design with several key components:
Request Flow
- User requests enter the system through a unified interface
- Requests are queued and prioritized in the Request Queue
- Batching system groups compatible requests for optimal processing
- Execution Engine processes batches and generates results
- Output is post-processed and returned to users
Control Flow
- Configuration Manager oversees system settings and runtime parameters
- Graph Compiler optimizes computation graphs for execution
- Hardware Abstraction Layer provides unified access to different hardware backends
Optimization Points
- Dynamic batching for throughput optimization
- Graph compilation for computation optimization
- Hardware-specific optimizations through HAL
- Configuration-based performance tuning
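From a user's point of view, this whole flow is driven through the Python API described later in this README: submitting several prompts in one call gives the queueing and batching layers compatible requests to group. A minimal sketch using only the documented Pipeline interface (the model name "deepseek-v3" follows the examples below):
import deeppowers as dp
# Requests to this pipeline pass through the unified interface, the
# request queue, the batching system, and the execution engine in turn.
pipeline = dp.Pipeline.from_pretrained("deepseek-v3")
# Submitting multiple prompts in one call lets the batching system group
# compatible requests into a single batch for the execution engine.
prompts = ["Summarize the request flow.", "What does the HAL do?"]
responses = pipeline.generate(prompts, max_length=50, temperature=0.7)
for prompt, response in zip(prompts, responses):
    print(prompt, "->", response)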
Project structure:
deeppowers/
├── src/
│ ├── core/ # Core implementation
│ │ ├── hal/ # Hardware Abstraction Layer for device management
│ │ ├── request_queue/ # Request queue and management system
│ │ ├── batching/ # Batch processing and optimization
│ │ ├── execution/ # Execution engine and runtime
│ │ ├── distributed/ # Distributed computing support
│ │ ├── scheduling/ # Task scheduling and resource management
│ │ ├── monitoring/ # System monitoring and metrics
│ │ ├── config/ # Configuration management
│ │ ├── preprocessing/ # Input preprocessing pipeline
│ │ ├── postprocessing/ # Output postprocessing pipeline
│ │ ├── graph/ # Computation graph management
│ │ ├── api/ # Internal API implementations
│ │ ├── model/ # Base model architecture
│ │ ├── memory/ # Memory management system
│ │ ├── inference/ # Inference engine core
│ │ ├── models/ # Specific model implementations
│ │ ├── tokenizer/ # Tokenization implementations
│ │ └── utils/ # Utility components
│ ├── api/ # External API implementations
│ └── common/ # Common utilities
├── tests/ # Test suite
├── scripts/ # Utility scripts
├── examples/ # Example usage
├── docs/ # Documentation
└── README.md # Project overview
The core module is organized into specialized components:
- HAL (Hardware Abstraction Layer): Manages hardware devices and provides a unified interface for different backends (see the sketch after this list)
- Request Queue: Handles incoming requests with priority management and load balancing
- Batching: Implements dynamic batching strategies for optimal throughput
- Execution: Core execution engine for model inference
- Distributed: Supports distributed computing and model parallelism
- Scheduling: Manages task scheduling and resource allocation
- Monitoring: System metrics collection and performance monitoring
- Config: Configuration management and validation
- Memory: Advanced memory management and optimization
- Preprocessing: Input data preparation and normalization
- Postprocessing: Output processing and formatting
- Graph: Computation graph optimization and management
- Inference: Core inference engine implementation
- Model: Base model architecture and interfaces
- Models: Specific model implementations (GPT, BERT, etc.)
- Tokenizer: Text tokenization algorithms and utilities
- API: Internal API implementations for core functionality
- Utils: Common utilities and helper functions
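Because device management goes through the HAL, switching backends is a matter of the device argument to dp.load_model (shown later in this README). A hedged sketch; "cuda" appears in the documented examples, while "cpu" is an assumed fallback string:
import deeppowers as dp
# Device selection is routed through the HAL. "cuda" matches the
# documented examples; "cpu" is an assumption for GPU-less machines.
model = dp.load_model("deepseek-v3", device="cuda", dtype="float16")
# model = dp.load_model("deepseek-v3", device="cpu", dtype="float32")  # assumed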
Requirements:
- C++17 compiler
- CMake 3.15+
- Python 3.8+ (for Python bindings)
- ICU library for Unicode support
# Install dependencies (Ubuntu)
sudo apt-get install build-essential cmake libicu-dev
# Clone and build
git clone https://github.com/deeppowers/deeppowers.git
cd deeppowers
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j
# Install Python package (optional)
cd ./src/api/python
pip install -e .
# Clone the repository
git clone https://github.com/deeppowers/deeppowers.git
cd deeppowers
# Install dependencies
pip install -r requirements.txt
# Build from source
mkdir build && cd build
cmake ..
make -j$(nproc)
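A quick smoke test confirms the bindings are importable (whether the package exposes __version__ is an assumption, hence the guarded lookup):
# Verify the installation
python -c "import deeppowers as dp; print(getattr(dp, '__version__', 'import ok'))"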
import deeppowers as dp
# Method 1: Using Pipeline (Recommended)
# Initialize pipeline with pre-trained model
pipeline = dp.Pipeline.from_pretrained("deepseek-v3")
# Generate text
response = pipeline.generate(
    "Hello, how are you?",
    max_length=50,
    temperature=0.7,
    top_p=0.9
)
print(response)
# Batch processing
responses = pipeline.generate(
    ["Hello!", "How are you?"],
    max_length=50,
    temperature=0.7
)
# Save and load pipeline
pipeline.save("my_pipeline")
loaded_pipeline = dp.Pipeline.load("my_pipeline")
# Method 2: Using Tokenizer and Model separately
# Initialize tokenizer
tokenizer = dp.Tokenizer(model_name="deepseek-v3") # or use custom vocab
tokenizer.load("path/to/tokenizer.model")
# Initialize model
model = dp.Model.from_pretrained("deepseek-v3")
# Create pipeline manually
pipeline = dp.Pipeline(model=model, tokenizer=tokenizer)
# Initialize tokenizer with specific type
tokenizer = dp.Tokenizer(tokenizer_type=dp.TokenizerType.WORDPIECE)
# Train on custom data
texts = ["your", "training", "texts"]
tokenizer.train(texts, vocab_size=30000, min_frequency=2)
# Save and load
tokenizer.save("tokenizer.model")
tokenizer.load("tokenizer.model")
# Basic tokenization
tokens = tokenizer.encode("Hello, world!")
text = tokenizer.decode(tokens)
# Batch processing with parallel execution
texts = ["multiple", "texts", "for", "processing"]
tokens_batch = tokenizer.encode_batch(
    texts,
    add_special_tokens=True,
    padding=True,
    max_length=128
)
# Configure generation parameters
response = pipeline.generate(
    "Write a story about",
    max_length=200,           # Maximum length of generated text
    min_length=50,            # Minimum length of generated text
    temperature=0.7,          # Controls randomness (higher = more random)
    top_k=50,                 # Limits vocabulary to top k tokens
    top_p=0.9,                # Nucleus sampling threshold
    num_return_sequences=3,   # Number of different sequences to generate
    repetition_penalty=1.2    # Penalize repeated tokens
)
# Batch generation with multiple prompts
prompts = [
    "Write a story about",
    "Explain quantum physics",
    "Give me a recipe for"
]
responses = pipeline.generate(
    prompts,
    max_length=100,
    temperature=0.8
)
# Load model
model = dp.load_model("deepseek-v3", device="cuda", dtype="float16")
# Apply automatic optimization
results = dp.optimize_model(model, optimization_type="auto", level="o2", enable_profiling=True)
print(f"Achieved speedup: {results['speedup']}x")
print(f"Memory reduction: {results['memory_reduction']}%")
# Apply specific optimization techniques
results = dp.optimize_model(model, optimization_type="fusion")
results = dp.optimize_model(model, optimization_type="pruning")
results = dp.optimize_model(model, optimization_type="caching")
# Quantize model to INT8 precision
results = dp.quantize_model(model, precision="int8")
print(f"INT8 quantization speedup: {results['speedup']}x")
print(f"Accuracy loss: {results['accuracy_loss']}%")
# Run benchmarks
benchmark_results = dp.benchmark_model(
    model,
    input_text="This is a test input for benchmarking.",
    num_runs=10,
    warmup_runs=3
)
print(f"Average latency: {benchmark_results['avg_latency_ms']} ms")
print(f"Throughput: {benchmark_results['throughput_tokens_per_sec']} tokens/sec")
# Configure memory pool
tokenizer.set_memory_pool_size(4096) # 4KB blocks
tokenizer.enable_string_pooling(True)
# Monitor memory usage
stats = tokenizer.get_memory_stats()
print(f"Memory pool usage: {stats['pool_usage']}MB")
print(f"String pool size: {stats['string_pool_size']}")
# Configure thread pool
tokenizer.set_num_threads(8)
tokenizer.set_batch_size(64)
# Process large datasets
with open("large_file.txt", "r") as f:
    texts = f.readlines()
tokens = tokenizer.encode_batch_parallel(texts)
DeepPowers includes several performance optimization features:
- Memory pooling and caching
- Dynamic batching
- Parallel processing
- Mixed-precision computation
- Distributed inference
- Model quantization (INT8, INT4, mixed precision; see the sketch after this list)
- Operator fusion
- KV-cache optimization
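INT4 and mixed-precision quantization are listed alongside the INT8 path demonstrated earlier. A hedged sketch, assuming the precision strings "int4" and "mixed" follow the same pattern as the documented "int8":
import deeppowers as dp
model = dp.load_model("deepseek-v3", device="cuda", dtype="float16")
# "int8" is documented above; "int4" and "mixed" are assumed values
# based on the feature list, not confirmed API strings.
for precision in ("int8", "int4", "mixed"):
    results = dp.quantize_model(model, precision=precision)
    print(f"{precision}: {results['speedup']}x speedup, "
          f"{results['accuracy_loss']}% accuracy loss")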
Development status:
- ✅ Hardware Abstraction Layer (HAL) - Basic CUDA and ROCm support
- ✅ Tokenizer Implementation - WordPiece and BPE algorithms
- ✅ Memory Management - Basic memory pooling system
- ✅ Request Queue Management - Basic request handling
- ✅ Configuration System - Basic config management
- ✅ Python Bindings - Basic API interface
- ✅ Monitoring System - Basic metrics collection
- ✅ Model Execution Framework - Core implementation
- ✅ Inference Pipeline - Basic pipeline structure
- ✅ Dynamic Batch Processing - Initial implementation
- ✅ Model Loading System - Support for ONNX, PyTorch, and TensorFlow formats
- ✅ Inference Engine - Complete implementation with text generation capabilities
- ✅ Inference Optimization - Operator fusion, weight pruning, and caching optimizations
- ✅ Quantization System - INT8, INT4, and mixed precision support
- ✅ Benchmarking Tools - Performance measurement and optimization metrics
- ✅ Streaming Generation - Real-time text generation with callback support (see the hedged sketch after this list)
- ✅ Advanced Batching Strategies - Batch processing with parallel inference
- 🔄 Computation Graph System - Advanced graph optimizations
- 🔄 Distributed Computing Support - Multi-node inference
- 🔄 Auto-tuning System - Automatic performance optimization
- 🔄 Dynamic Shape Support - Flexible tensor dimensions handling
- 📋 Advanced Model Support
- Advanced LLM implementations (GPT, Gemini, Claude)
- More sophisticated model architecture support
- Model compression and distillation
- 📋 Performance Optimization
- Advanced memory management
- Kernel fusion optimizations
- Custom CUDA kernels for critical operations
- 📋 Advanced Features
- Multi-GPU parallelism
- Distributed inference across nodes
- Advanced caching system with prefetching
- Speculative decoding
- Custom operator implementation
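Streaming generation is marked complete above, but its callback interface is not shown in this README. A minimal sketch of what callback-based streaming could look like; the stream and callback parameter names are assumptions, not the confirmed API:
import deeppowers as dp
pipeline = dp.Pipeline.from_pretrained("deepseek-v3")
# Hypothetical callback, invoked with each newly generated fragment.
def print_fragment(fragment):
    print(fragment, end="", flush=True)
# 'stream' and 'callback' are assumed parameter names for illustration.
pipeline.generate(
    "Write a story about",
    max_length=100,
    stream=True,
    callback=print_fragment,
)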
The project includes comprehensive benchmarking tools:
# Run performance benchmark
python examples/optimize_and_benchmark.py --model your-model --benchmark
# Run with optimization
python examples/optimize_and_benchmark.py --model your-model --optimization auto --level o2 --benchmark
# Apply quantization
python examples/optimize_and_benchmark.py --model your-model --quantize --quantize-precision int8 --benchmark
# Generate text with optimized model
python examples/optimize_and_benchmark.py --model your-model --optimization auto --generate --prompt "Your prompt here"
We welcome contributions! Please see our Contributing Guidelines for details.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Special thanks to all contributors and the open-source community.
- Support: report bugs or ask questions via GitHub Issues.