Best AI tools for Start Inference Server
20 - AI Tool Sites
FluidStack
FluidStack is a leading GPU cloud platform designed for AI and LLM (Large Language Model) training. It offers unlimited scale for AI training and inference, allowing users to access thousands of fully-interconnected GPUs on demand. Trusted by top AI startups, FluidStack aggregates GPU capacity from data centers worldwide, providing access to over 50,000 GPUs for accelerating training and inference. With 1000+ data centers across 50+ countries, FluidStack ensures reliable and efficient GPU cloud services at competitive prices.
Cortex Labs
Cortex Labs is a decentralized world computer that enables AI and AI-powered decentralized applications (dApps) to run on the blockchain. It offers a Layer2 solution called ZkMatrix, which utilizes zkRollup technology to enhance transaction speed and reduce fees. Cortex Virtual Machine (CVM) supports on-chain AI inference using GPU, ensuring deterministic results across computing environments. Cortex also enables machine learning in smart contracts and dApps, fostering an open-source ecosystem for AI researchers and developers to share models. The platform aims to solve the challenge of on-chain machine learning execution efficiently and deterministically, providing tools and resources for developers to integrate AI into blockchain applications.
Favikon
Favikon is an AI-powered influencer marketing platform that helps businesses find and manage influencers for their marketing campaigns. It offers a range of features to help businesses with influencer discovery, tracking, and reporting. Favikon's AI-powered discovery feature enables businesses to find influencers across more than 600 specialized niches effortlessly. The platform also provides in-depth profiles of creators to understand their social media influence and popularity. Favikon's AI-powered rankings help businesses identify trending topics, popular formats, and emerging creators within their industry. The platform also offers a tracking feature to stay updated on the latest trends and posts across social media platforms.
Cision
Cision is an end-to-end communications and media intelligence platform that provides a suite of tools and services to help public relations and communications professionals understand, influence, and amplify their stories. Cision's platform includes PR Newswire, CisionOne, and Cision Insights, which offer a range of capabilities such as PR distribution, media monitoring, media analytics, and influencer outreach. Cision's solutions are used by a wide range of organizations, including Fortune 500 companies, government agencies, and non-profit organizations.
Start Left® Security
Start Left® Security is an AI-driven application security posture management platform that empowers product teams to automate secure-by-design software from people to cloud. The platform integrates security into every facet of the organization, offering a unified solution that aligns with business goals, fosters continuous improvement, and drives innovation. Start Left® Security provides a gamified DevSecOps experience with comprehensive security capabilities like SCA, SBOM, SAST, DAST, Container Security, IaC security, ASPM, and more.
GPT Jump Start
GPT Jump Start is a versatile solution designed to incorporate AI features into your projects. It functions in two ways: As an API, it can be integrated with custom projects or existing solutions like Zapier and Pabbly. As a WordPress plugin, it assists in generating high-quality, engaging, and relevant content based on provided prompts.
Dora
Dora is a no-code 3D animated website design platform that allows users to create stunning 3D and animated visuals without writing a single line of code. With Dora, designers, freelancers, and creative professionals can focus on what they do best: designing. The platform is tailored for professionals who prioritize design aesthetics without wanting to dive deep into the backend. Dora offers a variety of features, including a drag-and-connect constraint layout system, advanced animation capabilities, and pixel-perfect usability. With Dora, users can create responsive 3D and animated websites that translate seamlessly across devices.
Engage AI
Engage AI is a generative AI tool that helps businesses increase their LinkedIn engagement and lead generation. It offers a range of features to help users create personalized and engaging content, including the ability to generate comments, connection requests, and profile content. Engage AI also provides insights into LinkedIn trends and best practices, and offers a variety of resources to help users get the most out of the platform.
MyTales
MyTales is an AI-powered story generator that helps you create unique and engaging stories. With MyTales, you can start your adventure by submitting a prompt, and the AI will generate a story based on your input. You can then share your story with others or continue to develop it yourself.
Optimove
Optimove is a Customer-Led Marketing Platform that leverages a real-time Customer Data Platform (CDP) to orchestrate personalized multichannel campaigns optimized by AI. It enables businesses to deliver personalized experiences in real-time across various channels such as web, app, and marketing channels. With a focus on customer-led marketing, Optimove helps brands improve customer KPIs through data-driven campaigns and top-tier personalization. The platform offers a range of resources, including industry benchmarks, marketing guides, success stories, and best practices, to help users achieve marketing mastery.
WithSpark.ai
WithSpark.ai is a free AI-powered dating assistant that helps users start engaging conversations and create meaningful connections on dating apps. The tool uses advanced artificial intelligence to analyze user inputs and generate personalized conversation starters tailored to each user's interests and preferences. With Spark, users can easily spark genuine connections, boost their conversations, and engage effortlessly with personalized responses. The application empowers users to unleash their dating potential by providing witty replies, clever pick-up lines, captivating conversation starters, thoughtful messages, and more.
Infinilearn
Infinilearn is a personalized learning platform that revolutionizes education by offering gamified and interactive learning experiences. It features a customized AI Guide that grows with the user, providing personalized learning paths, gamified level system, earning grants directly through the app, and human-AI powered symbiosis. Infinilearn aims to make learning engaging, rewarding, and tailored to individual needs.
TopWorksheets
TopWorksheets is an online platform that allows teachers to create and share interactive worksheets and exercises. It offers a variety of features to help teachers save time and track student progress. Teachers can use TopWorksheets to convert existing worksheets into interactive ones, browse through thousands of worksheets created by other teachers, and even have AI assist them in creating new worksheets. Students can use TopWorksheets to complete assignments, receive auto-graded feedback, and track their own progress.
Getin.AI
Getin.AI is a platform that focuses on AI jobs, career paths, and company profiles in the fields of artificial intelligence, machine learning, and data science. Users can explore various job categories, such as Analyst, Consulting, Customer Service & Support, Data Science & Analytics, Engineering, Finance & Accounting, HR & Recruiting, Legal, Compliance and Ethics, Marketing & PR, Product, Sales And Business Development, Senior Management / C-level, Strategy & M&A, and UX, UI & Design. The platform provides a comprehensive list of remote job opportunities and features detailed job listings with information on job titles, companies, locations, job descriptions, and required skills.
AIBooster
AIBooster is a platform that helps AI businesses to market their products. It offers a variety of services, including directory submission, content marketing, and social media marketing. AIBooster's goal is to help AI start-ups reach their target audience and grow their business.
xZactly.ai
xZactly.ai works with Artificial Intelligence start-ups and high-growth companies to deliver accelerated revenue growth, AI-specific sales, business development, and marketing expertise, seed and venture capital financing, business scaling, and global expansion funding. It helps connect businesses and investors, provides go-to-market strategies, and offers triage, transformation, and turnaround solutions for AI and ML companies. With over 25 years of experience in sales, marketing, and business development, xZactly.ai aims to accelerate sales and boost revenue for AI-driven businesses by delivering expertise, strategy, and execution.
Mentionlytics
Mentionlytics is an AI-powered web and social media monitoring tool that helps businesses track and analyze online conversations about their brand, competitors, and industry. With Mentionlytics, businesses can gain insights into their audience's behavior, identify trends, and make informed decisions to improve their marketing and communication strategies.
Catfishes
Catfishes is an AI-powered tool that allows users to create realistic AI girls in seconds. With Catfishes, you can create custom AI influencers and earn up to $10,000 per month. Catfishes is easy to use and requires no prior experience with AI or image editing. Simply create a face for your AI girl with a simple prompt, and Catfishes will generate realistic images of your AI girl in any pose or environment. Catfishes is the perfect tool for anyone looking to create unique and engaging AI art.
ODDY
ODDY is a research copilot for UX design that conducts desk research and design reviews in seconds. It provides actionable insights, best practices, and direct references to help UX designers transform their challenges into better designs.
PaveAI
PaveAI is an automated analytics platform that helps businesses track and measure the effectiveness of their marketing campaigns. The platform provides detailed insights into which strategies are working and which are not, helping businesses to optimize their marketing spend and improve their ROI.
20 - Open Source AI Tools
podman-desktop-extension-ai-lab
Podman AI Lab is an open source extension for Podman Desktop designed to work with Large Language Models (LLMs) in a local environment. It features a recipe catalog with common AI use cases, a curated set of open source models, and a playground for learning, prototyping, and experimentation. Users can quickly and easily get started bringing AI into their applications without depending on external infrastructure, ensuring data privacy and security.
workbench-example-hybrid-rag
This NVIDIA AI Workbench project is designed for developing a Retrieval Augmented Generation application with a customizable Gradio Chat app. It allows users to embed documents into a locally running vector database and run inference locally on a Hugging Face TGI server, in the cloud using NVIDIA inference endpoints, or using microservices via NVIDIA Inference Microservices (NIMs). The project supports various models with different quantization options and provides tutorials for using different inference modes. Users can troubleshoot issues, customize the Gradio app, and access advanced tutorials for specific tasks.
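The retrieval step in a RAG pipeline like this one can be sketched in a few lines: embed the documents, rank them by cosine similarity to the query embedding, and pack the top hits into the prompt sent to the inference server. Below is a minimal, illustrative sketch only; toy hand-made vectors stand in for a real embedding model and vector database, and all names are hypothetical rather than taken from this project.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_emb, docs, doc_embs, k=2):
    """Return the k documents whose embeddings are closest to the query."""
    ranked = sorted(zip(docs, doc_embs),
                    key=lambda pair: cosine(query_emb, pair[1]),
                    reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(question, contexts):
    """Assemble the augmented prompt handed to the generation model."""
    ctx = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {question}"

# Toy corpus with hand-made 2D "embeddings" in place of a real model.
docs = ["TGI serves models locally",
        "NIMs are cloud microservices",
        "Gradio builds chat UIs"]
embs = [(1.0, 0.1), (0.1, 1.0), (0.6, 0.6)]
top = retrieve((0.9, 0.2), docs, embs, k=1)
print(build_prompt("How do I serve a model locally?", top))
```

In the real project the embedding and generation calls go to a locally running vector database and a TGI/NIM endpoint; the ranking and prompt-assembly logic is the part this sketch isolates.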
LLMSpeculativeSampling
This repository implements speculative sampling for large language model (LLM) decoding. It uses two models: a small approximation (draft) model that generates token guesses and a large target model that verifies and corrects them, improving decoding efficiency without changing the output distribution. It includes implementations of both Google's and DeepMind's versions of speculative sampling, supporting models like llama-7B and llama-1B. The tool is designed for fast inference from transformers via speculative decoding.
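The core accept/reject rule behind speculative sampling is compact enough to sketch. The following toy implementation is illustrative only (it is not this repository's code): the draft model proposes k tokens, the target model accepts each with probability min(1, p/q), and the first rejection is resampled from the renormalized residual max(0, p - q), which is what preserves the target model's output distribution. The two lambdas stand in for real models.

```python
import random

def sample(dist, rng):
    """Draw an index from a discrete probability distribution."""
    r, acc = rng.random(), 0.0
    for i, p in enumerate(dist):
        acc += p
        if r < acc:
            return i
    return len(dist) - 1

def speculative_step(p_target, q_draft, prefix, k, rng):
    """One round of speculative sampling; adds 1 to k+1 new tokens.

    p_target(seq) and q_draft(seq) return next-token distributions.
    """
    seq = list(prefix)
    # 1. Draft model autoregressively proposes k cheap guesses.
    proposals, q_dists = [], []
    for _ in range(k):
        q = q_draft(seq + proposals)
        q_dists.append(q)
        proposals.append(sample(q, rng))
    # 2. Target model verifies each guess (one batched pass in practice).
    for i, x in enumerate(proposals):
        p = p_target(seq)
        if rng.random() < min(1.0, p[x] / max(q_dists[i][x], 1e-12)):
            seq.append(x)                      # accept the draft token
            continue
        # Reject: resample from the residual distribution max(0, p - q).
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q_dists[i])]
        z = sum(residual)
        dist = [r / z for r in residual] if z > 0 else p
        seq.append(sample(dist, rng))
        return seq
    # 3. Every guess accepted: take a free bonus token from the target.
    seq.append(sample(p_target(seq), rng))
    return seq

# Toy stand-ins for the two models (fixed next-token distributions).
target = lambda seq: [0.5, 0.3, 0.2]
draft = lambda seq: [0.4, 0.4, 0.2]
out = speculative_step(target, draft, [0], k=3, rng=random.Random(0))
```

The efficiency win comes from step 2: the target model scores all k draft positions in one forward pass, so several tokens can be committed per expensive model call.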
Qwen-TensorRT-LLM
Qwen-TensorRT-LLM is a project developed for the NVIDIA TensorRT Hackathon 2023, focusing on accelerating inference for the Qwen-7B-Chat model using TRT-LLM. The project offers functionality such as FP16/BF16 support, INT8 and INT4 quantization options, Tensor Parallelism for multi-GPU execution, a web demo built with Gradio, Triton API deployment for maximum throughput/concurrency, FastAPI integration for OpenAI-style requests, CLI interaction, and LangChain support. It supports models like qwen2, qwen, and qwen-vl for both base and chat variants. The project also provides tutorials on Bilibili and blog posts on adapting Qwen models to NVIDIA TensorRT-LLM, along with hardware requirements and quick-start guides for different model types and quantization methods.
tensorrtllm_backend
The TensorRT-LLM Backend is a Triton backend designed to serve TensorRT-LLM models with Triton Inference Server. It supports features like inflight batching, paged attention, and more. Users can access the backend through pre-built Docker containers or build it using scripts provided in the repository. The backend can be used to create models for tasks like tokenizing, inferencing, de-tokenizing, ensemble modeling, and more. Users can interact with the backend using provided client scripts and query the server for metrics related to request handling, memory usage, KV cache blocks, and more. Testing for the backend can be done following the instructions in the 'ci/README.md' file.
TensorRT-LLM
TensorRT-LLM is an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM contains components to create Python and C++ runtimes that execute those TensorRT engines. It also includes a backend for integration with the NVIDIA Triton Inference Server, a production-quality system to serve LLMs. Models built with TensorRT-LLM can be executed on a wide range of configurations, from a single GPU to multiple nodes with multiple GPUs (using Tensor Parallelism and/or Pipeline Parallelism).
fastc
Fastc is a tool focused on CPU execution, using efficient models for embedding generation and cosine similarity classification. It allows for efficient multi-classifier execution without extra overhead. Users can easily train text classifiers, export models, publish to HuggingFace, load existing models, make class predictions, use instruct templates, and launch an inference server. The tool provides an HTTP API for text classification with JSON payloads and supports multiple languages for language identification.
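The embedding-plus-cosine-similarity approach described above can be illustrated with a toy centroid classifier: average the embeddings of each class's training examples, then label new inputs by the nearest centroid. This sketch uses hand-made 2D vectors in place of a real embedding model, and the class and method names are illustrative, not fastc's actual API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class CentroidClassifier:
    """Classify embeddings by cosine similarity to per-class centroids."""

    def fit(self, embeddings, labels):
        sums = {}
        for emb, label in zip(embeddings, labels):
            acc, n = sums.get(label, ([0.0] * len(emb), 0))
            sums[label] = ([a + e for a, e in zip(acc, emb)], n + 1)
        # One centroid (mean embedding) per class.
        self.centroids = {lab: [a / n for a in acc]
                          for lab, (acc, n) in sums.items()}
        return self

    def predict(self, emb):
        return max(self.centroids,
                   key=lambda lab: cosine(emb, self.centroids[lab]))

# Toy 2D "embeddings": the two classes cluster in different directions.
clf = CentroidClassifier().fit(
    [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)],
    ["positive", "positive", "negative", "negative"],
)
print(clf.predict((0.7, 0.3)))  # closest to the "positive" centroid
```

Because inference reduces to one embedding pass plus a handful of dot products, many such classifiers can share the same embedding model with little extra overhead, which is the efficiency point the description makes.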
TinyLLM
TinyLLM is a project that helps build a small locally hosted language model with a web interface using consumer-grade hardware. It supports multiple language models, builds a local OpenAI API web service, and serves a Chatbot web interface with customizable prompts. The project requires specific hardware and software configurations for optimal performance. Users can run a local language model using inference servers like vLLM, llama-cpp-python, and Ollama. The Chatbot feature allows users to interact with the language model through a web-based interface, supporting features like summarizing websites, displaying news headlines, stock prices, weather conditions, and using vector databases for queries.
h2ogpt
h2oGPT is an Apache V2 open-source project that allows users to query and summarize documents or chat with local private GPT LLMs. It features a private offline database of any documents (PDFs, Excel, Word, images, video frames, YouTube, audio, code, text, Markdown, etc.), a persistent database (Chroma, Weaviate, or in-memory FAISS) using accurate embeddings (instructor-large, all-MiniLM-L6-v2, etc.), and efficient use of context using instruct-tuned LLMs (no need for LangChain's few-shot approach). Other capabilities include:
* Parallel summarization and extraction, reaching 80 tokens per second with the 13B LLaMa2 model
* HYDE (Hypothetical Document Embeddings) for enhanced retrieval based on LLM responses
* A variety of supported models (LLaMa2, Mistral, Falcon, Vicuna, WizardLM), with AutoGPTQ, 4-bit/8-bit, LoRA, etc.
* GPU support via HF and LLaMa.cpp GGML models; CPU support via HF, LLaMa.cpp, and GPT4ALL models
* Attention Sinks for arbitrarily long generation (LLaMa-2, Mistral, MPT, Pythia, Falcon, etc.)
* UI or CLI with streaming for all models; upload and view documents through the UI (control multiple collaborative or personal collections)
* Vision models (LLaVa, Claude-3, Gemini-Pro-Vision, GPT-4-Vision); image generation with Stable Diffusion (sdxl-turbo, sdxl) and PlaygroundAI (playv2)
* Voice STT using Whisper with streaming audio conversion; voice TTS using the MIT-licensed Microsoft Speech T5 (multiple voices) or the MPL2-licensed TTS with voice cloning, both with streaming audio conversion
* AI Assistant Voice Control Mode for hands-free control of h2oGPT chat
* Bake-off UI mode to compare many models at the same time
* Easy download of model artifacts and control over models like LLaMa.cpp through the UI
* Authentication and state preservation in the UI by user/password, via native login or Google OAuth
* Linux, Docker, macOS, and Windows support, with an easy Windows installer (Windows 10 64-bit, CPU/CUDA) and macOS installer (CPU/M1/M2)
* Inference server support (oLLaMa, HF TGI server, vLLM, Gradio, ExLLaMa, Replicate, OpenAI, Azure OpenAI, Anthropic)
* OpenAI-compliant server proxy API (h2oGPT acts as a drop-in replacement for an OpenAI server) and a Python client API (to talk to the Gradio server)
* JSON mode with any model via code-block extraction; also supports MistralAI JSON mode, Claude-3 via function calling with strict schema, OpenAI via JSON mode, and vLLM via guided_json with strict schema
* Web-search integration with chat and document Q/A; agents for search, document Q/A, Python code, and CSV frames (experimental, best with OpenAI currently)
* Performance evaluation using reward models; quality maintained with over 1000 unit and integration tests taking over 4 GPU-hours
llm-finetuning
llm-finetuning is a repository that provides a serverless twist to the popular axolotl fine-tuning library using Modal's serverless infrastructure. It allows users to quickly fine-tune any LLM model with state-of-the-art optimizations like Deepspeed ZeRO, LoRA adapters, Flash attention, and Gradient checkpointing. The repository simplifies the fine-tuning process by not exposing all CLI arguments, instead allowing users to specify options in a config file. It supports efficient training and scaling across multiple GPUs, making it suitable for production-ready fine-tuning jobs.
infinity
Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting all sentence-transformer models and frameworks. It is developed under the MIT License and powers inference behind Gradient.ai. The API allows users to deploy models from SentenceTransformers, offers fast inference backends utilizing various accelerators, dynamic batching for efficient processing, correct and tested implementation, and easy-to-use API built on FastAPI with Swagger documentation. Users can embed text, rerank documents, and perform text classification tasks using the tool. Infinity supports various models from Huggingface and provides flexibility in deployment via CLI, Docker, Python API, and cloud services like dstack. The tool is suitable for tasks like embedding, reranking, and text classification.
mosec
Mosec is a high-performance and flexible model serving framework for building ML model-enabled backends and microservices. It bridges the gap between any machine learning model you just trained and an efficient online service API.
* **Highly performant**: web layer and task coordination built with Rust 🦀, which offers blazing speed in addition to efficient CPU utilization powered by async I/O
* **Ease of use**: user interface purely in Python 🐍, by which users can serve their models in an ML-framework-agnostic manner using the same code as they do for offline testing
* **Dynamic batching**: aggregate requests from different users for batched inference and distribute results back
* **Pipelined stages**: spawn multiple processes for pipelined stages to handle CPU/GPU/IO mixed workloads
* **Cloud friendly**: designed to run in the cloud, with model warmup, graceful shutdown, and Prometheus monitoring metrics, easily managed by Kubernetes or any container orchestration system
* **Do one thing well**: focus on the online serving part, so users can pay attention to model optimization and business logic
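The dynamic-batching idea can be sketched independently of the framework: buffer incoming requests, run one batched model call when the buffer fills (or, in a real server, when a deadline passes), then fan the results back to the individual callers. The following is a minimal synchronous sketch with made-up names; mosec's actual implementation is asynchronous, multi-process, and Rust-coordinated.

```python
from concurrent.futures import Future

class DynamicBatcher:
    """Buffer requests and serve them with one batched model call each."""

    def __init__(self, infer_batch, max_batch=4):
        self.infer_batch = infer_batch     # e.g. one forward pass over a batch
        self.max_batch = max_batch
        self._pending = []                 # list of (input, Future) pairs

    def submit(self, x):
        """Enqueue one request; returns a Future with its eventual result."""
        fut = Future()
        self._pending.append((x, fut))
        if len(self._pending) >= self.max_batch:
            self.flush()                   # batch is full: run it now
        return fut

    def flush(self):
        """Run everything buffered so far as a single batched call."""
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        outputs = self.infer_batch([x for x, _ in batch])
        for (_, fut), y in zip(batch, outputs):
            fut.set_result(y)              # fan results back to callers

# Toy "model": doubles each input in one batched call.
batcher = DynamicBatcher(lambda xs: [x * 2 for x in xs], max_batch=3)
futures = [batcher.submit(i) for i in range(5)]
batcher.flush()                            # a deadline timer would do this in practice
results = [f.result() for f in futures]
```

Batching amortizes per-call overhead (and on GPUs, kernel launch and memory-transfer cost) across many users' requests, which is why serving frameworks treat it as a core feature rather than application code.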
reverse-engineering-assistant
ReVA (Reverse Engineering Assistant) is a project aimed at building a disassembler agnostic AI assistant for reverse engineering tasks. It utilizes a tool-driven approach, providing small tools to the user to empower them in completing complex tasks. The assistant is designed to accept various inputs, guide the user in correcting mistakes, and provide additional context to encourage exploration. Users can ask questions, perform tasks like decompilation, class diagram generation, variable renaming, and more. ReVA supports different language models for online and local inference, with easy configuration options. The workflow involves opening the RE tool and program, then starting a chat session to interact with the assistant. Installation includes setting up the Python component, running the chat tool, and configuring the Ghidra extension for seamless integration. ReVA aims to enhance the reverse engineering process by breaking down actions into small parts, including the user's thoughts in the output, and providing support for monitoring and adjusting prompts.
functionary
Functionary is a language model that interprets and executes functions/plugins. It determines when to execute functions, whether in parallel or serially, and understands their outputs. Function definitions are given as JSON Schema Objects, similar to OpenAI GPT function calls. It offers documentation and examples on functionary.meetkai.com. The newest model, meetkai/functionary-medium-v3.1, is ranked 2nd in the Berkeley Function-Calling Leaderboard. Functionary supports models with different context lengths and capabilities for function calling and code interpretation. It also provides grammar sampling for accurate function and parameter names. Users can deploy Functionary models serverlessly using Modal.com.
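A function definition in the JSON-Schema style such models consume looks roughly like the following, shown here as a Python dict. The function and its fields are a made-up example following the general OpenAI function-calling convention, not a schema taken from Functionary's documentation.

```python
# Hypothetical function definition in the OpenAI-style JSON-Schema format:
# the model reads the name, description, and parameter schema, and emits a
# call with arguments matching the schema when the function is relevant.
get_weather = {
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}
print(get_weather["name"])
```

Constraining generation to such a schema (e.g. via grammar sampling, as the description mentions) is what keeps emitted function and parameter names valid rather than free-form text.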
holmesgpt
HolmesGPT is an open-source DevOps assistant powered by OpenAI or any tool-calling LLM of your choice. It helps in troubleshooting Kubernetes, incident response, ticket management, automated investigation, and runbook automation in plain English. The tool connects to existing observability data, is compliance-friendly, provides transparent results, supports extensible data sources, runbook automation, and integrates with existing workflows. Users can install HolmesGPT using Brew, prebuilt Docker container, Python Poetry, or Docker. The tool requires an API key for functioning and supports OpenAI, Azure AI, and self-hosted LLMs.
chat-ui
A chat interface using open-source models, e.g., OpenAssistant or Llama. It is a SvelteKit app and powers the HuggingChat app on hf.co/chat.
llm-vscode
llm-vscode is an extension designed for all things LLM, utilizing llm-ls as its backend. It offers features such as code completion with 'ghost-text' suggestions, the ability to choose models for code generation via HTTP requests, ensuring prompt size fits within the context window, and code attribution checks. Users can configure the backend, suggestion behavior, keybindings, llm-ls settings, and tokenization options. Additionally, the extension supports testing models like Code Llama 13B, Phind/Phind-CodeLlama-34B-v2, and WizardLM/WizardCoder-Python-34B-V1.0. Development involves cloning llm-ls, building it, and setting up the llm-vscode extension for use.
llmware
LLMWare is a framework for quickly developing LLM-based applications including Retrieval Augmented Generation (RAG) and Multi-Step Orchestration of Agent Workflows. This project provides a comprehensive set of tools that anyone can use - from a beginner to the most sophisticated AI developer - to rapidly build industrial-grade, knowledge-based enterprise LLM applications. Our specific focus is on making it easy to integrate open source small specialized models and connecting enterprise knowledge safely and securely.
AIlice
AIlice is a fully autonomous, general-purpose AI agent that aims to create a standalone artificial intelligence assistant, similar to JARVIS, built on open-source LLMs. AIlice achieves this goal by building a "text computer" that uses a Large Language Model (LLM) as its core processor. Currently, AIlice demonstrates proficiency in a range of tasks, including thematic research, coding, system management, literature reviews, and complex hybrid tasks that go beyond these basic capabilities. AIlice has reached near-perfect performance on everyday tasks using GPT-4 and is making strides toward practical application with the latest open-source models. The long-term goal is self-evolution of AI agents: agents that autonomously build their own feature expansions and new agent types, seamlessly unleashing the LLM's knowledge and reasoning capabilities into the real world.
ultravox
Ultravox is a fast multimodal Language Model (LLM) that can understand both text and human speech in real-time without the need for a separate Audio Speech Recognition (ASR) stage. By extending Meta's Llama 3 model with a multimodal projector, Ultravox converts audio directly into a high-dimensional space used by Llama 3, enabling quick responses and potential understanding of paralinguistic cues like timing and emotion in human speech. The current version (v0.3) has impressive speed metrics and aims for further enhancements. Ultravox currently converts audio to streaming text and plans to emit speech tokens for direct audio conversion. The tool is open for collaboration to enhance this functionality.
20 - OpenAI GPTs
Business Reporter for the Start-Up Ecosystem
I can research and review news that will interest the key players in the start-up ecosystem and provide them with a briefing.
PSYCH: Your Compass to Inner Clarity (TPW.AI)
Start by sharing what’s on your mind or any emotional challenges you're facing. PSYCH will guide you through reflective dialogue, providing insights and coping mechanisms tailored to your needs.
Quotes Wallpaper Creator
I can provide you with a quote wallpaper every day to start your day right.
Neighbot
Start by giving Neighbot the name of a neighborhood and state (e.g., Orchard Hills, CA). It will provide descriptions for your marketing collateral. You can follow up and ask about local restaurants and builder communities.
Plot Breaker
Start with a genre and I'll help you develop a rough story outline. You can handle the rest.
Supervisors of Gambling Workers Ready
It’s your first day! Excited? Nervous? Let me help you start off strong in your career. Type "help" for more information.
Tax Preparers Ready
It’s your first day! Excited? Nervous? Let me help you start off strong in your career. Type "help" for more information.
Medical Secretaries and Assistants Ready
It’s your first day! Excited? Nervous? Let me help you start off strong in your career. Type "help" for more information.
Couriers and Messengers Ready
It’s your first day! Excited? Nervous? Let me help you start off strong in your career. Type "help" for more information.
Image Theme Clone
Type “Start” to get exact details on image generation and/or duplication.