proton

⚡ Fastest SQL ETL pipeline in a single C++ binary, built for stream processing, observability, analytics and AI/ML

Stars: 2133


Proton is the fastest SQL pipeline engine in a single C++ binary, designed for stream processing, analytics, observability, and AI. It provides a simple, fast, and efficient alternative to ksqlDB and Apache Flink, powered by the ClickHouse engine. Proton offers native source/sink support for various databases, streaming ingestion, multi-stream JOINs, incremental materialized views, alerting, tasks, and UDFs in Python/JS. It is lightweight, with no JVM or other dependencies, and achieves high performance through SIMD optimization. Proton is well suited to real-time analytics ETL pipelines, telemetry pipelines and alerting, real-time feature pipelines for AI/ML, and more.

README:

Fastest SQL pipeline engine for stream processing, analytics, observability and AI

What's Timeplus Proton

🚀 The fastest SQL pipeline engine in a single C++ binary, for stream processing, analytics, observability and AI. A simple, fast and efficient alternative to ksqlDB and Apache Flink, powered by ClickHouse engine.

🔥 SQL for everything : Native source/sink (Kafka, ClickHouse, MySQL, Postgres, MongoDB, S3/Iceberg, OpenSearch etc.), Streaming ingestion, Multi-stream JOINs, Incremental Materialized Views, Alerting, Tasks, UDF in Python/JS etc.

⚡ No JVM. No ZooKeeper. Zero dependencies. Just speed, control and scale.

Get started in seconds

curl https://install.timeplus.com/oss | sh

Why Timeplus Proton

Apache Flink or ksqlDB alternative. Timeplus Proton provides powerful stream processing functionality, such as streaming ETL, tumble/hop/session windows, watermarks, incremental materialized view maintenance, CDC, and data revision processing. In contrast to pure stream processors, it also stores queryable analytical/row-based materialized views within Proton itself for use in analytics dashboards and applications.
Fast. Timeplus Proton is written in C++, with performance optimized through SIMD. For example, on an Apple MacBook Pro with M2 Max, Timeplus Proton can deliver 90 million EPS (events per second), 4-millisecond end-to-end latency, and high-cardinality aggregation with 1 million unique keys.
Lightweight. Timeplus Proton is a single binary (<500MB). No JVM or any other dependencies. You can also run it with Docker, or on an AWS t2.nano instance (1 vCPU and 0.5 GiB memory).
Powered by the fast, resource efficient ClickHouse. Timeplus Proton extends the historical data, storage, and computing functionality of ClickHouse with stream processing. Thousands of SQL functions are available in Timeplus Proton. Billions of rows are queried in milliseconds.
Best streaming SQL engine for Kafka or Redpanda. Query the live data in Kafka or other compatible streaming data platforms, with external streams.
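
The tumble, hop, and session windows mentioned above can be illustrated with a small plain-Python sketch. This is not Proton code and makes no claim about Proton's internals; the function names, the integer-second timestamps, and the inclusive gap comparison are assumptions made for illustration only.

```python
from typing import List

def tumble_window_start(ts: int, size: int) -> int:
    """Start of the single fixed-size window [start, start + size) containing ts."""
    return ts - (ts % size)

def hop_window_starts(ts: int, size: int, slide: int) -> List[int]:
    """Starts of every sliding window [start, start + size) that contains ts."""
    starts = []
    start = ts - (ts % slide)      # latest window that starts at or before ts
    while start > ts - size:       # window still covers ts
        starts.append(start)
        start -= slide
    return sorted(starts)

def session_windows(timestamps: List[int], gap: int) -> List[List[int]]:
    """Group timestamps into sessions; a pause longer than gap starts a new session."""
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # close enough to the previous event
        else:
            sessions.append([ts])     # gap exceeded, open a new session
    return sessions
```

With a 10-second window sliding every 5 seconds, an event at second 12 falls into two hop windows (starting at 5 and 10) but only one tumble window (starting at 10).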

See our architecture doc for technical details and our FAQ for more information.

Use Cases

Timeplus Proton empowers you to build a wide range of real-time applications and data pipelines. Common use cases include:

Real-time Analytics ETL/Pipeline: Efficiently ingest live data from sources like Kafka, perform in-pipeline transformations (filtering, enrichment, masking), and route it to downstream systems, including data warehouses like ClickHouse, other Kafka topics, or analytical stores.
Real-time Telemetry Pipeline and Alerting: Process and route logs, metrics, and traces with in-pipeline noise reduction, real-time alerts before forwarding to Splunk, Elastic, or S3.
Real-time Feature Pipeline for AI/ML: Compute real-time features using low-latency, high-throughput streaming SQL and materialized views with support for backfill and advanced windowing over live data.
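
As a rough illustration of the in-pipeline transformations named above (filtering, enrichment, masking), here is a minimal plain-Python sketch. It is not how Proton implements these steps; in Proton they would be expressed in SQL. The event fields, the `DEVICE_LOCATIONS` lookup table, and the hash-based `mask` helper are all hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Iterator

# Hypothetical lookup table used for enrichment.
DEVICE_LOCATIONS = {"device0": "lab", "device1": "warehouse"}

def mask(value: str) -> str:
    """Replace a sensitive value with a short, stable SHA-256 digest prefix."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

def transform(events: Iterable[Dict]) -> Iterator[Dict]:
    """Filter, enrich, and mask events one at a time, as a stream would."""
    for event in events:
        if event["temperature"] is None:      # filtering: drop incomplete readings
            continue
        yield {
            "device": event["device"],
            "temperature": event["temperature"],
            # enrichment: attach a location from the lookup table
            "location": DEVICE_LOCATIONS.get(event["device"], "unknown"),
            # masking: never forward the raw owner value downstream
            "owner": mask(event["owner"]),
        }
```

Because `transform` is a generator, each event is processed as it arrives rather than after the whole input has been collected.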

Demo

A 2-minute video 👇. Watch the full video on YouTube.

https://github.com/timeplus-io/proton/assets/5076438/8ceca355-d992-4798-b861-1e0334fc4438

Deployment

A single binary:

curl https://install.timeplus.com/oss | sh

Once the proton binary is available, run proton server to start the server. It stores its config, logs, and data in a proton-data folder under the current directory. Then run proton client in another terminal to start the SQL client.

For Mac users, you can also use Homebrew to manage the install/upgrade/uninstall:

brew install timeplus-io/timeplus/proton

Docker:

docker run -d --pull always -p 8123:8123 -p 8463:8463 --name proton d.timeplus.com/timeplus-io/proton:latest

Please check Server Ports to determine which ports to expose so that other tools, such as DBeaver, can connect to Timeplus.

Docker Compose:

The Docker Compose stack demonstrates how to read/write data in Kafka/Redpanda with external streams.

Demo:

Don't want to set it up yourself? Try the Timeplus Demo (https://demos.timeplus.com/)

Usage

SQL is the main interface. Open a new terminal window and run proton client to start the SQL shell.

Note: You can also integrate Timeplus Proton with the Python/Java/Go SDKs, the REST API, or BI plugins. Please check Integrations.

In the proton client, you can write SQL to create an External Stream for Kafka or an External Table for ClickHouse.

For example, you can read from AWS MSK and write the data to ClickHouse with the following SQL:

-- Read from AWS MSK using IAM Role
CREATE EXTERNAL STREAM aws_msk_stream (
  device string,
  temperature float
)
SETTINGS
    type='kafka',
    brokers='prefix.kafka.us-west-2.amazonaws.com:9098',
    topic='topic',
    security_protocol='SASL_SSL',
    sasl_mechanism='AWS_MSK_IAM';

-- Write to ClickHouse
CREATE EXTERNAL TABLE ch_aiven
SETTINGS type='clickhouse',
            address='abc.aivencloud.com:28851',
            user='avnadmin',
            password='..',
            secure=true,
            table='events';

-- Set up a long-running materialized view to write aggregated data to ClickHouse
CREATE MATERIALIZED VIEW mv_msk2ch INTO ch_aiven AS
SELECT window_start as timestamp, device, avg(temperature) as avg_temperature
FROM tumble(aws_msk_stream, 10s) GROUP BY window_start, device;
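
To make the result of the materialized view above more concrete, the following plain-Python sketch computes the same kind of output: one average temperature per 10-second window and device. It is a batch approximation for illustration, not Proton's implementation, which processes events incrementally and emits each window as it closes. The `tumble_avg` name and the `(timestamp_seconds, device, temperature)` tuple layout are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[int, str, float]  # (timestamp in seconds, device, temperature)

def tumble_avg(events: Iterable[Event],
               window_seconds: int = 10) -> Dict[Tuple[int, str], float]:
    """Average temperature per (window_start, device), like GROUP BY window_start, device."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, event count]
    for ts, device, temperature in events:
        window_start = ts - (ts % window_seconds)  # align to the window boundary
        acc = sums[(window_start, device)]
        acc[0] += temperature
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Each key in the returned dictionary corresponds to one row that the view would write to the ClickHouse table.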

If you don't have immediate access to Kafka or ClickHouse, you can also run the following SQL to generate random data:

-- Create a stream with random data
CREATE RANDOM STREAM devices(
  device string default 'device'||to_string(rand()%4),
  temperature float default rand()%1000/10);

-- Run the streaming SQL
SELECT device, count(*), min(temperature), max(temperature)
FROM devices GROUP BY device;

You should see data like the following:

┌─device──┬─count()─┬─min(temperature)─┬─max(temperature)─┐
│ device0 │    2256 │                0 │             99.6 │
│ device1 │    2260 │              0.1 │             99.7 │
│ device3 │    2259 │              0.3 │             99.9 │
│ device2 │    2225 │              0.2 │             99.8 │
└─────────┴─────────┴──────────────────┴──────────────────┘
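
The counts keep growing because the query is a streaming aggregation: each new random row updates per-device state. A minimal plain-Python sketch of that idea follows, assuming a seeded generator that mimics the two column defaults (`rand()%4` and `rand()%1000/10`). The helper names are hypothetical and this is not Proton's implementation.

```python
import random
from typing import Dict, Iterable, Iterator, Tuple

def random_devices(n: int, seed: int = 0) -> Iterator[Tuple[str, float]]:
    """Yield n rows shaped like the devices random stream."""
    rng = random.Random(seed)
    for _ in range(n):
        device = "device" + str(rng.randrange(4))   # device0 .. device3
        temperature = rng.randrange(1000) / 10      # 0.0 .. 99.9
        yield device, temperature

def running_stats(rows: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Update count/min/max per device one row at a time."""
    stats: Dict[str, Dict[str, float]] = {}
    for device, temperature in rows:
        s = stats.setdefault(
            device, {"count": 0, "min": temperature, "max": temperature}
        )
        s["count"] += 1
        s["min"] = min(s["min"], temperature)
        s["max"] = max(s["max"], temperature)
    return stats
```

Only the small per-device state is kept between rows, which is why the aggregation can run indefinitely over an unbounded stream.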

What's next

To see more examples of using Timeplus Proton, check out the examples folder.

To access more features, such as sources, sinks, dashboards, alerts, and data lineage, try Timeplus Enterprise locally.

What features are available with Timeplus Proton versus Timeplus Enterprise?

Feature	Timeplus Proton	Timeplus Enterprise
Deployment	Single-node Docker image; Single binary on Mac/Linux	Single node, or Cluster; Kubernetes-based self-hosting
Data sources	Random streams; External streams to Apache Kafka, Apache Pulsar, Confluent Cloud, Redpanda; External streams to another Timeplus Proton or Timeplus Enterprise deployment; External tables to ClickHouse; Streaming ingestion via REST API (compact mode only)	Everything in Timeplus Proton; WebSocket and HTTP Stream; NATS; CSV upload; Streaming ingestion via REST API (with API key and flexible modes); Hundreds of connectors from Redpanda Connect
Data destinations (sinks)	External streams to Apache Kafka, Apache Pulsar, Confluent Cloud, Redpanda; External streams to another Timeplus Proton or Timeplus Enterprise deployment; External tables to ClickHouse	Everything in Timeplus Proton; Slack; Webhook; Hundreds of connectors from Redpanda Connect
Support	Community support from GitHub and Slack	Enterprise support via email, Slack, and Zoom, with an SLA

Integrations

The following drivers are available:

https://github.com/timeplus-io/proton-java-driver JDBC and other Java clients
https://github.com/timeplus-io/proton-go-driver
https://github.com/timeplus-io/proton-python-driver

Integrations with other systems:

ClickHouse https://docs.timeplus.com/proton-clickhouse-external-table
Docker and Testcontainers https://docs.timeplus.com/tutorial-testcontainers-java
Sling https://docs.timeplus.com/sling
Grafana https://github.com/timeplus-io/proton-grafana-source
Homebrew https://github.com/timeplus-io/homebrew-timeplus
dbt https://github.com/timeplus-io/dbt-proton

Documentation

We publish full documentation for Timeplus Proton at docs.timeplus.com alongside documentation for Timeplus Enterprise.

We also have an FAQ detailing how we chose Apache License 2.0, how Timeplus Proton is related to ClickHouse, and more.

Contributing

We welcome your contributions! If you are looking for issues to work on, try looking at the issue list.

Please see the wiki for more details, and BUILD.md to compile Timeplus Proton on different platforms.

Adding a Company Logo

If you are using Timeplus Proton and would like your company logo displayed on our Home page, please email [email protected] with your request.

Need help?

Please use GitHub Discussions to share your feedback or questions about Timeplus Proton.

For filing bugs, suggesting improvements, or requesting new features, open GitHub Issues.

To connect with Timeplus engineers or inquire about Timeplus Enterprise, join our Timeplus Community Slack.

Licensing

Proton uses Apache License 2.0. See details in the LICENSE.

For Tasks:


ingest live data, process telemetry, compute real-time features, route data, create materialized views

For Jobs:

data engineer, data analyst, stream processing engineer, AI engineer, observability engineer

Alternative AI tools for proton

Similar Open Source Tools

proton

GitHub stars: 2.1k

orbit

ORBIT (Open Retrieval-Based Inference Toolkit) is a middleware platform that provides a unified API for AI inference. It acts as a central gateway, allowing you to connect various local and remote AI models with your private data sources like SQL databases, vector stores, and local files. ORBIT uses a flexible adapter architecture to connect your data to AI models, creating specialized 'agents' for specific tasks. It supports scenarios like Knowledge Base Q&A and Chat with Your SQL Database, enabling users to interact with AI models seamlessly. The tool offers a RESTful API for programmatic access and includes features like authentication, API key management, system prompts, health monitoring, and file management. ORBIT is designed to streamline AI inference tasks and facilitate interactions between users and AI models.

GitHub stars: 227

clearml

ClearML is an auto-magical suite of tools designed to streamline AI workflows. It includes modules for experiment management, MLOps/LLMOps, data management, model serving, and more. ClearML offers features like experiment tracking, model serving, orchestration, and automation. It supports various ML/DL frameworks and integrates with Jupyter Notebook and PyCharm for remote debugging. ClearML aims to simplify collaboration, automate processes, and enhance visibility in AI projects.

GitHub stars: 5.9k

neptune-client

Neptune is a scalable experiment tracker for teams training foundation models. Log millions of runs, effortlessly monitor and visualize model training, and deploy on your infrastructure. Track 100% of metadata to accelerate AI breakthroughs. Log and display any framework and metadata type from any ML pipeline. Organize experiments with nested structures and custom dashboards. Compare results, visualize training, and optimize models quicker. Version models, review stages, and access production-ready models. Share results, manage users, and projects. Integrate with 25+ frameworks. Trusted by great companies to improve workflow.

GitHub stars: 574

db2rest

DB2Rest is a modern low code REST DATA API platform that enables the rapid development of intelligent applications by combining databases, language models, and vector stores. It facilitates context-aware, reasoning applications without vendor lock-in. The tool accelerates application delivery, fosters faster innovation with AI, serves as a secure database gateway, and simplifies integration. It supports various databases like PostgreSQL, MySQL, MS SQL Server, Oracle, MongoDB, and more, with planned support for additional databases. Users can connect on Discord for support and contact [email protected] for inquiries.

GitHub stars: 320

chatnio

Chat Nio is a next-generation AIGC one-stop business solution that combines the advantages of frontend-oriented lightweight deployment projects with powerful API distribution systems. It offers rich model support, beautiful UI design, complete Markdown support, multi-theme support, internationalization support, text-to-image support, powerful conversation sync, model market & preset system, rich file parsing, full model internet search, Progressive Web App (PWA) support, comprehensive backend management, multiple billing methods, innovative model caching, and additional features. The project aims to address limitations in conversation synchronization, billing, file parsing, conversation URL sharing, channel management, and API call support found in existing AIGC commercial sites, while also providing a user-friendly interface design and C-end features.

GitHub stars: 3.1k

Linly-Talker

Linly-Talker is an innovative digital human conversation system that integrates the latest artificial intelligence technologies, including Large Language Models (LLM) 🤖, Automatic Speech Recognition (ASR) 🎙️, Text-to-Speech (TTS) 🗣️, and voice cloning technology 🎤. This system offers an interactive web interface through the Gradio platform 🌐, allowing users to upload images 📷 and engage in personalized dialogues with AI 💬.

GitHub stars: 2.2k

Ivy-Framework

Ivy-Framework is a powerful tool for building internal applications with AI assistance using C# codebase. It provides a CLI for project initialization, authentication integrations, database support, LLM code generation, secrets management, container deployment, hot reload, dependency injection, state management, routing, and external widget framework. Users can easily create data tables for sorting, filtering, and pagination. The framework offers a seamless integration of front-end and back-end development, making it ideal for developing robust internal tools and dashboards.

GitHub stars: 289

Mooncake

Mooncake is a serving platform for Kimi, a leading LLM service provided by Moonshot AI. It features a KVCache-centric disaggregated architecture that separates prefill and decoding clusters, leveraging underutilized CPU, DRAM, and SSD resources of the GPU cluster. Mooncake's scheduler balances throughput and latency-related SLOs, with a prediction-based early rejection policy for highly overloaded scenarios. It excels in long-context scenarios, achieving up to a 525% increase in throughput while handling 75% more requests under real workloads.

GitHub stars: 4.7k

inngest

Inngest is a platform that offers durable functions to replace queues, state management, and scheduling for developers. It allows writing reliable step functions faster without dealing with infrastructure. Developers can create durable functions using various language SDKs, run a local development server, deploy functions to their infrastructure, sync functions with the Inngest Platform, and securely trigger functions via HTTPS. Inngest Functions support retrying, scheduling, and coordinating operations through triggers, flow control, and steps, enabling developers to build reliable workflows with robust support for various operations.

GitHub stars: 4.8k

koog

Koog is a Kotlin-based framework for building and running AI agents entirely in idiomatic Kotlin. It allows users to create agents that interact with tools, handle complex workflows, and communicate with users. Key features include pure Kotlin implementation, MCP integration, embedding capabilities, custom tool creation, ready-to-use components, intelligent history compression, powerful streaming API, persistent agent memory, comprehensive tracing, flexible graph workflows, modular feature system, scalable architecture, and multiplatform support.

GitHub stars: 3.7k

LlamaBot

LlamaBot is an open-source AI coding agent that rapidly builds MVPs, prototypes, and internal tools. It works for non-technical founders, product teams, and engineers by generating working prototypes, embedding AI directly into the app, and running real workflows. Unlike typical codegen tools, LlamaBot can embed directly in your app and run real workflows, making it ideal for collaborative software building where founders guide the vision, engineers stay in control, and AI fills the gap. LlamaBot is built for moving ideas fast, allowing users to prototype an AI MVP in a weekend, experiment with workflows, and collaborate with teammates to bridge the gap between non-technical founders and engineering teams.

GitHub stars: 210

h2ogpt

h2oGPT is an Apache V2 open-source project that allows users to query and summarize documents or chat with local private GPT LLMs. It features a private offline database of any documents (PDFs, Excel, Word, Images, Video Frames, Youtube, Audio, Code, Text, MarkDown, etc.), a persistent database (Chroma, Weaviate, or in-memory FAISS) using accurate embeddings (instructor-large, all-MiniLM-L6-v2, etc.), and efficient use of context using instruct-tuned LLMs (no need for LangChain's few-shot approach). h2oGPT also offers parallel summarization and extraction, reaching an output of 80 tokens per second with the 13B LLaMa2 model, HYDE (Hypothetical Document Embeddings) for enhanced retrieval based upon LLM responses, a variety of models supported (LLaMa2, Mistral, Falcon, Vicuna, WizardLM. With AutoGPTQ, 4-bit/8-bit, LORA, etc.), GPU support from HF and LLaMa.cpp GGML models, and CPU support using HF, LLaMa.cpp, and GPT4ALL models. Additionally, h2oGPT provides Attention Sinks for arbitrarily long generation (LLaMa-2, Mistral, MPT, Pythia, Falcon, etc.), a UI or CLI with streaming of all models, the ability to upload and view documents through the UI (control multiple collaborative or personal collections), Vision Models LLaVa, Claude-3, Gemini-Pro-Vision, GPT-4-Vision, Image Generation Stable Diffusion (sdxl-turbo, sdxl) and PlaygroundAI (playv2), Voice STT using Whisper with streaming audio conversion, Voice TTS using MIT-Licensed Microsoft Speech T5 with multiple voices and Streaming audio conversion, Voice TTS using MPL2-Licensed TTS including Voice Cloning and Streaming audio conversion, AI Assistant Voice Control Mode for hands-free control of h2oGPT chat, Bake-off UI mode against many models at the same time, Easy Download of model artifacts and control over models like LLaMa.cpp through the UI, Authentication in the UI by user/password via Native or Google OAuth, State Preservation in the UI by user/password, Linux, Docker, macOS, and Windows support, Easy 
Windows Installer for Windows 10 64-bit (CPU/CUDA), Easy macOS Installer for macOS (CPU/M1/M2), Inference Servers support (oLLaMa, HF TGI server, vLLM, Gradio, ExLLaMa, Replicate, OpenAI, Azure OpenAI, Anthropic), OpenAI-compliant, Server Proxy API (h2oGPT acts as drop-in-replacement to OpenAI server), Python client API (to talk to Gradio server), JSON Mode with any model via code block extraction. Also supports MistralAI JSON mode, Claude-3 via function calling with strict Schema, OpenAI via JSON mode, and vLLM via guided_json with strict Schema, Web-Search integration with Chat and Document Q/A, Agents for Search, Document Q/A, Python Code, CSV frames (Experimental, best with OpenAI currently), Evaluate performance using reward models, and Quality maintained with over 1000 unit and integration tests taking over 4 GPU-hours.

GitHub stars: 11.7k

ai-flow

AI Flow is an open-source, user-friendly UI application that lets you connect multiple AI models together, drawing on multiple AI APIs such as OpenAI, StabilityAI and Replicate. In a nutshell, AI Flow provides a visual platform for crafting and managing AI-driven workflows, thereby facilitating diverse and dynamic AI interactions.

GitHub stars: 188

BIRD-CRITIC-1

BIRD-CRITIC 1.0 is a SQL benchmark designed to evaluate the capability of large language models (LLMs) in diagnosing and solving user issues within real-world database environments. It comprises 600 tasks for development and 200 held-out out-of-distribution tests across 4 prominent open-source SQL dialects. The benchmark expands beyond simple SELECT queries to cover a wider range of SQL operations, reflecting actual application scenarios. An optimized execution-based evaluation environment is included for rigorous and efficient validation.

GitHub stars: 769

gptme

Personal AI assistant/agent in your terminal, with tools for using the terminal, running code, editing files, browsing the web, using vision, and more. A great coding agent that is general-purpose to assist in all kinds of knowledge work, from a simple but powerful CLI. An unconstrained local alternative to ChatGPT with 'Code Interpreter', Cursor Agent, etc. Not limited by lack of software, internet access, timeouts, or privacy concerns if using local models.

GitHub stars: 4.2k

For similar tasks

proton

GitHub stars: 2.1k

For similar jobs

sweep

Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

GitHub stars: 7.1k

teams-ai

The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

GitHub stars: 502

ai-guide

This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

GitHub stars: 159

classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

GitHub stars: 697

chatbot-ui

Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

GitHub stars: 27.7k

BricksLLM

BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

GitHub stars: 953

uAgents

uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

GitHub stars: 1.3k

griptape

Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.

GitHub stars: 2.2k