
duckdb-airport-extension
The Airport extension for DuckDB enables the use of Arrow Flight with DuckDB.
Stars: 170

The 'duckdb-airport-extension' is a tool that enables the use of Arrow Flight with DuckDB. It provides functions to list available Arrow Flights at a specific endpoint and to retrieve the contents of an Arrow Flight. The extension also supports creating secrets for authentication purposes. It includes features for serializing filters and optimizing projections to enhance data transmission efficiency. The tool is built on top of gRPC and the Arrow IPC format, offering high-performance data services for data processing and retrieval.
README:
This extension "airport
" enables the use of Arrow Flight with DuckDB.
Arrow Flight is an RPC framework for high-performance data services based on Apache Arrow and is built on top of gRPC and the Arrow IPC format.
airport_list_flights(location, criteria, auth_token="token_value", secret="secret_name")
Description: This function returns a list of Arrow Flights that are available at a particular endpoint.
| Parameter Name | Type | Description |
|---|---|---|
| location | VARCHAR | This is the location of the Flight server |
| criteria | VARCHAR | This is free-form criteria to pass to the Flight server |
| Parameter Name | Type | Description |
|---|---|---|
| auth_token | VARCHAR | A bearer token to present to the server; the header is formatted like `Authorization: Bearer <auth_token>` |
| secret | VARCHAR | The name of the DuckDB secret to use to supply the value for `auth_token` |
> select * from airport_list_flights('http://127.0.0.1:8815', null);
flight_descriptor = [uploaded.parquet]
endpoint = [{'ticket': uploaded.parquet, 'location': [grpc://0.0.0.0:8815], 'expiration_time': NULL, 'app_metadata': }]
ordered = false
total_records = 3
total_bytes = 363
app_metadata =
schema = Character: string
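For orientation, the listing above comes from the server's ListFlights RPC. Below is a minimal sketch (Python + pyarrow) of a server that would produce a similar listing; the class name and the data values are illustrative, not part of the extension.

```python
# Minimal sketch of a Flight server whose ListFlights response resembles
# the output above. All names and values here are illustrative.
import pyarrow as pa
import pyarrow.flight as flight

class DemoFlightServer(flight.FlightServerBase):
    def list_flights(self, context, criteria):
        # 'criteria' carries the free-form bytes passed as the second
        # argument of airport_list_flights.
        schema = pa.schema([("Character", pa.string())])
        descriptor = flight.FlightDescriptor.for_path("uploaded.parquet")
        endpoint = flight.FlightEndpoint(b"uploaded.parquet",
                                         ["grpc://0.0.0.0:8815"])
        yield flight.FlightInfo(schema, descriptor, [endpoint],
                                total_records=3, total_bytes=363)

if __name__ == "__main__":
    DemoFlightServer("grpc://0.0.0.0:8815").serve()
```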
In addition to the criteria parameter, the Airport extension passes additional gRPC headers.
The airport-duckdb-json-filters header is sent on gRPC requests. It contains a JSON-serialized representation of all of the conditional filters that are going to be applied to the results.
To illustrate this with an example:
select * from airport_list_flights('grpc://localhost:8815/', null) where total_bytes = 5;
The gRPC header airport-duckdb-json-filters will be set to:
{
  "filters": [
    {
      "expression_class": "BOUND_COMPARISON",
      "type": "COMPARE_EQUAL",
      "left": {
        "expression_class": "BOUND_COLUMN_REF",
        "type": "BOUND_COLUMN_REF",
        "alias": "total_bytes",
        "return_type": {
          "id": "BIGINT"
        }
      },
      "right": {
        "expression_class": "BOUND_CONSTANT",
        "type": "VALUE_CONSTANT",
        "value": {
          "type": {
            "id": "BIGINT"
          },
          "is_null": false,
          "value": 5
        }
      }
    }
  ]
}
The airport-duckdb-json-filters header will not contain newlines; the JSON has been reformatted in this document for ease of comprehension.
It is up to the implementer of the server to use this header to apply optimizations. The Airport extension will still apply the filters to the results returned by the server, so the filter logic is purely advisory. In the author's experience, Arrow Flight servers that implement the filtering logic server-side can unlock some impressive optimizations. The JSON schema of the serialized filter expressions is not guaranteed to remain unchanged across DuckDB versions, since the serialization is performed by the DuckDB code itself.
The header airport-duckdb-column-ids contains a comma-separated list of the column indexes used in the query. The Arrow Flight server can return nulls for columns that are not requested, which can reduce the amount of data transmitted in the response.
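How a server consumes these headers is up to the implementer. As a rough sketch (Python + pyarrow; the class and variable names are illustrative), both headers can be inspected with Flight server middleware:

```python
# Sketch: reading the Airport extension's headers in server middleware.
# The header names come from this README; everything else is illustrative.
import json
import pyarrow.flight as flight

class AirportHeaderMiddlewareFactory(flight.ServerMiddlewareFactory):
    def start_call(self, info, headers):
        # gRPC header values arrive as lists of strings, keyed by lowercase name.
        raw_filters = headers.get("airport-duckdb-json-filters")
        if raw_filters:
            filters = json.loads(raw_filters[0])["filters"]
            # ...optionally push these predicates down into the scan.
        raw_columns = headers.get("airport-duckdb-column-ids")
        if raw_columns:
            wanted = {int(i) for i in raw_columns[0].split(",")}
            # ...optionally emit nulls for column indexes not in 'wanted'.
        return None  # no per-call middleware state needed for this sketch

# Attached when constructing the server, e.g.:
# DemoFlightServer("grpc://0.0.0.0:8815",
#                  middleware={"airport": AirportHeaderMiddlewareFactory()})
```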
airport_take_flight(location, descriptor, auth_token="token_value", secret="secret_name")
Description: This is a table-returning function; it returns the contents of the Arrow Flight.
| Parameter Name | Type | Description |
|---|---|---|
| location | VARCHAR | This is the location of the Flight server |
| descriptor | ANY | This is the descriptor of the flight. If it is a VARCHAR or BLOB, it is interpreted as a command; if it is an ARRAY or VARCHAR[], it is treated as a path-based descriptor. |
| Parameter Name | Type | Description |
|---|---|---|
| auth_token | VARCHAR | A bearer token to present to the server; the header is formatted like `Authorization: Bearer <auth_token>` |
| secret | VARCHAR | The name of the DuckDB secret to use to supply the value for `auth_token` |
| ticket | BLOB | The ticket (an opaque binary token) supplied to the Flight server; it overrides any ticket supplied from GetFlightInfo. |
| headers | MAP(VARCHAR, VARCHAR) | A map of extra gRPC headers to send with requests to the Flight server. |
select * from airport_take_flight('grpc://localhost:8815/', ['counter-stream']) limit 5;
┌─────────┐
│ counter │
│ int64 │
├─────────┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
└─────────┘
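The named parameters combine with the positional ones. As a sketch via the DuckDB Python API: the endpoint, token, and header values below are made up, and the INSTALL step assumes the extension is available as a DuckDB community extension.

```python
# Sketch: airport_take_flight with named parameters, via the DuckDB Python API.
# Endpoint, token, and header values are made up for illustration.
import duckdb

con = duckdb.connect()
con.execute("INSTALL airport FROM community")  # assumes a community build is published
con.execute("LOAD airport")
rows = con.execute("""
    SELECT *
    FROM airport_take_flight(
        'grpc://localhost:8815/',
        ['counter-stream'],
        auth_token = 'test-token',
        headers = MAP {'x-trace-id': 'demo-123'}
    )
    LIMIT 5
""").fetchall()
print(rows)
```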
To create a secret that can be used by airport_take_flight and airport_list_flights, use the standard DuckDB CREATE SECRET command.
CREATE SECRET airport_hello_world (
type airport,
auth_token 'test-token',
scope 'grpc+tls://server.example.com/'
);
The Airport extension respects the scope(s) specified in the secret. If a value for auth_token isn't supplied but a secret exists with a scope that matches the server location, the auth_token value is taken from the secret.
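Concretely, once a secret with a matching scope exists, the auth_token argument can be omitted entirely. A sketch via the DuckDB Python API; the endpoint is illustrative and the INSTALL step assumes a community build:

```python
# Sketch: relying on secret-scope matching instead of passing auth_token.
import duckdb

con = duckdb.connect()
con.execute("INSTALL airport FROM community")  # assumes a community build is published
con.execute("LOAD airport")
con.execute("""
    CREATE SECRET airport_hello_world (
        type airport,
        auth_token 'test-token',
        scope 'grpc+tls://server.example.com/'
    )
""")
# No auth_token supplied: the location matches the secret's scope, so the
# extension sends "Authorization: Bearer test-token" automatically.
con.execute(
    "SELECT * FROM airport_list_flights('grpc+tls://server.example.com/', null)"
).fetchall()
```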
- Investigate multithreaded endpoint support.
# Clone this repo with submodules.
# duckdb and extension-ci-tools are submodules.
git clone --recursive git@github.com:Query-farm/duckdb-airport-extension
# Clone the vcpkg repo
git clone https://github.com/Microsoft/vcpkg.git
# Bootstrap vcpkg
./vcpkg/bootstrap-vcpkg.sh
export VCPKG_TOOLCHAIN_PATH=`pwd`/vcpkg/scripts/buildsystems/vcpkg.cmake
# Build the extension
make
# If you have ninja installed, you can use it to speed up the build
# GEN=ninja make
The main binaries that will be built are:
./build/release/duckdb
./build/release/test/unittest
./build/release/extension/airport/airport.duckdb_extension
- duckdb is the binary for the duckdb shell with the extension code automatically loaded.
- unittest is the test runner of duckdb. Again, the extension is already linked into the binary.
- airport.duckdb_extension is the loadable binary as it would be distributed.
To run the extension code, simply start the shell with ./build/release/duckdb. This duckdb shell will have the extension pre-loaded.
Now we can use the features from the extension directly in DuckDB.
Different tests can be created for DuckDB extensions. The primary way of testing DuckDB extensions should be the SQL tests in ./test/sql. These SQL tests can be run using:
make test