Open_Data_QnA
The Open Data QnA python library enables you to chat with your databases by leveraging LLM Agents on Google Cloud. Open Data QnA enables a conversational approach to interacting with your data by implementing state-of-the-art NL2SQL / Text2SQL methods.
Stars: 127
Open Data QnA is a Python library that allows users to interact with their PostgreSQL or BigQuery databases in a conversational manner, without needing to write SQL queries. The library leverages Large Language Models (LLMs) to bridge the gap between human language and database queries, enabling users to ask questions in natural language and receive informative responses. It offers features such as conversational querying with multiturn support, table grouping, multi schema/dataset support, SQL generation, query refinement, natural language responses, visualizations, and extensibility. The library is built on a modular design and supports various components like Database Connectors, Vector Stores, and Agents for SQL generation, validation, debugging, descriptions, embeddings, responses, and visualizations.
README:
The Open Data QnA python library enables you to chat with your databases by leveraging LLM Agents on Google Cloud.
Open Data QnA enables a conversational approach to interacting with your data. Ask questions about your PostgreSQL or BigQuery databases in natural language and receive informative responses, without needing to write SQL. Open Data QnA leverages Large Language Models (LLMs) to bridge the gap between human language and database queries, streamlining data analysis and decision-making.
Key Features:
- Conversational Querying with Multiturn Support: Ask questions naturally, without any SQL knowledge, and ask follow-up questions.
- Table Grouping: Group tables under a single use case/user grouping name, which helps narrow a large number of tables down to the ones the LLM needs to understand.
- Multi Schema/Dataset Support: Group tables from different schemas/datasets for embedding and question answering.
- Prompt Customization and Additional Context: The prompts in use are loaded from a YAML file, and you can also add extra context of your own.
- SQL Generation: Automatically generates SQL queries based on your questions.
- Query Refinement: Validates and debugs queries to ensure accuracy.
- Natural Language Responses: Runs queries and presents results in clear, easy-to-understand language.
- Visualizations (Optional): Explore data visually with generated charts.
- Extensible: Customize and integrate with your existing workflows (API, UI, notebooks).
It is built on a modular design and currently supports the following components:
Database Connectors:
- Google Cloud SQL for PostgreSQL
- Google BigQuery
- Google Firestore (for storing session logs)
Vector Stores:
- PGVector on Google Cloud SQL for PostgreSQL
- BigQuery Vector Store
Agents:
- BuildSQLAgent: An agent specialized in generating SQL queries for BigQuery or PostgreSQL databases. It analyzes user questions, available table schemas, and column descriptions to construct syntactically and semantically correct SQL queries, adapting its process based on the target database type.
- ValidateSQLAgent: An agent that validates the syntax and semantic correctness of SQL queries. It uses a language model to analyze queries against a database schema and returns a JSON response indicating validity and potential errors.
- DebugSQLAgent: An agent designed to debug and refine SQL queries for BigQuery or PostgreSQL databases. It interacts with a chat-based language model to iteratively troubleshoot queries, using error messages to generate alternative, correct queries.
- DescriptionAgent: An agent specialized in generating descriptions for database tables and columns. It leverages a large language model to create concise and informative descriptions that aid in understanding data structures and facilitate SQL query generation.
- EmbedderAgent: An agent specialized in generating text embeddings using Large Language Models (LLMs). It supports direct interaction with Vertex AI's TextEmbeddingModel or uses LangChain's VertexAIEmbeddings for a simplified interface.
- ResponseAgent: An agent that generates natural language responses to user questions based on SQL query results. It acts as a data assistant, interpreting SQL results and transforming them into user-friendly answers using a language model.
- VisualizeAgent: An agent that generates JavaScript code for Google Charts based on user questions and SQL results. It suggests suitable chart types and constructs the JavaScript code to create visualizations of the data.
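To illustrate how these components fit together, here is a minimal sketch of a question-to-answer flow. The agent class names come from the list above, but the import path, constructor arguments, and method names are assumptions for illustration only; the notebooks and opendataqna.py show the actual API.
# Illustrative sketch only: class names are from the component list above, but the
# import path, constructors, and method names are assumed, not the documented API.
from agents import BuildSQLAgent, ValidateSQLAgent, DebugSQLAgent, ResponseAgent

question = "What are the 5 most common genres we have?"
grouping = "MovieExplorer-bigquery"  # user_grouping defined in data_source_list.csv

draft_sql = BuildSQLAgent().generate_sql(question, user_grouping=grouping)  # question + schema -> SQL
if not ValidateSQLAgent().validate(draft_sql):      # syntax/semantic check against the schema
    draft_sql = DebugSQLAgent().debug(draft_sql)    # iteratively repair using error messages
rows = []  # execute draft_sql against BigQuery/PostgreSQL here (connector not shown)
answer = ResponseAgent().respond(question, rows)    # SQL results -> natural-language answer
print(answer)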
Note: the library was formerly named Talk2Data. You may still find artifacts with the old naming in this repository.
A detailed description of the Architecture can be found in the docs.
Details on the Repository Structure can be found in the docs.
git clone [email protected]:GoogleCloudPlatform/Open_Data_QnA.git
cd Open_Data_QnA
Make sure that the Google Cloud CLI and Python >= 3.10 are installed before moving ahead! You can refer to the links below for guidance
Installation Guide: https://cloud.google.com/sdk/docs/install
Download Python: https://www.python.org/downloads/
ℹ️ You can set up this solution with three approaches. Choose one based on your requirements:
- A) Using Jupyter Notebooks (for a better view of what is happening at each stage of the solution)
- B) Using the CLI (for ease of use, running simple Python commands without needing to understand every step of the solution)
- C) Using Terraform deployment, including your backend APIs with a UI
All commands in this cell are to be run in the terminal (typically Ctrl+Shift+`) where your notebooks are running
Install the dependencies by running the poetry commands below
# Install poetry
pip uninstall poetry -y
pip install poetry --quiet
#Run the poetry commands below to set up the environment
poetry lock # resolve dependencies (also auto-creates the poetry venv if it does not exist)
poetry install --quiet # install dependencies
poetry env info # display the env just created and the path to it
poetry shell # activate the venv; you should see the shell enter the venv
## Inside the activated venv shell
# If you are running on a Workbench instance whose service account already has the permissions required for this solution, you can skip the gcloud auth commands below and go straight to the kernel creation section
gcloud auth login # use this or the command below to authenticate
gcloud auth application-default login
gcloud services enable \
serviceusage.googleapis.com \
cloudresourcemanager.googleapis.com --project <<Enter Project Id>>
Choose the relevant instructions based on where you are running the notebook
For IDEs like Cloud Shell Editor, VS Code
For IDEs, adding the Jupyter extension will automatically give you the option to change the kernel. If not, manually select the Python interpreter in your IDE (the exact path is shown in the cell above; it would look like e.g. /home/admin_/opendata/.venv/bin/python or ~cache/user/opendataqna/.venv/bin/python)
Proceed to the Step 1 below
For Jupyter Lab or Jupyter Environments on Workbench etc
Create a kernel with the environment just created
pip install jupyter
ipython kernel install --name "openqna-venv" --user
Restart your kernel or close the existing notebook and open it again; you should now see "openqna-venv" in the kernel dropdown
What did we do here?
- Created Application Default Credentials to use for the code
- Added venv to kernel to select for running the notebooks (For standalone Jupyter setups like Workbench etc)
1. Run the 1_Setup_OpenDataQnA (Run Once for Initial Setup)
This notebook guides you through the setup and execution of the Open Data QnA application. It provides comprehensive instructions for setting up the solution.
2. Run the 2_Run_OpenDataQnA
This notebook guides you by reading the configuration you setup with 1_Setup_OpenDataQnA and running the pipeline to answer questions about your data.
In case you want to load Known Good SQLs separately, run this notebook once the config variables are set up in the config.ini file. It can be run multiple times to reload the known good SQL queries and create embeddings for them.
1. Add Configuration values for the solution in config.ini
For setup we require details of the vector store, source database, etc. Edit the config.ini file and add values for the parameters based on the information below.
ℹ️ Follow the guidelines from the config guide document to populate your config.ini file.
Sources to connect
- This solution lets you set up multiple data sources at the same time.
- You can group multiple tables from different datasets or schemas into one grouping and provide the details.
- If your dataset/schema has many tables and you only want to run the solution against a few of them, define a grouping for just those tables.
Format for data_source_list.csv
source | user_grouping | schema | table
source - supported data source. Options: bigquery, cloudsql-pg
user_grouping - logical grouping or use case name for tables from the same or different schemas/datasets. When left blank it defaults to the schema value in the next column
schema - schema name for PostgreSQL or dataset name in BigQuery
table - name of the table to run the solution against. Leave this column blank after filling in schema/dataset if you want to run the solution for the whole dataset/schema
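For reference, a populated data_source_list.csv might look like the rows below. The grouping, schema, and table names are placeholders for illustration; only the source values are actual supported options.
source,user_grouping,schema,table
bigquery,MovieExplorer-bigquery,imdb,movies
bigquery,MovieExplorer-bigquery,imdb,ratings
cloudsql-pg,hr-postgres,hr_schema,
The last row leaves table blank, so the whole hr_schema schema is used for that grouping.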
Update the data_source_list.csv according to your requirements.
Note that the source details filled in the CSV should already exist. If they do not, use the copy notebooks below if you want the demo sources set up.
Enabled Data Sources:
- PostgreSQL on Google Cloud SQL (Copy Sample Data: 0_CopyDataToCloudSqlPG.ipynb)
- BigQuery (Copy Sample Data: 0_CopyDataToBigQuery.ipynb)
pip install poetry --quiet
poetry lock
poetry install --quiet
poetry env info
poetry shell
Authenticate your credentials
gcloud auth login
or
gcloud auth application-default login
gcloud services enable \
serviceusage.googleapis.com \
cloudresourcemanager.googleapis.com --project <<Enter Project Id>>
gcloud auth application-default set-quota-project <<Enter Project Id for using resources>>
Enable APIs for the solution setup
gcloud services enable \
cloudapis.googleapis.com \
compute.googleapis.com \
iam.googleapis.com \
run.googleapis.com \
sqladmin.googleapis.com \
aiplatform.googleapis.com \
bigquery.googleapis.com \
firestore.googleapis.com --project <<Enter Project Id>>
3. Run env_setup.py to create the vector store based on the configuration from Step 1
python env_setup.py
4. Run opendataqna.py to run the pipeline you just set up
The Open Data QnA SQL Generation tool can be conveniently used from your terminal or command prompt using a simple CLI interface. Here's how:
python opendataqna.py --session_id "122133131f--ade-eweq" --user_question "What is most 5 common genres we have?" --user_grouping "MovieExplorer-bigquery"
Where
session_id : Keep this unique for a new conversation and reuse the same value for follow-up questions.
user_question : Enter your question as a string.
user_grouping : Enter the BQ_DATASET_NAME for BigQuery sources or PG_SCHEMA for PostgreSQL sources (refer your data_source_list.csv file)
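Because the session_id ties follow-up questions to the same conversation, a multiturn exchange simply reuses it, for example (the follow-up question below is a placeholder):
python opendataqna.py --session_id "122133131f--ade-eweq" --user_question "Which of those genres has the highest average rating?" --user_grouping "MovieExplorer-bigquery"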
Optional Parameters
You can customize the pipeline's behavior using optional parameters. Here are some common examples:
# Enable the SQL debugger:
python opendataqna.py --session_id="..." --user_question "..." --user_grouping "..." --run_debugger
# Execute the final generated SQL:
python opendataqna.py --session_id="..." --user_question "..." --user_grouping "..." --execute_final_sql
# Change the number of debugging rounds:
python opendataqna.py --session_id="..." --user_question "..." --user_grouping "..." --debugging_rounds 5
# Adjust similarity thresholds:
python opendataqna.py --session_id="..." --user_question "..." --user_grouping "..." --table_similarity_threshold 0.25 --column_similarity_threshold 0.4
You can find a full list of available options and their descriptions by running:
python opendataqna.py --help
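If you prefer to drive the pipeline from a notebook or script rather than the terminal, a thin wrapper such as the hedged sketch below shells out to opendataqna.py using only the flags shown above; it assumes you run it from the repository root inside the poetry environment, and the question and grouping values are placeholders.
# Minimal wrapper around the CLI shown above. It uses only the documented flags;
# parameter values and the working directory are assumptions you should adapt.
import subprocess
import uuid

def ask(question: str, grouping: str, session_id: str | None = None) -> str:
    """Run opendataqna.py once and return its stdout."""
    session_id = session_id or str(uuid.uuid4())  # reuse the same id for follow-up questions
    cmd = [
        "python", "opendataqna.py",
        "--session_id", session_id,
        "--user_question", question,
        "--user_grouping", grouping,
        "--run_debugger",        # optional: enable the SQL debugger
        "--execute_final_sql",   # optional: execute the final generated SQL
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

print(ask("What are the 5 most common genres we have?", "MovieExplorer-bigquery"))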
The provided terraform streamlines the setup of this solution and serves as a blueprint for deployment. The script provides a one-click, one-time deployment option. However, it doesn't include CI/CD capabilities and is intended solely for initial setup.
[!NOTE] Current version of the Terraform Google Cloud provider does not support deployment of a few resources, this solution uses null_resource to create those resources using Google Cloud SDK.
Prior to executing terraform, ensure that the below mentioned steps have been completed.
- Source data should already be available. If you do not have readily available source data, use the notebooks 0_CopyDataToBigQuery.ipynb or 0_CopyDataToCloudSqlPG.ipynb based on the preferred source to populate sample data.
- Ensure that data_source_list.csv is populated with the list of data sources to be used in this solution. Terraform will take care of creating the embeddings in the destination. Use data_source_list_sample.csv to fill in data_source_list.csv.
- If you want to use known good SQLs for few-shot prompting, ensure that known_good_sql.csv is populated with the required data. Terraform will take care of creating the embeddings in the destination.
Firebase will be used to host the frontend of the application.
- Go to https://console.firebase.google.com/
- Select add project and load your Google Cloud Platform project
- Add Firebase to one of your existing Google Cloud projects
- Confirm Firebase billing plan
- Continue and complete
[!NOTE]
Terraform apply command for this application uses gcloud config to fetch & pass the set project id to the scripts. Please ensure that gcloud config has been set to your intended project id before proceeding.
[!IMPORTANT]
The Terraform scripts require specific IAM permissions to function correctly. The user needs either the broad roles/resourcemanager.projectIamAdmin role or a custom role with tailored permissions to manage IAM policies and roles. Additionally, one script TEMPORARILY disables Domain Restricted Sharing Org Policies to enable the creation of a public endpoint. This requires the user to also have the roles/orgpolicy.policyAdmin role.
- Install terraform 1.7 or higher.
- [OPTIONAL] Update default values of variables in variables.tf according to your preferences. You can find the description for each variable inside the file. This file will be used by terraform to get information about the resources it needs to deploy. If you do not update these, terraform will use the already specified default values in the file.
- Move to the terraform directory in the terminal
cd Open_Data_QnA/terraform
#If you are running this outside Cloud Shell you need to set up your Google Cloud SDK Credentials
gcloud config set project <your_project_id>
gcloud auth application-default set-quota-project <your_project_id>
gcloud services enable \
serviceusage.googleapis.com \
cloudresourcemanager.googleapis.com --project <<Enter Project Id>>
sh ./scripts/deploy-all.sh
This script will perform the following steps:
- Run terraform scripts - These terraform scripts will generate all the GCP resources and configuration files required for the frontend & backend. They will also generate embeddings and store them in the destination vector DB.
- Deploy cloud run backend service with latest backend code - The terraform in the previous step uses a dummy container image to deploy the initial version of cloud run service. This is the step where the actual backend code gets deployed.
- Deploy frontend app - All the config files, web app etc required to create the frontend are deployed via terraform. However, the actual UI deployment takes place in this step.
Auth Provider
You need to enable at least one authentication provider in Firebase; you can enable it using the following steps:
- Go to https://console.firebase.google.com/project/your_project_id/authentication/providers (change the your_project_id value)
- Click on Get Started (if needed)
- Select Google and enable it
- Set the name for the project and the support email for the project
- Save
This should deploy your end-to-end solution in the project with a Firebase web URL.
For detailed steps and known issues refer to README.md under /terraform
To deploy the backend APIs for the solution, refer to the README.md under /backend-apis. These APIs are designed to work with the frontend and provide access to run the solution.
Once the backend APIs are deployed successfully, deploy the frontend for the solution; refer to the README.md under /frontend.
If you successfully set up the solution accelerator and want to start optimizing it to your needs, you can follow the tips in the Best Practice doc.
Additionally, if you stumble across any problems, take a look at the FAQ.
If neither of these resources helps, feel free to reach out to us directly by raising an Issue.
To clean up the resources provisioned in this solution, use commands below to remove them using gcloud/bq:
For cloudsql-pgvector as vector store : Delete SQL Instance
gcloud sql instances delete <CloudSQL Instance Name> -q
Delete BigQuery Dataset Created for Logs and Vector Store : Remove BQ Dataset
bq rm -r -f -d <BigQuery Dataset Name for OpenDataQnA>
(For Backend APIs) Remove the Cloud Run service : Delete Service
gcloud run services delete <Cloud Run Service Name>
For frontend, based on firebase: Remove the firebase app
BigQuery quotas apply, including hardware, software, and network components.
Open Data QnA is distributed with the Apache-2.0 license.
It also contains code derived from the following third-party packages:
This repository provides an open-source solution accelerator designed to streamline your development process. Please be aware that all resources associated with this accelerator will be deployed within your own Google Cloud Platform (GCP) instances.
It is imperative that you thoroughly test all components and configurations in a non-production environment before integrating any part of this accelerator with your production data or systems.
While we strive to provide a secure and reliable solution, we cannot be held responsible for any data loss, service disruptions, or other issues that may arise from the use of this accelerator.
By utilizing this repository, you acknowledge that you are solely responsible for the deployment, management, and security of the resources deployed within your GCP environment.
If you encounter any issues or have concerns about potential risks, please refrain from using this accelerator in a production setting.
We encourage responsible and informed use of this open-source solution.
If you have any questions or if you found any problems with this repository, please report through GitHub issues.