cloudflare-rag

Fullstack "Chat with your PDFs" RAG (Retrieval Augmented Generation) app built fully on Cloudflare

Stars: 93

This repository is a fullstack example of a Retrieval Augmented Generation (RAG) app built on Cloudflare. It uses Cloudflare Workers, Pages, D1, KV, R2, AI Gateway, and Workers AI. The app streams every interaction to the UI, performs hybrid retrieval with full-text search and vector search, switches between providers through AI Gateway, rate-limits per IP with Cloudflare KV, runs OCR inside a Cloudflare Worker, and uses Smart Placement to optimize where workloads run. Development requires Node, pnpm, and the wrangler CLI, plus provisioning the necessary primitives and API keys. Deployment consists of setting secrets and deploying the app to Cloudflare Pages. Retrieval combines full-text search against D1 with embedding-based vector search against Vectorize to give the LLM better context.

README:

Fullstack Cloudflare RAG

This is a fullstack example of how to build a RAG (Retrieval Augmented Generation) app with Cloudflare. It uses Cloudflare Workers, Pages, D1, KV, R2, AI Gateway and Workers AI.

https://github.com/user-attachments/assets/cbaa0380-7ad6-448d-ad44-e83772a9cf3f

Features:

  • Every interaction is streamed to the UI using Server-Sent Events
  • Hybrid RAG using Full-Text Search on D1 and Vector Search on Vectorize
  • Switchable between various providers (OpenAI, Groq, Anthropic) using AI Gateway with fallbacks
  • Per-IP rate limiting using Cloudflare KV
  • OCR runs inside a Cloudflare Worker using unpdf
  • Smart Placement automatically places your workloads in an optimal location that minimizes latency and speeds up your applications
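
As an illustration of the first feature, here is a minimal sketch of how text chunks could be streamed to the UI as Server-Sent Events from a Worker. It is not the repository's code: the function names, the "token" and "done" event names, and the "[DONE]" sentinel are assumptions made for the example. Only the wire format (one "data:" field per payload line, a blank line ending each message) comes from the SSE specification.

```typescript
// Format one Server-Sent Events message. Each line of the payload gets its
// own "data:" field, and a blank line terminates the message.
function formatSSE(data: string, event?: string): string {
  const lines: string[] = [];
  if (event !== undefined) lines.push(`event: ${event}`);
  for (const line of data.split(/\r\n|\r|\n/)) lines.push(`data: ${line}`);
  return lines.join("\n") + "\n\n";
}

// Wrap an async iterable of text chunks (for example LLM tokens) in a
// streaming response. Event names and the end sentinel are hypothetical.
function sseResponse(chunks: AsyncIterable<string>): Response {
  const encoder = new TextEncoder();
  const body = new ReadableStream<Uint8Array>({
    async start(controller) {
      for await (const chunk of chunks) {
        const message = formatSSE(JSON.stringify({ text: chunk }), "token");
        controller.enqueue(encoder.encode(message));
      }
      controller.enqueue(encoder.encode(formatSSE("[DONE]", "done")));
      controller.close();
    },
  });
  return new Response(body, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
    },
  });
}
```

On the client, an EventSource or a fetch-based reader can consume this stream and append each chunk to the chat window as it arrives.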

Development

Make sure you have Node, pnpm, and the wrangler CLI installed.

Install dependencies:

pnpm install # or npm install

Deploy necessary primitives:

./setup.sh

Then, in wrangler.toml, set d1_databases.database_id to your D1 database ID and kv_namespaces.rate_limiter to the ID of your rate limiter KV namespace.

Then, create a .dev.vars file with your API keys:

CLOUDFLARE_ACCOUNT_ID=your-cloudflare-account-id # Required
GROQ_API_KEY=your-groq-api-key # Optional
OPENAI_API_KEY=your-openai-api-key # Optional
ANTHROPIC_API_KEY=your-anthropic-api-key # Optional

If you don't set any of the optional provider keys, /api/stream will fall back to Workers AI.
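
To make the fallback behavior concrete, here is a minimal sketch of choosing which providers to try based on the keys that are present. It is an illustration only: the app itself routes requests through AI Gateway, and the Provider type, the providerChain function, and the order in which providers are tried are all assumptions, not taken from the repository.

```typescript
type Provider = "groq" | "openai" | "anthropic" | "workers-ai";

// Shape of the variables from .dev.vars; only the account ID is required.
interface Env {
  CLOUDFLARE_ACCOUNT_ID: string;
  GROQ_API_KEY?: string;
  OPENAI_API_KEY?: string;
  ANTHROPIC_API_KEY?: string;
}

// Returns the providers to try, in order. The ordering is an assumption;
// Workers AI always comes last because it needs no external API key.
function providerChain(env: Env): Provider[] {
  const chain: Provider[] = [];
  if (env.GROQ_API_KEY) chain.push("groq");
  if (env.OPENAI_API_KEY) chain.push("openai");
  if (env.ANTHROPIC_API_KEY) chain.push("anthropic");
  chain.push("workers-ai");
  return chain;
}
```

With only CLOUDFLARE_ACCOUNT_ID set, the chain contains just Workers AI, which matches the fallback described above.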

Run the dev server:

npm run dev

And access the app at http://localhost:5173/.

Deployment

With the necessary primitives set up, first set up the secrets:

npx wrangler secret put CLOUDFLARE_ACCOUNT_ID
npx wrangler secret put GROQ_API_KEY
npx wrangler secret put OPENAI_API_KEY
npx wrangler secret put ANTHROPIC_API_KEY

Then, deploy your app to Cloudflare Pages:

npm run deploy

Hybrid Search RAG

This project combines classical full-text search (sparse) against Cloudflare D1 with embedding-based vector search (dense) against Vectorize, so the LLM receives the most relevant context from both.

The way it works is this:

  1. We take the user input and rewrite it into 5 different queries using an LLM
  2. We run each of these queries against both datastores: the D1 database using BM25 for full-text search, and Vectorize for dense retrieval
  3. We take the results from both datastores and we merge them together using Reciprocal Rank Fusion which provides us with a single list of results
  4. We then take the top 10 results from this list and we pass them to the LLM to generate a response
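
The merge in step 3 can be sketched as follows. This is a minimal illustration of Reciprocal Rank Fusion, not the repository's implementation: the function name, the use of plain string IDs for chunks, and the constant k = 60 (a commonly used default) are assumptions. Each result list contributes 1 / (k + rank) to a document's score, with ranks starting at 1, so documents that rank well in several lists rise to the top.

```typescript
// Reciprocal Rank Fusion over any number of ranked ID lists.
// score(d) = sum over lists of 1 / (k + rank of d in that list), rank from 1.
// k = 60 and topN = 10 are defaults chosen for this sketch.
function reciprocalRankFusion(
  lists: string[][],
  k = 60,
  topN = 10,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}

// Hypothetical chunk IDs returned by the two datastores for one query.
const fullTextResults = ["chunk-a", "chunk-b", "chunk-c"];
const vectorResults = ["chunk-b", "chunk-d", "chunk-a"];

const fused = reciprocalRankFusion([fullTextResults, vectorResults]);
console.log(fused.map((r) => r.id)); // [ 'chunk-b', 'chunk-a', 'chunk-d', 'chunk-c' ]
```

With 5 rewritten queries and 2 datastores, the same function would be called with 10 lists, and the default topN of 10 corresponds to step 4.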

License

This project is licensed under the terms of the MIT License.

Consulting

If you need help in building AI applications, please reach out to me on Twitter or via my website. Happy to help!
