project-lakechain

⚡ Cloud-native, AI-powered, document processing pipelines on AWS.

Project Lakechain is a cloud-native, AI-powered framework for building document processing pipelines on AWS. It provides a composable API with built-in middlewares for common tasks, scalable architecture, cost efficiency, GPU and CPU support, and the ability to create custom transform middlewares. With ready-made examples and emphasis on modularity, Lakechain simplifies the deployment of scalable document pipelines for tasks like metadata extraction, NLP analysis, text summarization, translations, audio transcriptions, computer vision, and more.

README:





Project Lakechain

Cloud-native, AI-powered, document processing pipelines on AWS.


🔖 Features

  • 🤖 Composable - Composable API to express document processing pipelines using middlewares.
  • ☁️ Scalable - Scales out of the box. Process millions of documents, scale to zero automatically when done.
  • 💰 Cost Efficient - Uses cost-optimized architectures to reduce costs and drive a pay-as-you-go model.
  • 🚀 Ready to use - 60+ built-in middlewares for common document processing tasks, ready to be deployed.
  • 🦎 GPU and CPU Support - Use the right compute type to balance between performance and cost.
  • 📦 Bring Your Own - Create your own transform middlewares to process documents and extend Lakechain.
  • 📙 Ready Made Examples - Quickstart your journey by leveraging 50+ examples we've built for you.

🚀 Getting Started

👉 Head to our documentation, which contains all the information required to understand the project and quickly start building!

What's Lakechain ❓

Project Lakechain is an experimental framework based on the AWS Cloud Development Kit (CDK) that makes it easy to express and deploy scalable document processing pipelines on AWS using infrastructure-as-code. It emphasizes the modularity of pipelines, and provides 40+ ready-to-use components for prototyping complex document pipelines that can scale out of the box to millions of documents.

This project has been designed to help AWS customers build and scale different types of document processing pipelines, covering a wide array of use cases including metadata extraction, document conversion, NLP analysis, text summarization, translations, audio transcriptions, computer vision, Retrieval Augmented Generation pipelines, and much more!
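
To make the idea of composing a pipeline from middlewares more concrete, here is a minimal, self-contained sketch in TypeScript. It is not the Lakechain API: the `Doc` type, the `Middleware` class, its `pipe` and `process` methods, and the example bucket URL are all hypothetical names chosen for illustration, and everything runs in memory rather than on AWS. It only shows the general pattern of small processing steps chained together, assuming each step receives one document and emits zero or more documents.

```typescript
// Hypothetical types for illustration only; this is NOT the Lakechain API.
interface Doc {
  url: string;
  mimeType: string;
  metadata: Record<string, string>;
}

// A handler receives one document and returns zero or more documents.
type Handler = (doc: Doc) => Doc[];

class Middleware {
  readonly name: string;
  private readonly handler: Handler;
  private readonly consumers: Middleware[] = [];

  constructor(name: string, handler: Handler) {
    this.name = name;
    this.handler = handler;
  }

  // Connects this middleware to the next one and returns the next one,
  // so that calls can be chained: a.pipe(b).pipe(c).
  pipe(next: Middleware): Middleware {
    this.consumers.push(next);
    return next;
  }

  // Runs the handler, forwards every produced document to the connected
  // consumers, and returns the documents that reach the end of the chain.
  process(doc: Doc): Doc[] {
    const produced = this.handler(doc);
    if (this.consumers.length === 0) {
      return produced;
    }
    const output: Doc[] = [];
    for (const item of produced) {
      for (const consumer of this.consumers) {
        output.push(...consumer.process(item));
      }
    }
    return output;
  }
}

// Step 1: only let audio documents through.
const audioOnly = new Middleware('audio-filter', (doc) =>
  doc.mimeType.startsWith('audio/') ? [doc] : []
);

// Step 2: a stand-in for a transcription step; it only rewrites metadata
// and does not call any speech-to-text service.
const transcribe = new Middleware('fake-transcriber', (doc) => [
  {
    url: doc.url.replace(/\.[^.]+$/, '.vtt'),
    mimeType: 'text/vtt',
    metadata: { ...doc.metadata, source: doc.url },
  },
]);

audioOnly.pipe(transcribe);

const results = audioOnly.process({
  url: 's3://example-bucket/interview.mp3',
  mimeType: 'audio/mpeg',
  metadata: {},
});
console.log(results);
```

In this sketch, swapping or adding a step only requires another `pipe` call, which is the kind of modularity the description above refers to. How Lakechain itself wires middlewares together on AWS is covered by the project documentation.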

Show me the code ❗

👇 Below is an example of a pipeline that deploys the AWS infrastructure to automatically transcribe audio files uploaded to S3, in just a few lines of code. The pipeline scales to millions of documents.





LICENSE

See LICENSE.
