Sarvadnya

This repo is a collection of various PoCs (Proof-of-Concepts) to interface custom data using LLMs.

Stars: 52

Sarvadnya is a repository focused on interfacing custom data using Large Language Models (LLMs) through Proof-of-Concepts (PoCs) like Retrieval Augmented Generation (RAG) and Fine-Tuning. It aims to enable domain adaptation for LLMs to answer on user-specific corpora. The repository also covers topics such as Indic-languages models, 3D World Simulations, Knowledge Graphs Generation, Signal Processing, Drones, UAV Image Processing, and Floor Plan Segmentation. It provides insights into building chatbots of various modalities, preparing videos, and creating content for different platforms like Medium, LinkedIn, and YouTube. The tech stacks involved range from enterprise solutions like Google Doc AI and Microsoft Azure Language AI Services to open-source tools like Langchain and HuggingFace.

README:

Sarvadnya (सर्वज्ञ), an All-Knowing Chatbot!!

Chatbots can be a real WoW!! The recent evidence: ChatGPT. With the latest LLMs (Large Language Models), they are now more human-like. But these LLMs are pretrained on their own (HUGE) data, and mere mortals don't have the means ($$, time, expertise) to train their own LLMs. RAG and/or fine-tuning is the way out for domain adaptation, i.e. LLMs answering on your corpus. This repo is a collection of various PoCs (Proof-of-Concepts) to interface custom data using LLMs.

A few other topics that are (or can be) part of this repo:

  • Indic-languages models, some notes here
  • 3D World Simulations, Agents, some notes here
  • Knowledge Graphs Generation, some notes here
  • Signal Processing, some notes here
  • Drones, UAV Image Processing, Shynakshi here
  • Floor Plan Segmentation here

What?

PoCs Projects

  • Prepare chatbots of various modalities, use cases and domains, on different datasets
  • Prepare videos, write Medium posts (GDE/TH), LinkedIn posts, YouTube channel

Modes

  • Retrieval Augmented Generation (RAG) on own data
  • Fine-tuning LLMs with own data using LoRA etc
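
For readers new to LoRA, the idea behind the second mode can be stated in one line. This is the standard formulation from the LoRA paper, not something specific to this repo; the symbols (W_0 for the frozen pretrained weight, B and A for the small trainable matrices, r for the rank, α for the scaling factor) are the usual ones and are introduced here only for illustration.

```latex
h = W_0 x + \Delta W x, \qquad \Delta W = \frac{\alpha}{r} B A,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
```

Only B and A are trained, so the number of trainable parameters per layer drops from d·k to r·(d + k), which is why this mode is feasible with limited compute.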

RAG

  • When?: {less, streaming, private} data and less {compute, money, expertise}
  • What?:
    • on knowledge graphs, more grounding
    • tabular financial data, representation and similarity
    • midcurveNN Geometric serialization and retrieval
    • active loop idea of fine-tuning your data
    • Langchain and Llamaindex with any new LLM
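
The retrieval step of RAG can be sketched without any framework. The snippet below is a minimal illustration, assuming a tiny in-memory corpus and plain bag-of-words cosine similarity in place of a real embedding model and vector store; the names `retrieve`, `build_prompt` and the sample corpus are hypothetical and not taken from this repo. The resulting prompt would then be sent to whichever LLM is in use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return up to k corpus chunks most similar to the query (score > 0)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(doc))), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, chunks):
    """Put the retrieved chunks in front of the question as grounding context."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical sample corpus, for illustration only.
corpus = [
    "LoRA fine-tunes a model by training small low-rank adapter matrices.",
    "RAG retrieves relevant chunks and adds them to the prompt.",
    "Streamlit is used to build the chatbot UI.",
]
question = "How does RAG use retrieved chunks?"
top_chunks = retrieve(question, corpus, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In a real PoC the `cosine` scoring would typically be replaced by embeddings plus a vector index (for example via Langchain or Llamaindex), but the retrieve-then-prompt shape stays the same.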

Fine-Tuning

  • When?: Sufficient curated data is available (not necessarily a whole lot), in a batch (not streaming) state

  • What?: Instead of unstructured text (input prompt) to unstructured text (output response), more value lies in prompt to structured output, such as:

    • text2json: needed by many enterprises, such as financial companies.
    • text2cypher: for graph databases such as Neo4j, like the Langchain implementation by Tomaz Bratanic
    • text2SQL: the classical case; many professional solutions are available. Study them and follow their approach for other query languages
    • text2Manim: Maths animation; a dataset is available. See if the generated video can be shown in the same Streamlit page
    • text23DJS: good for 3D+LLM+Agents like Metamorph from Nvidia; geometry or shape representation as text is the key
    • textGraph2textGraph: MidcurveNN, if we get the graph representation as text right.
  • Here, the key would be robust post-processing and evaluation, as the response needs to be near perfect, with no room for relaxation even in syntax or format.
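
To make the post-processing point concrete for the text2json case, here is a minimal sketch of a strict checker for model output. It assumes the model may wrap the JSON in extra prose, and it uses a made-up three-field schema (`ticker`, `quarter`, `revenue`); the schema and the function names `extract_json` and `validate` are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical schema for illustration: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"ticker": str, "quarter": str, "revenue": (int, float)}


def extract_json(response):
    """Parse the first JSON object found in a raw model response.

    Raises ValueError if no parseable object is present.
    """
    decoder = json.JSONDecoder()
    for start, ch in enumerate(response):
        if ch != "{":
            continue
        try:
            obj, _ = decoder.raw_decode(response, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this brace; try the next one
        return obj
    raise ValueError("no JSON object found in response")


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif isinstance(record[field], bool) or not isinstance(record[field], expected):
            # bool is rejected explicitly because it is a subclass of int
            problems.append(f"wrong type for field: {field}")
    extra = sorted(set(record) - set(REQUIRED_FIELDS))
    problems.extend(f"unexpected field: {name}" for name in extra)
    return problems


# Hypothetical raw response with chatter around the JSON payload.
raw = 'Sure, here is the JSON:\n{"ticker": "ACME", "quarter": "Q1", "revenue": 12.5}\nHope it helps.'
record = extract_json(raw)
problems = validate(record)
print(record, problems)
```

Because any deviation is reported rather than silently repaired, the same `validate` function can double as an evaluation metric: the share of responses with an empty problem list is a simple format-accuracy score.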

Tech Stacks

  • Enterprise: Google Doc AI, Vertex AI, Microsoft Azure Language AI Services
  • Open Source: Langchain (Serve/Smith/Graph), HuggingFace, Streamlit for UI

Bottom-line

  • Not looking for Success, but Wonder!!
  • तमसो मा ज्योतिर्गमय : From Dark (hidden in text data) to Light (insights)

Folks to Follow

Publications so far

References

Disclaimer:

Author ([email protected]) gives no guarantee of the results of the program. It is just a fun script. A lot of improvements are still to be made. So, don’t depend on it at all.
