awesome-azure-openai-llm
"Awesome-LLM: a curated list of Azure OpenAI & Large Language Models" 🔎References to Azure OpenAI, 🦙Large Language Models, and related 🌌 services and 🎋libraries.
This repository contains references to Azure OpenAI, Large Language Models (LLM), and related services and libraries. It follows a similar approach to the ‘Awesome-list’.
🔹Each item is described in as few lines as possible.
🔹Dates are determined by the commit history, the article's publication date, or the paper's issue date (v1).
🔹The aim is to capture a chronicle and the key terms of this rapidly advancing field.
🔹Disclaimer: Please be aware that some content may be outdated.
- OpenAI offers the latest features and models, while Azure OpenAI provides a reliable, secure, and compliant environment with seamless integration into other Azure services.
- Azure OpenAI supports private networking, role-based authentication, and responsible AI content filtering.
- Azure OpenAI does not use user input as training data for other customers. Data, privacy, and security for Azure OpenAI
- What is Azure OpenAI Service?
- OpenAI Models
- Abuse Monitoring: To detect and mitigate abuse, Azure OpenAI stores all prompts and generated content securely for up to thirty (30) days. (No prompts or completions are stored if the customer chooses to turn off abuse monitoring.)
- Section 1 : RAG, LlamaIndex, and Vector Storage
- Section 2 : Azure OpenAI and Reference Architecture
- Section 3 : Microsoft Semantic Kernel and Stanford NLP DSPy
- Section 4 : LangChain: Features, Usage, and Comparisons
- Section 5 : Prompt Engineering, Finetuning, and Visual Prompts
- 1. Prompt Engineering
- Prompt Engineering & Prompt Guide
- 2. Finetuning
- Advanced Finetuning: PEFT (e.g., LoRA), RLHF, SFT
- Quantization, Pruning and Sparsification
- Knowledge Distillations and Memory Optimization
- Other techniques and LLM patterns: e.g., MoE
- 3. Visual Prompting & Visual Grounding
- Visual Prompting & Visual Grounding?
- Section 6 : Challenges and Abilities
- Section 7 : Landscape of Large Language Models
- Large Language Models and NLP: Taxonomy
- OSS Large Language Models
- LLM for Domain specific: e.g., Software development
- MLLM (Multimodal large language model)
- Generative AI Landscape
- Section 8 : Survey and Reference
- Section 9 : Agents, Applications, and Frameworks
- Section 10 : General AI Tools and Extensions
- Section 11 : Datasets for Large Language Model Training
- Section 12 : Evaluating Large Language Models
- Contributors: 👀
- Symbols
  - `ref`: external URL
  - `doc`: archived doc
  - `cite`: the source of comments
  - `cnt`: number of citations
  - `git`: GitHub link
  - `X-ref`: cross reference
- RAG (Retrieval-Augmented Generation): Integrates retrieval (searching) into LLM text generation. RAG helps the model "look up" external information to improve its responses. cite [25 Aug 2023]
- In a 2020 paper, Meta (Facebook) introduced a framework called retrieval-augmented generation to give LLMs access to information beyond their training data. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks: [cnt] [22 May 2020]
- RAG-sequence — We retrieve k documents, and use them to generate all the output tokens that answer a user query.
- RAG-token — We retrieve k documents, use them to generate the next token, then retrieve k more documents, use them to generate the next token, and so on. This means that we could end up retrieving several different sets of documents in the generation of a single answer to a user’s query.
- Of the two approaches proposed in the paper, the RAG-sequence implementation is pretty much always used in the industry. It’s cheaper and simpler to run than the alternative, and it produces great results. cite [30 Sep 2023]
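A minimal, self-contained sketch contrasting the two decoding strategies; the corpus, retriever, and token generator below are toy stand-ins, not the paper's models:

```python
# Toy corpus and stubs so the sketch runs end-to-end.
CORPUS = ["Paris is the capital of France.", "The Seine flows through Paris."]

def retrieve(query: str, k: int) -> list[str]:
    # Stub retriever: rank documents by word overlap with the query.
    words = set(query.lower().split())
    return sorted(CORPUS, key=lambda d: len(words & set(d.lower().split())), reverse=True)[:k]

def next_token(query: str, docs: list[str], prefix: list[str]) -> str:
    # Stub generator: copy the top document word by word, then stop.
    words = docs[0].split()
    return words[len(prefix)] if len(prefix) < len(words) else "<eos>"

def rag_sequence(query: str, k: int = 2) -> str:
    docs = retrieve(query, k)  # one retrieval for the whole answer
    prefix: list[str] = []
    while (tok := next_token(query, docs, prefix)) != "<eos>":
        prefix.append(tok)
    return " ".join(prefix)

def rag_token(query: str, k: int = 2) -> str:
    prefix: list[str] = []
    while True:
        docs = retrieve(query + " " + " ".join(prefix), k)  # re-retrieve per token
        tok = next_token(query, docs, prefix)
        if tok == "<eos>":
            return " ".join(prefix)
        prefix.append(tok)

print(rag_sequence("capital of France"))  # Paris is the capital of France.
```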
- RAG for LLMs: [cnt] 🏆Retrieval-Augmented Generation for Large Language Models: A Survey: Three paradigms of RAG: Naive RAG > Advanced RAG > Modular RAG
- Expand: Research Papers
- Benchmarking Large Language Models in Retrieval-Augmented Generation: [cnt]: Retrieval-Augmented Generation Benchmark (RGB) is proposed to assess LLMs on 4 key abilities [4 Sep 2023]:
  1. Noise robustness (external documents contain noise; models struggled when noise exceeded 80%)
  2. Negative rejection (external documents are all noise; the highest rejection rate was only 45%)
  3. Information integration (difficulty in summarizing across multiple documents; the highest accuracy was 60-67%)
  4. Counterfactual robustness (failed to detect factual errors in counterfactual external documents)
- Active Retrieval Augmented Generation: [cnt]: Forward-Looking Active REtrieval augmented generation (FLARE): FLARE iteratively generates a temporary next sentence and checks whether it contains low-probability tokens. If so, the system retrieves relevant documents and regenerates the sentence. Low-probability tokens are determined via `token_logprobs` in the OpenAI API response. git [11 May 2023]
- Self-RAG: [cnt]
  1. Critic model C: Generates reflection tokens (IsREL (relevant, irrelevant), IsSUP (fully supported, partially supported, no support), IsUse (is useful: 5, 4, 3, 2, 1)). It is pretrained on data labeled by GPT-4.
  2. Generator model M: The main language model that generates task outputs and reflection tokens. It leverages the data labeled by the critic model during training.
  3. Retriever model R: Retrieves relevant passages. The LM decides if external passages (retriever) are needed for text generation. git [17 Oct 2023]
- A Survey on Retrieval-Augmented Text Generation: [cnt]: This paper surveys retrieval-augmented text generation, highlighting its advantages and state-of-the-art performance in many NLP tasks, including dialogue response generation, machine translation, summarization, paraphrase generation, text style transfer, and data-to-text generation. [2 Feb 2022]
- Retrieval meets Long Context LLMs: [cnt]: We demonstrate that retrieval-augmentation significantly improves the performance of 4K context LLMs. Perhaps surprisingly, we find this simple retrieval-augmented baseline can perform comparable to 16K long context LLMs. [4 Oct 2023]
- FreshLLMs: [cnt]: FreshPrompt: Google search first, then use the results in the prompt. Our experiments show that FreshPrompt outperforms both competing search-engine-augmented prompting methods such as Self-Ask (Press et al., 2022) and commercial systems such as Perplexity.AI. git [5 Oct 2023]
- RECOMP: Improving Retrieval-Augmented LMs with Compressors: [cnt]:
  1. We propose RECOMP (Retrieve, Compress, Prepend), an intermediate step which compresses retrieved documents into a textual summary prior to prepending them, to improve retrieval-augmented language models (RALMs).
  2. We present two compressors: an extractive compressor which selects useful sentences from retrieved documents, and an abstractive compressor which generates summaries by synthesizing information from multiple documents.
  3. Both compressors are trained. [6 Oct 2023]
- Retrieval-Augmentation for Long-form Question Answering: [cnt]:
  1. The order of evidence documents affects the order of generated answers.
  2. The last sentence of the answer is more likely to be unsupported by evidence.
  3. Automatic methods for detecting attribution can achieve reasonable performance, but still lag behind human agreement. Attribution in the paper assesses how well answers are grounded in the provided evidence and avoid fabricating information. [18 Oct 2023]
- INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning: INTERS covers 21 search tasks across three categories: query understanding, document understanding, and query-document relationship understanding. The dataset is designed for instruction tuning, a method that fine-tunes LLMs on natural language instructions. git [12 Jan 2024]
- RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture. [16 Jan 2024]
- The Power of Noise: Redefining Retrieval for RAG Systems: No more than 2-5 relevant docs + some amount of random noise to the LLM context maximizes the accuracy of the RAG. [26 Jan 2024]
- Corrective Retrieval Augmented Generation (CRAG): Retrieval Evaluator assesses the retrieved documents and categorizes them as Correct, Ambiguous, or Incorrect. For Ambiguous and Incorrect documents, the method uses Web Search to improve the quality of the information. The refined and distilled documents are then used to generate the final output. [29 Jan 2024] CRAG implementation by LangGraph git
- RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval: Introduces a novel approach to retrieval-augmented language models by constructing a recursive tree structure from documents. git / `pip install llama-index-packs-raptor` / git [31 Jan 2024]
- CRAG: Comprehensive RAG Benchmark: a factual question-answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search ref [7 Jun 2024]
- PlanRAG: Decision Making. Decision QA benchmark, DQA. Plan -> Retrieve -> Make a decision (PlanRAG) git [18 Jun 2024]
- Searching for Best Practices in Retrieval-Augmented Generation:
  - Best Performance Practice: Query Classification, Hybrid with HyDE (retrieval), monoT5 (reranking), Reverse (repacking), Recomp (summarization).
  - Balanced Efficiency Practice: Query Classification, Hybrid (retrieval), TILDEv2 (reranking), Reverse (repacking), Recomp (summarization). [1 Jul 2024]
- Retrieval Augmented Generation or Long-Context LLMs?: Long-context consistently outperforms RAG in terms of average performance. However, RAG's significantly lower cost remains a distinct advantage. [23 Jul 2024]
- Graph Retrieval-Augmented Generation: A Survey [15 Aug 2024]
- Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity git [21 Mar 2024]
- OP-RAG: Order-preserve RAG: Unlike traditional RAG, which sorts retrieved chunks by relevance, we keep them in their original order from the text. [3 Sep 2024]
- Retrieval Augmented Generation (RAG) and Beyond: 🏆The paper classifies user queries into four levels (explicit, implicit, interpretable rationale, and hidden rationale) and highlights the need for external data integration and fine-tuning LLMs for specialized tasks. [23 Sep 2024]
- RAG Pipeline
- Indexing Stage: Preparing a knowledge base.
- Querying Stage: Querying the indexed data to retrieve relevant information.
- Responding Stage: Generating responses based on the retrieved information. ref
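A minimal sketch of the three stages, assuming the OpenAI Python SDK; the model names and the two-document corpus are illustrative:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# 1) Indexing stage: embed the knowledge base once.
docs = ["Azure OpenAI supports private networking.", "RAG retrieves context at query time."]
index = embed(docs)

# 2) Querying stage: embed the query and retrieve by cosine similarity.
query = "How does RAG get its context?"
q = embed([query])[0]
sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
context = docs[int(np.argmax(sims))]

# 3) Responding stage: generate an answer grounded in the retrieved context.
answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {query}"}],
)
print(answer.choices[0].message.content)
```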
- How to optimize RAG pipeline: Indexing optimization [24 Oct 2023]
- Advanced RAG Patterns: How to improve RAG performance ref / ref [17 Oct 2023]
- Data quality: Clean, standardize, deduplicate, segment, annotate, augment, and update data to make it clear, consistent, and context-rich.
- Embeddings fine-tuning: Fine-tune embeddings to domain specifics, adjust them according to context, and refresh them periodically to capture evolving semantics.
- Retrieval optimization: Refine chunking, embed metadata, use query routing, multi-vector retrieval, re-ranking, hybrid search, recursive retrieval, query engine, HyDE [20 Dec 2022], and vector search algorithms to improve retrieval efficiency and relevance.
- Synthesis techniques: Query transformations, prompt templating, prompt conditioning, function calling, and fine-tuning the generator to refine the generation step.
- HyDE: Implemented in LangChain: HypotheticalDocumentEmbedder. A query generates hypothetical documents, which are then embedded and used for retrieval to provide the most relevant results. Flow: query -> generate n hypothetical documents -> embed the documents -> average the embeddings -> retrieve -> final result (see the sketch below). ref
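A minimal HyDE sketch, assuming the OpenAI SDK; `search_index` is a hypothetical stand-in for your vector store's nearest-neighbor search:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def hyde_retrieve(query: str, n: int = 4):
    # 1) Generate n hypothetical documents that *answer* the query.
    hypo_docs = [
        client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": f"Write a short passage answering: {query}"}],
            temperature=1.0,
        ).choices[0].message.content
        for _ in range(n)
    ]
    # 2) Embed them and average the embeddings into a single search vector.
    resp = client.embeddings.create(model="text-embedding-3-small", input=hypo_docs)
    avg = np.mean([d.embedding for d in resp.data], axis=0)
    # 3) Retrieve real documents nearest to the averaged vector.
    return search_index(avg)  # hypothetical: plug in your vector store here
```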
- Demystifying Advanced RAG Pipelines: An LLM-powered advanced RAG pipeline built from scratch git [19 Oct 2023]
- 9 Effective Techniques To Boost Retrieval Augmented Generation (RAG) Systems doc: ReRank, Prompt Compression, Hypothetical Document Embedding (HyDE), Query Rewrite and Expansion, Enhance Data Quality, Optimize Index Structure, Add Metadata, Align Query with Documents, Mixed Retrieval (Hybrid Search) [2 Jan 2024]
- cite [7 Nov 2023]: OpenAI has put together a pretty good roadmap for building a production RAG system: Naive RAG -> Tune Chunks -> Rerank & Classify -> Prompt Engineering. In `llama_index` ... Youtube
- Graph RAG (by NebulaGraph): NebulaGraph proposes the concept of Graph RAG, which is a retrieval enhancement technique based on knowledge graphs. demo [8 Sep 2023]
- Evaluation with Ragas: UMAP (often used to reduce the dimensionality of embeddings) with Ragas metrics for visualizing RAG results. [Mar 2024] / Ragas provides metrics: Context Precision, Context Relevancy, Context Recall, Faithfulness, Answer Relevance, Answer Semantic Similarity, Answer Correctness, Aspect Critique git [May 2023]
- The Problem with RAG
- A question is not semantically similar to its answers. Cosine similarity may favor semantically similar texts that do not contain the answer.
- Semantic similarity gets diluted if the document is too long. Cosine similarity may favor short documents with only the relevant information.
- The information needs to be contained in one or a few documents; questions that require aggregation by scanning the whole dataset are a poor fit.
- Seven Failure Points When Engineering a Retrieval Augmented Generation System: 1. Missing Content, 2. Missed the Top Ranked Documents, 3. Not in Context, 4. Not Extracted, 5. Wrong Format, 6. Incorrect Specificity, 7. Lack of Thorough Testing [11 Jan 2024]
- Solving the core challenges of Retrieval-Augmented Generation ref [Feb 2024]
- RAG Solution Design
- Azure: Designing and developing a RAG solution
- Announcing cost-effective RAG at scale with Azure AI Search
- Advanced RAG with Azure AI Search and LlamaIndex
- GPT-RAG: Enterprise RAG Solution Accelerator [Jun 2023]
- Azure OpenAI chat baseline architecture in an Azure landing zone
- Azure Reference Architectures: X-ref
- RAG at scale [28 Sep 2023]
- LangChain RAG from scratch [Jan 2024]
- LlamaIndex: Building Performant RAG Applications for Production
- Advanced RAG on Hugging Face documentation using LangChain
- RAG context relevancy metric: Ragas, TruLens, DeepEval ref [Jun 2024] (see the toy example below)
  - Context Relevancy (in Ragas) = |S| / (total number of sentences in retrieved context), where S is the set of retrieved-context sentences relevant to the question
  - Contextual Relevancy (in DeepEval) = (number of relevant statements) / (total number of statements)
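A toy illustration of the two ratios; in Ragas and DeepEval the relevance judgments themselves come from an LLM judge, while here they are given as counts:

```python
def context_relevancy(relevant_sentences: int, total_sentences: int) -> float:
    """Ragas: |S| / total number of sentences in the retrieved context."""
    return relevant_sentences / total_sentences

def contextual_relevancy(relevant_statements: int, total_statements: int) -> float:
    """DeepEval: relevant statements / total statements."""
    return relevant_statements / total_statements

print(context_relevancy(3, 8))      # 0.375
print(contextual_relevancy(5, 10))  # 0.5
```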
- Papers with code: RAG
- What AI Engineers Should Know about Search [25 Jun 2024]
- GraphRAG (by Microsoft): Original Documents -> Knowledge Graph (Group Summaries) -> Partial Responses -> Final Response. Supports local and global search. ref git [24 Apr 2024]
- GraphRAG Implementation with LlamaIndex [15 Jul 2024]
- "From Local to Global" GraphRAG with Neo4j and LangChain [09 Jul 2024]
- Learn RAG with LangChain: Online book [May 2024]
- Advanced RAG Techniques:🏆Showcases various advanced techniques for Retrieval-Augmented Generation (RAG) [Jul 2024]
- A Practical Approach to Retrieval Augmented Generation (RAG) Systems: Online book [Dec 2023]
- Galileo eBook: Mastering RAG. 200 pages of content. doc [Sep 2024]
- RAG Application / Framework
- RAG capabilities of LlamaIndex to QA about SEC 10-K & 10-Q documents: A real world full-stack application using LlamaIndex [Sep 2023]
- RAGxplorer: Visualizing document chunks and the queries in the embedding space. [Jan 2024]
- PrivateGPT: 100% private, no data leaks. 1. The API is built using FastAPI and follows OpenAI's API scheme. 2. The RAG pipeline is based on LlamaIndex. [May 2023]
- Danswer: Ask Questions in natural language and get Answers backed by private sources: Slack, GitHub, Confluence, etc. [Apr 2023]
- Verba Retrieval Augmented Generation (RAG) chatbot powered by Weaviate git [Jul 2023]
- llm-answer-engine: Build a Perplexity-Inspired Answer Engine Using Next.js, Groq, Mixtral, LangChain, OpenAI, Brave & Serper [Mar 2024]
- turboseek: An AI search engine inspired by Perplexity [May 2024]
- quivr: A personal productivity assistant (RAG). Chat with your docs (PDF, CSV, ...) [May 2023]
- RAGApp: Agentic RAG. Like custom GPTs, but deployable in your own cloud infrastructure using Docker. [Apr 2024]
- Cognita: RAG (Retrieval Augmented Generation) Framework for building modular, open source applications [Jul 2023]
- Open Source AI Searches: Perplexica: Open source alternative to Perplexity AI [Apr 2024] / Marqo / txtai / Typesense / Morphic
- AutoRAG: RAG AutoML tool that automatically finds an optimal RAG pipeline for your data. [Jan 2024]
- RAGflow: Streamlined RAG workflow. Focusing on Deep document understanding [Dec 2023]
- MindSearch: an open-source AI Search Engine Framework [Jul 2024]
- RAGFoundry: a library designed to improve LLMs' ability to use external information by fine-tuning models on specially created RAG-augmented datasets. [5 Aug 2024]
- Haystack: LLM orchestration framework to build customizable, production-ready LLM applications. [5 May 2020]
- RAGChecker: A Fine-grained Framework For Diagnosing RAG git [15 Aug 2024]
- HybridRAG: Integrating VectorRAG and GraphRAG with financial earnings call transcripts in Q&A format. [9 Aug 2024]
- MedGraphRAG: MedGraphRAG outperforms the previous SOTA model, Medprompt, by 1.1%. git [8 Aug 2024]
- STORM: Wikipedia-like articles from scratch based on Internet search. [Mar 2024]
- FlashRAG: A Python Toolkit for Efficient RAG Research [Mar 2024]
- Canopy: open-source RAG framework and context engine built on top of the Pinecone vector database. [Aug 2023]
- kotaemon: open-source clean & customizable RAG UI for chatting with your documents. [Mar 2024]
- PaperQA2: High accuracy RAG for answering questions from scientific documents with citations [Feb 2023]
- Applications, Frameworks, and User Interface (UI/UX): X-ref
- LlamaIndex (formerly GPT Index) is a data framework for LLM applications to ingest, structure, and access private or domain-specific data. The high-level API allows users to ingest and query their data in a few lines of code. High-Level Concept: ref / doc:ref / blog:ref / git [Nov 2022]
  Fun fact: this core idea was the initial inspiration for GPT Index (the former name of LlamaIndex) on 11/8/2022 - almost a year ago! cite / Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
  - Build a data structure (memory tree)
  - Traverse it via LLM prompting
- LlamaIndex Toolkits: LlamaHub: a library of data loaders for LLMs git [Feb 2023] / LlamaIndex CLI: a command-line tool to generate LlamaIndex apps ref [Nov 2023] / LlamaParse: a unique parsing tool for intricate documents git [Feb 2024]
- High-Level Concepts
- Query engine vs Chat engine
  - The query engine wraps a retriever and a response synthesizer into a pipeline that uses the query string to fetch nodes (sentences or paragraphs) from the index and then sends them to the LLM to generate a response.
  - The chat engine is a quick and simple way to chat with the data in your index. It uses a context manager to keep track of the conversation history and generate relevant queries for the retriever. Conceptually, it is a stateful analogy of a query engine.
- Storage Context vs Settings (previously known as Service Context)
  - Both the Storage Context and Service Context are data classes.
  - Introduced in v0.10.0, the Settings object replaces the ServiceContext.
  - The Storage Context is responsible for the storage and retrieval of data in LlamaIndex, while the Service Context helps in incorporating external context to enhance the search experience.
  - The Service Context is not directly involved in the storage or retrieval of data, but it helps in providing a more context-aware and accurate search experience.

  ```python
  # The storage context container is a utility container for storing nodes, indices, and vectors.
  class StorageContext:
      docstore: BaseDocumentStore
      index_store: BaseIndexStore
      vector_store: VectorStore
      graph_store: GraphStore

  # The service context container is a utility container for LlamaIndex index and query classes.
  class ServiceContext:
      llm_predictor: BaseLLMPredictor
      prompt_helper: PromptHelper
      embed_model: BaseEmbedding
      node_parser: NodeParser
      llama_logger: LlamaLogger
      callback_manager: CallbackManager

  @dataclass
  class _Settings:
      # lazy initialization
      _llm: Optional[LLM] = None
      _embed_model: Optional[BaseEmbedding] = None
      _callback_manager: Optional[CallbackManager] = None
      _tokenizer: Optional[Callable[[str], List[Any]]] = None
      _node_parser: Optional[NodeParser] = None
      _prompt_helper: Optional[PromptHelper] = None
      _transformations: Optional[List[TransformComponent]] = None
  ```
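A short sketch of the v0.10+ Settings pattern that replaces ServiceContext; the package paths follow the llama-index OpenAI integrations, and the model names are illustrative:

```python
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Globals configured once instead of passing a ServiceContext around.
Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("What do these documents cover?"))
```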
- LlamaIndex Overview (Japanese) [17 Jul 2023]
- LlamaIndex Tutorial: A Complete LlamaIndex Guide [18 Oct 2023]
- Chat engine ReAct mode, FLARE Query engine
- Multimodal RAG Pipeline ref [Nov 2023]
- From Simple to Advanced RAG ref / ref [10 Oct 2023]
- Building and Productionizing RAG: doc: Optimizing RAG Systems 1. Table Stakes 2. Advanced Retrieval: Small-to-Big 3. Agents 4. Fine-Tuning 5. Evaluation [Nov 2023]
- A Cheat Sheet and Some Recipes For Building Advanced RAG: a RAG cheat sheet inspired by the RAG survey paper. doc [Jan 2024]
- Fine-Tuning a Linear Adapter for Any Embedding Model: Fine-tuning the embeddings model requires you to reindex your documents. With this approach, you do not need to re-embed your documents. Simply transform the query instead. [7 Sep 2023]
- 4 RAG techniques implemented in llama_index / cite [20 Sep 2023] / git
  Expand: 4 RAG techniques
  - SQL Router Query Engine: Query router that can reference your vector database or SQL database
  - Sub Question Query Engine: Break down the complex question into sub-questions
  - Recursive Retriever + Query Engine: Reference node relationships, rather than only finding the most relevant node (chunk)
  - Self-Correcting Query Engines: Use an LLM to evaluate its own output
- Not All Vector Databases Are Made Equal: Printed version for "Medium" limits. doc [2 Oct 2021]
- Faiss: Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It can serve as an alternative to a vector database, and as a library of the algorithms that vector databases build on. It is developed by Facebook AI Research. git [Feb 2017]
- Milvus (A cloud-native vector database) Embedded git [Sep 2019]: Alternative option to replace Pinecone and Redis Search in OSS. It offers support for multiple languages, addresses the limitations of RedisSearch, and provides cloud scalability and high reliability with Kubernetes.
- Pinecone: A fully managed cloud Vector Database. Commercial Product [Jan 2021]
- Weaviate: Store both vectors and data objects. [Jan 2021]
- Chroma: Open-source embedding database [Oct 2022]
- Qdrant: Written in Rust. Qdrant (read: quadrant) [May 2020]
- Redis extension for vector search, RedisVL: Redis Vector Library (RedisVL) [Nov 2022]
- A SQLite extension for efficient vector search, based on Faiss! [Jan 2023]
- pgvector: Open-source vector similarity search for Postgres [Apr 2021] / pgvectorscale: 75% cheaper than pinecone [Jul 2023]
- lancedb: LanceDB's core is written in Rust and is built using Lance, an open-source columnar format. [Feb 2023]
- A Comprehensive Survey on Vector Database: Categorizes search algorithms by their approach, such as hash-based, tree-based, graph-based, and quantization-based. [18 Oct 2023]
- Pgvector extension on Azure Cosmos DB for PostgreSQL: ref [13 Jun 2023]
- Vector Search in Azure Cosmos DB for MongoDB vCore [23 May 2023]
- Vector search - Azure AI Search: ref Rebranded from Azure Cognitive Search [Oct 2019] to Azure AI Search [Nov 2023]
- Azure Cache for Redis Enterprise: Enterprise Redis Vector Search Demo [22 May 2023]
- Azure SQL's support for natively storing and querying vectors [21 May 2024]
  Note: Azure Cache for Redis Enterprise SKUs cannot be deployed via templates such as Bicep and ARM.
- The Azure OpenAI Embedding API, text-embedding-ada-002, supports 1536 dimensions. Elasticsearch, a Lucene-based engine, supports up to 1024 dimensions. OpenSearch can store vectors with up to 16,000 dimensions and can be used as a vector database with the Azure OpenAI Embedding API.
- OpenAI Embedding models: text-embedding-3 X-ref
  - New embedding models: text-embedding-ada-002 has a smaller embedding size. With only 1536 dimensions, one-eighth the size of davinci-001 embeddings, the new embeddings are more cost-effective when working with vector databases. [15 Dec 2022]
  - However, one exception to this is that the maximum dimension count for the Lucene engine is 1,024, compared with 16,000 for the other engines. ref
- Vector Search with OpenAI Embeddings: Lucene Is All You Need: Our experiments were based on Lucene 9.5.0, but indexing was a bit tricky because the HNSW implementation in Lucene restricts vectors to 1024 dimensions, which was not sufficient for OpenAI’s 1536-dimensional embeddings. Although the resolution of this issue, which is to make vector dimensions configurable on a per codec basis, has been merged to the Lucene source trunk git, this feature has not been folded into a Lucene release (yet) as of early August 2023. [29 Aug 2023]
- Is Cosine-Similarity of Embeddings Really About Similarity?: In linear matrix factorization, the use of regularization can impact, and in some cases, render cosine similarities meaningless. Regularization involves two objectives. The first objective applies L2-norm regularization to the product of matrices A and B, a process similar to dropout. The second objective applies L2-norm regularization to each individual matrix, similar to the weight decay technique used in deep learning. [8 Mar 2024]
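For reference, a small numeric illustration of the cosine-similarity computation the paper questions; with L2-normalized embeddings it reduces to a dot product:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([0.6, 0.8])
b = np.array([0.8, 0.6])
print(cosine_similarity(a, b))  # 0.96: high similarity despite distinct vectors
```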
- Semantic Kernel (Feb 2023): An open-source SDK for integrating AI services like OpenAI, Azure OpenAI, and Hugging Face with conventional programming languages such as C# and Python. It's an LLM orchestrator, similar to LangChain. / git
- Kernel Memory (Jul 2023): An open-source service and plugin for efficient dataset indexing through custom continuous data hybrid pipelines.
- Azure ML Prompt Flow (Jun 2023): A visual designer for prompt crafting using Jinja as a prompt template language. / ref / git
- A Memory in Semantic Kernel vs Kernel Memory (FKA. Semantic Memory (SM)): Kernel Memory is designed to efficiently handle large datasets and extended conversations. Deploying the memory pipeline as a separate service can be beneficial when dealing with large documents or long bot conversations. ref
- Prompt Engine (Jun 2022): A tool for crafting prompts for large language models in Python. / Python
- PromptBench (Jun 2023): A unified evaluation framework for large language models.
- SAMMO (Apr 2024): A general-purpose framework for prompt optimization. / ref
- Prompty (Apr 2024): A template language for integrating prompts with LLMs and frameworks, enhancing prompt management and evaluation.
- guidance (Nov 2022): A domain-specific language (DSL) for controlling large language models, focusing on model interaction and implementing the "Chain of Thought" technique.
- LMOps (Dec 2022): A toolkit for improving text prompts used in generative AI models, including tools like Promptist for text-to-image generation and Structured Prompting.
- LLMLingua (Jul 2023): A tool for compressing prompts and KV-Cache, achieving up to 20x compression with minimal performance loss. LLMLingua-2 was released in Mar 2024.
- TypeChat (Apr 2023): A tool that replaces prompt engineering with schema engineering, designed to build natural language interfaces using types. / git
- JARVIS (Mar 2023): An interface for LLMs to connect numerous AI models for solving complex AI tasks.
- Autogen (Mar 2023): A customizable and conversable agent framework. / ref / Autogen Studio (June 2024)
- TaskWeaver (Sep 2023): A code-first agent framework for converting natural language requests into executable code with support for rich data structures and domain-adapted planning.
- UFO (Mar 2024): A UI-focused agent for Windows OS interaction.
- Semantic Workbench (Aug 2024): A development tool for creating intelligent agents. / ref
- DeepSpeed (May 2020): A deep learning optimization library for easy, efficient, and effective distributed training and inference, featuring the Zero Redundancy Optimizer.
- FLAML (Dec 2020): A lightweight Python library for efficient automation of machine learning and AI operations, offering interfaces for AutoGen, AutoML, and hyperparameter tuning.
- PyRIT (Dec 2023): Python Risk Identification Tool for generative AI, focusing on LLM robustness against issues like hallucination, bias, and harassment.
- AI Central (Oct 2023): An AI Control Center for monitoring, authenticating, and providing resilient access to multiple OpenAI services.
- Microsoft Fabric: Fabric integrates technologies like Azure Data Factory, Azure Synapse Analytics, and Power BI into a single unified product [May 2023]
- Copilot Products
  - Microsoft Copilot in Windows vs Microsoft Copilot (= Copilot in Windows + Commercial Data Protection) vs Microsoft 365 Copilot (= Microsoft Copilot + M365 Integration) [Nov 2023]
  - Copilot Scenario Library
- Azure
- Microsoft Copilot for Azure / blog [Nov 2023]
- Security Copilot / blog [March 2023]
- Copilot in Azure Quantum [June 2023]
- Microsoft 365 (Incl. Dynamics 365 and Power Platform)
- Microsoft 365 Copilot / blog [Nov 2023]
- Copilot in Power Platform: Power App AI Copilot [March 2023] / Power Automate: Copilot in cloud flows, Copilot in Process Mining ingestion, Copilot in Power Automate for desktop ... [Nov 2023]
- Dynamics 365 Copilot / blog [March 2023]
- Microsoft Viva Copilot blog [April 2023]
- Microsoft Fabric and Power BI: blog / Fabric Copilot / PowerBI Copilot [March 2024]
- Copilot Pro: Copilot Pro offers all the features of Copilot, plus faster responses, priority access to advanced models, personalized GPTs, integration with Microsoft 365 apps, and enhanced AI image creation. [Jan 2024]
- Team Copilot: Act as a valuable team member (Meeting facilitator, Group collaborator, Project manager) [May 2024]
- Copilot Pages: Copilot Pages is a dynamic, persistent canvas in Copilot chat designed for multiplayer AI collaboration [16 Sep 2024]
- Windows, Bing and so on
- Microsoft Copilot: FKA. Bing Chat Enterprise [Nov 2023]
- Microsoft Clarity Copilot: blog [March 2023]
- Microsoft Copilot in Windows [Sep 2023]
- Github Copilot [Oct 2021]
- Copilot+ PC: AI-powered and NPU-equipped Windows PCs [May 2024]
- Windows Copilot Runtime: The set of APIs powered by the 40+ on-device models, a new layer of Windows. [May 2024]
- Nuance DAX Copilot: AI assistant for automated clinical documentation [18 Jan 2024]
- Customize Copilot
  - Microsoft AI and AI Studio
- Microsoft AI
- The age of copilots: blog [Nov 2023]
- Azure AI Studio: Generative AI Development Hub + Promptflow + Azure AI Content Safety / youtube / SDK and CLI
- Copilot Studio
- The Copilot System: Explained by Microsoft youtube [Mar 2023]
- Microsoft Copilot Studio: Customize Copilot for Microsoft 365. FKA. Power Virtual Agents: ref [Nov 2023]
- Microsoft Copilot Dashboard / blog
- Microsoft Office Copilot: Natural Language Commanding via Program Synthesis: [cnt]: Semantic Interpreter, a natural language-friendly AI system for productivity software such as Microsoft Office that leverages large language models (LLMs) to execute user intent across application features. [6 Jun 2023]
- NL2KQL: From Natural Language to Kusto Query [3 Apr 2024]
- SpreadsheetLLM: Introduces an efficient method to encode Excel sheets, outperforming previous approaches with 25 times fewer tokens.[12 Jul 2024]
- GraphRAG (by Microsoft): RAG with a graph-based approach to efficiently answer both specific and broad questions over large text corpora. ref git [24 Apr 2024]
- AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems [9 Aug 2024]
- Azure OpenAI Embeddings QnA [Apr 2023]
- Azure Cosmos DB + OpenAI ChatGPT: C# blazor [Mar 2023]
- C# Implementation: ChatGPT + Enterprise data with Azure OpenAI and Cognitive Search [Apr 2023]
- Simple ChatGPT UI application: TypeScript, React, and Flask [Apr 2023]
- Azure Video Indexer demo: Azure Video Indexer + OpenAI [Apr 2023]
- Miyagi: integration demonstration for multiple LangChain libraries [Feb 2023]
- ChatGPT + Enterprise data RAG (Retrieval-Augmented Generation)🏆 [Feb 2023]
- Chat with your data - Solution accelerator [Jun 2023]
- Reference Use Case and Architecture
- AI Feed | AI Platform Blog
- Azure Command Companion: Harnessing the Power of OpenAI GPT-3.5 Turbo for Azure CLI Command Generation [10 Dec 2023]
- Chat with your Azure DevOps data [10 Jan 2024]
- Baseline OpenAI end-to-end chat reference architecture
- Build language model pipelines with memory
- NL to SQL Architecture Alternative [14 May 2024] / Natural Language to SQL Console
- GPT-RAG: Retrieval-Augmented Generation pattern running in Azure [Jun 2023]
- Responsible AI Transparency Report
- Safeguard and trustworthy generative AI applications [28 Mar 2024]
- Microsoft AI / Responsible AI 🏆
- Baseline Agentic AI Systems Architecture [20 Aug 2024]
- AI Agent-Driven Auto Insurance Claims RAG Pipeline [09 Sep 2024]
- Azure OpenAI Accelerator / Application
- Azure-Cognitive-Search-Azure-OpenAI-Accelerator [May 2023]
- Conversational-Azure-OpenAI-Accelerator [Feb 2022]
- ChatGPT + Enterprise data RAG (Retrieval-Augmented Generation) Demo git 🏆 [8 Feb 2023]
- Azure OpenAI samples: ref [Apr 2023]
- The repository for all Azure OpenAI Samples complementing the OpenAI cookbook: ref [Apr 2023]
- Azure-Samples ref
- Azure OpenAI with AKS By Terraform: git [Jun 2023]
- Azure OpenAI with AKS By Bicep: git [May 2023]
- Enterprise Logging: git [Feb 2023] / Setting up Azure OpenAI with Azure API Management [Jan 2024]
- Azure OpenAI with AKS by Terraform (simple version): git [May 2023]
- ChatGPT Plugin Quickstart using Python and FastAPI: git [May 2023]
- GPT-Azure-Search-Engine: git Integration of Azure Bot Service with LangChain [Feb 2023]
- Azure OpenAI Network Latency Test Script: git [Jun 2023]
- Create an Azure OpenAI, LangChain, ChromaDB, and Chainlit ChatGPT-like application in Azure Container Apps using Terraform git [Jul 2023]
- Azure SQL DB + AOAI / Smart load balancing for AOAI / Azure Functions (C#) bindings for OpenAI / Microsoft Entra ID Authentication for AOAI / Azure OpenAI workshop / RAG for Azure Data / AI-Sentry: A lightweight, pluggable facade layer for AOAI
- Azure Open AI work with Cognitive Search act as a Long-term memory
- ChatGPT + Enterprise data with Azure OpenAI and Cognitive Search [Feb 2023]
- Can ChatGPT work with your enterprise data? [06 Apr 2023]
- Azure OpenAI と Azure Cognitive Search の組み合わせを考える [24 May 2023]
- AI-in-a-Box: AI-in-a-Box aims to provide an "Azure AI/ML Easy Button" for common scenarios [Sep 2023]
- AI Samples for .NET: official .NET samples demonstrating how to use AI [Feb 2024]
- OpenAI Official .NET Library [Apr 2024]
- Smart Components: Experimental, end-to-end AI features for .NET apps [Mar 2024]
- Prompt Buddy: 🏆Share and upvote favorite AI prompts. free Microsoft Teams Power App using Dataverse for Teams. [Mar 2024]
- Azure Multimodal AI + LLM Processing Accelerator: Build multimodal data processing pipelines with Azure AI Services + LLMs [Aug 2024]
- ARGUS: Hybrid approach combining Azure Document Intelligence with GPT-4 Vision to get better results without any pre-training. [Jun 2024]
- Guideline
- Grounding LLMs: Retrieval-Augmented Generation (RAG) [09 Jun 2023]
- Revolutionize your Enterprise Data with ChatGPT [09 Mar 2023]
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [07 Mar 2023]
- Azure OpenAI Design Patterns: A set of design patterns using the Azure OpenAI service [May 2023]
- Azure AI Services Landing Zone / ref [24 Jul 2023]
- Security Best Practices for GenAI Applications (OpenAI) in Azure [16 Jan 2024]
- Authentication and Authorization in Generative AI applications with Entra ID and Azure AI Search [09 Jan 2024]
- Integrate private access to your Azure Open AI Chatbot [30 Nov 2023]
- Smart load balancing for OpenAI endpoints git [Jan 2024]
- An Introduction to LLMOps: Operationalizing and Managing Large Language Models using Azure ML [27 Aug 2023]
- Optimize Azure OpenAI Applications with Semantic Caching [09 Apr 2024]
- Azure OpenAI and Call Center Modernization [11 Apr 2024]
- Azure OpenAI Best Practices Insights from Customer Journeys: LLMLingua, Skeleton Of Thought [12 Jun 2024]
- Retrieval Augmented Fine Tuning: RAFT: Combining the best parts of RAG and fine-tuning (SFT) [25 Sep 2024]
- Azure Cognitive Search was rebranded as Azure AI Search; it supports vector search and a semantic ranker. [16 Nov 2023]
- Among the vector database options within Azure, several alternative solutions are available. However, Azure AI Search is the only one that provides a range of choices, including a conventional Lucene-based search engine and hybrid search incorporating vector search capabilities.
- Vector Search Sample Code: git [Apr 2023]
- Azure AI Search (FKA. Azure Cognitive Search) supports
- Text Search
- Pure Vector Search
- Hybrid Search (Text search + Vector search)
- Semantic Hybrid Search (Text search + Semantic search + Vector search)
- A set of capabilities designed to improve relevance in these scenarios. We use a combination of hybrid retrieval (vector search + keyword search) + semantic ranking as the most effective approach for improved relevance out-of-the-box.
TL;DR: Retrieval Performance; Hybrid search + Semantic rank > Hybrid search > Vector only search > Keyword only
ref [18 Sep 2023]
- Hybrid search using Reciprocal Rank Fusion (RRF): Reciprocal Rank Fusion (RRF) is an algorithm that evaluates the search scores from multiple, previously ranked results to produce a unified result set. In Azure Cognitive Search, RRF is used whenever there are two or more queries that execute in parallel. ref
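A minimal sketch of the RRF formula; k=60 is the constant commonly used in the literature, and the two ranked lists are illustrative:

```python
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse multiple ranked result lists into one, scoring each document
    by sum(1 / (k + rank)) across every list it appears in."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_results = ["doc2", "doc1", "doc5"]  # from the keyword (BM25) query
vector_results = ["doc1", "doc3", "doc2"]   # from the vector query
print(rrf([keyword_results, vector_results]))  # ['doc1', 'doc2', 'doc3', 'doc5']
```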
- Integrated vectorization: Automatically splits documents into chunks, creates embeddings with Azure OpenAI, maps them to an Azure AI Search index, and automates query vectorization. [24 Aug 2024]
- Copilot (FKA. Bing Chat Enterprise) [18 Jul 2023] Privacy and Protection
- Doesn't have plugin support
- Only content provided in the chat by users is accessible to Bing Chat Enterprise.
- Azure OpenAI Service On Your Data in Public Preview ref [19 Jun 2023]
- Azure OpenAI Finetuning: Babbage-002 is $34/hour, Davinci-002 is $68/hour, and Turbo is $102/hour. ref [16 Oct 2023]
- Customer Copyright Commitment: protects customers from certain IP claims related to AI-generated content. ref [16 Nov 2023]
- Models as a Service (MaaS): A cloud-based AI approach that provides developers and businesses with access to pre-built, pre-trained machine learning models. [July 2023]
- Assistants API: Code Interpreter, Function calling, Knowledge retrieval tool, and Threads (Truncated and optimized conversation history for the model's context length) in Azure [06 Feb 2024]
- Semantic Kernel, Microsoft's LangChain-like library, supports C# and Python and offers several features, some of which are still in development and may be unclear on how to implement. However, it is simple, stable, and faster than Python-based open-source software. The features listed on the link include: Semantic Kernel Feature Matrix / doc:ref / blog:ref / git [Feb 2023]
- .NET Semantic Kernel SDK: 1. Renamed packages and classes that used the term “Skill” to now use “Plugin”. 2. OpenAI specific in Semantic Kernel core to be AI service agnostic 3. Consolidated our planner implementations into a single package ref [10 Oct 2023]
- Road to v1.0 for the Python Semantic Kernel SDK ref [23 Jan 2024] backlog
- Semantic Kernel sample application: Chat Copilot [Apr 2023] / Virtual Customer Success Manager (VCSM) [Jul 2024]
- Semantic Kernel Recipes: A collection of C# notebooks git [Mar 2023]
- Deploy Semantic Kernel with Bot Framework ref git [26 Oct 2023]
- Semantic Kernel-Powered OpenAI Plugin Development Lifecycle ref [30 Oct 2023]
- Semantic Kernel implementation sample to overcome token limits of the OpenAI model: splitting long text that exceeds token limits into chunks, passing them to a skill, and combining the results (zenn.dev, Japanese) ref [06 May 2023]
- Learning Paths for Semantic Kernel [28 Mar 2024]
- A Pythonista’s Intro to Semantic Kernel [3 Sep 2023]
- Step-by-Step Guide to Building a Powerful AI Monitoring Dashboard with Semantic Kernel and Azure Monitor: Step-by-step guide to building an AI monitoring dashboard using Semantic Kernel and Azure Monitor to track token usage and custom metrics. [23 Aug 2024]
- Semantic Kernel Planner ref [24 Jul 2023]
  - Is Semantic Kernel Planner the same as LangChain agents? Planner in SK is not the same as Agents in LangChain. cite [11 May 2023] Agents in LangChain use recursive calls to the LLM to decide the next step based on the current state. The two planner implementations in SK are not self-correcting. The Sequential planner tries to produce all the steps at the very beginning, so it is unable to handle unexpected errors. The Action planner only chooses one tool to satisfy the goal.
  - Stepwise Planner released. The Stepwise Planner features the "CreateScratchPad" function, acting as a 'Scratch Pad' to aggregate goal-oriented steps. [16 Aug 2023]
  - Gen-4 and Gen-5 planners: 1. Gen-4: generates multi-step plans with Handlebars. 2. Gen-5: the Stepwise Planner supports function calling. ref [16 Nov 2023]
  - Use function calling for most tasks; it's more powerful and easier. Stepwise and Handlebars planners will be deprecated. ref [Jun 2024]
  - The future of Planners in Semantic Kernel [23 July 2024]
- Semantic Function: expressed in natural language in a text file "skprompt.txt" using SK's Prompt Template language. Each semantic function is defined by a unique prompt template file, developed using modern prompt engineering techniques. cite
- Prompt Template language key takeaways:
  1. Variables: use the {{$variableName}} syntax: Hello {{$name}}, welcome to Semantic Kernel!
  2. Function calls: use the {{namespace.functionName}} syntax: The weather today is {{weather.getForecast}}.
  3. Function parameters: {{namespace.functionName $varName}} and {{namespace.functionName "value"}} syntax: The weather today in {{$city}} is {{weather.getForecast $city}}.
  4. Prompts needing double curly braces: {{ "{{" }} and {{ "}}" }} are special SK sequences.
  5. Values that include quotes, and escaping: for instance, ... {{ 'no need to \\"escape" ' }} ... is equivalent to ... {{ 'no need to "escape" ' }} ...
- Glossary in Git / Glossary in MS Doc

  | Term | Short Description |
  |---|---|
  | ASK | A user's goal is sent to SK as an ASK |
  | Kernel | The kernel orchestrates a user's ASK |
  | Planner | The planner breaks it down into steps based upon resources that are available |
  | Resources | Planning involves leveraging available skills, memories, and connectors |
  | Steps | A plan is a series of steps for the kernel to execute |
  | Pipeline | Executing the steps results in fulfilling the user's ASK |

- Architecting AI Apps with Semantic Kernel: How you could recreate Microsoft Word Copilot [6 Mar 2024]
- DSPy (Declarative Self-improving Language Programs, pronounced "dee-es-pie") / doc:ref / git
  - DSPy Documentation & Cheatsheet ref
  - DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines [5 Oct 2023] / git
  - DSPy Explained! youtube [30 Jan 2024]
  - DSPy RAG example in Weaviate recipes: recipes > integrations git
  - Prompt Like a Data Scientist: Auto Prompt Optimization and Testing with DSPy [6 May 2024]
- Instead of a hard-coded prompt template, DSPy takes a modular approach: compositions of modules are compiled. Building blocks such as ChainOfThought or Retrieve are composed into a program, and compiling the program optimizes the prompts based on specific metrics. It unifies strategies for both prompting and fine-tuning in one tool, supports Pythonic operations, and prioritizes tracing program execution. These features distinguish it from other LMP frameworks such as LangChain and LlamaIndex. ref [Jan 2023]
- Automatically iterate until the best result is achieved: 1. Collect data -> 2. Write DSPy program -> 3. Define validation logic -> 4. Compile DSPy program
- DSPy vs. LangChain, LlamaIndex: LangChain and LlamaIndex offer pre-built modules for specific applications. DSPy provides general-purpose modules that learn to optimize your language model based on your data and pipeline. It's like the difference between PyTorch (DSPy) and HuggingFace Transformers (higher-level libraries).
- Glossary reference to the ref.
  - Signatures: Hand-written prompts and fine-tuning are abstracted and replaced by signatures.
    "question -> answer"
    "long-document -> summary"
    "context, question -> answer"
  - Modules: Prompting techniques, such as Chain of Thought or ReAct, are abstracted and replaced by modules.

    ```python
    # pass a signature to the ChainOfThought module
    generate_answer = dspy.ChainOfThought("context, question -> answer")
    ```
  - Optimizers (formerly Teleprompters): Manual iterations of prompt engineering are automated with optimizers (teleprompters) and a DSPy compiler.

    ```python
    # Self-generate complete demonstrations. Teacher-student paradigm;
    # `BootstrapFewShotWithOptuna`, `BootstrapFewShotWithRandomSearch`, etc.
    # work on the same principle.
    optimizer = BootstrapFewShot(metric=dspy.evaluate.answer_exact_match)
    ```
  - DSPy Compiler: Internally traces your program and then optimizes it using an optimizer (teleprompter) to maximize a given metric (e.g., improve quality or cost) for your task.
    - e.g., the DSPy compiler optimizes the initial prompt and thus eliminates the need for manual prompt tuning.

    ```python
    cot_compiled = teleprompter.compile(CoT(), trainset=trainset, valset=devset)
    cot_compiled.save('turbo_gsm8k.json')
    ```
Expand
- Automatic Few-Shot Learning
  - As a rule of thumb, if you don't know where to start, use BootstrapFewShotWithRandomSearch.
  - If you have very little data, e.g., 10 examples of your task, use BootstrapFewShot.
  - If you have slightly more data, e.g., 50 examples of your task, use BootstrapFewShotWithRandomSearch.
  - If you have more data than that, e.g., 300 examples or more, use BayesianSignatureOptimizer.
- Automatic Instruction Optimization
  - COPRO: Repeat for a set number of iterations, tracking the best-performing instructions.
  - MIPRO: Repeat for a set number of iterations, tracking the best-performing combinations (instructions and examples).
- Automatic Finetuning
  - If you have been able to use one of these with a large LM (e.g., 7B parameters or above) and need a very efficient program, compile that down to a small LM with BootstrapFinetune.
- LangChain is a framework for developing applications powered by language models. (1) Be data-aware: connect a language model to other sources of data. (2) Be agentic: allow a language model to interact with its environment. doc:ref / blog:ref / git
  - It highlights two main value props of the framework:
    - Components: modular abstractions and implementations for working with language models, with easy-to-use features.
    - Use-Case Specific Chains: chains of components that assemble in different ways to achieve specific use cases, with customizable interfaces. cite: ref
- LangChain 0.2: full separation of langchain and langchain-community. ref [May 2024]
- Towards LangChain 0.1 ref [Dec 2023]
- Basic LangChain building blocks ref [2023]

  ```python
  '''
  LLMChain: the most common type of chain. It consists of a PromptTemplate,
  a model (either an LLM or a ChatModel), and an optional output parser.
  '''
  chain = prompt | model | parser
  ```
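A runnable sketch of the same prompt | model | parser pattern, assuming langchain-openai is installed and OPENAI_API_KEY is set; the model name is illustrative:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Explain {topic} in one sentence.")
model = ChatOpenAI(model="gpt-4o-mini")
parser = StrOutputParser()

chain = prompt | model | parser  # composed with the pipe operator
print(chain.invoke({"topic": "retrieval-augmented generation"}))
```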
- Feature Matrix: LangChain Features
- Awesome LangChain: Curated list of tools and projects using LangChain.
- Cheatsheet: LangChain CheatSheet
- LangChain Cheatsheet KD-nuggets: doc [Aug 2023]
- LangChain AI Handbook: published by Pinecone
- LangChain Tutorial: A Complete LangChain Guide
- RAG From Scratch [Feb 2024]
- DeepLearning.AI short course: LangChain for LLM Application Development ref / LangChain: Chat with Your Data ref
- LangChain/cache: Reducing the number of API calls
- LangChain/context-aware-splitting: Splits a file into chunks while keeping metadata
- LangChain Expression Language: A declarative way to easily compose chains together [Aug 2023]
- LangSmith Platform for debugging, testing, evaluating. [Jul 2023]
- LangChain Template: LangChain reference architectures and samples, e.g., RAG Conversation Template [Oct 2023]
- OpenGPTs: An open source effort to create a similar experience to OpenAI's GPTs [Nov 2023]
- LangGraph: Build and navigate language agents as graphs ref [Aug 2023]
- langflow: LangFlow is a UI for LangChain, designed with react-flow. [Feb 2023]
- Flowise Drag & drop UI to build your customized LLM flow [Apr 2023]
- Chains ref
- SimpleSequentialChain: A sequence of steps with single input and output. Output of one step is input for the next.
- SequentialChain: Like SimpleSequentialChain but handles multiple inputs and outputs at each step.
- MultiPromptChain: Routes inputs to specialized sub-chains based on content. Ideal for different prompts for different tasks.
- Summarizer (see the sketch below)
  - stuff: Sends everything at once to the LLM. If the input is too long, an error will occur.
  - map_reduce: Summarizes each chunk, then summarizes the combined chunk summaries.
  - refine: (Summary + next document) => new summary, repeated over the documents.
  - map_rerank: Ranks outputs by score and summarizes the important points.
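A minimal map_reduce sketch using the OpenAI SDK directly rather than LangChain's chain classes; character-based chunking and the model name are simplifying assumptions:

```python
from openai import OpenAI

client = OpenAI()

def summarize(text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize:\n\n{text}"}],
    )
    return resp.choices[0].message.content

def map_reduce_summary(document: str, chunk_size: int = 4000) -> str:
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    partial = [summarize(c) for c in chunks]  # map: summarize each chunk
    return summarize("\n".join(partial))      # reduce: summarize the summaries
```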
- If you're using a text LLM, first try `zero-shot-react-description`.
- If you're using a Chat Model, try `chat-zero-shot-react-description`.
- If you're using a Chat Model and want to use memory, try `conversational-react-description`.
- `self-ask-with-search`: Measuring and Narrowing the Compositionality Gap in Language Models [7 Oct 2022]
- `react-docstore`: ReAct: Synergizing Reasoning and Acting in Language Models [6 Oct 2022]
- Agent Type
```python
from enum import Enum

class AgentType(str, Enum):
    """Enumerator with the Agent types."""

    ZERO_SHOT_REACT_DESCRIPTION = "zero-shot-react-description"
    REACT_DOCSTORE = "react-docstore"
    SELF_ASK_WITH_SEARCH = "self-ask-with-search"
    CONVERSATIONAL_REACT_DESCRIPTION = "conversational-react-description"
    CHAT_ZERO_SHOT_REACT_DESCRIPTION = "chat-zero-shot-react-description"
    CHAT_CONVERSATIONAL_REACT_DESCRIPTION = "chat-conversational-react-description"
    STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION = (
        "structured-chat-zero-shot-react-description"
    )
    OPENAI_FUNCTIONS = "openai-functions"
    OPENAI_MULTI_FUNCTIONS = "openai-multi-functions"
```
- ReAct is inspired by the synergies between "acting" and "reasoning" which allow humans to learn new tasks and make decisions or reasoning. MRKL stands for Modular Reasoning, Knowledge and Language and is a neuro-symbolic architecture that combines large language models, external knowledge sources, and discrete reasoning. cite: ref [28 Apr 2023]
  - `zero-shot-react-description`: Uses ReAct to select tools based on their descriptions. Any number of tools can be used, each requiring a description.
  - `react-docstore`: Uses ReAct to manage a docstore with two required tools: Search and Lookup. These tools must be named exactly as specified. It follows the original ReAct paper's example from Wikipedia.
  - MRKL in LangChain uses `zero-shot-react-description`, implementing ReAct. The original ReAct framework is used in the `react-docstore` agent. MRKL was published on May 1, 2022, earlier than ReAct on October 6, 2022.
- `ConversationBufferMemory`: Stores the entire conversation history.
- `ConversationBufferWindowMemory`: Stores recent messages from the conversation history.
- `Entity Memory`: Stores and retrieves entity-related information.
- `Conversation Knowledge Graph Memory`: Stores entities and relationships between entities.
- `ConversationSummaryMemory`: Stores summarized information about the conversation.
- `ConversationSummaryBufferMemory`: Stores summarized information about the conversation with a token limit.
- `ConversationTokenBufferMemory`: Stores tokens from the conversation.
- `VectorStore-Backed Memory`: Leverages vector space models for storing and retrieving information.
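A short usage sketch of the buffer memory API from classic LangChain (`langchain.memory`); note that recent releases steer memory toward LangGraph instead:

```python
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.save_context({"input": "Hi, I'm Alice."}, {"output": "Hello Alice!"})
memory.save_context({"input": "What's my name?"}, {"output": "You are Alice."})

# The full history is returned as a single string under the "history" key.
print(memory.load_memory_variables({})["history"])
```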
- The Problem With LangChain: ref / git [14 Jul 2023]
- What’s your biggest complaint about langchain?: ref [May 2023]
- LangChain Is Pointless: ref [Jul 2023]
LangChain has been criticized for making simple things relatively complex, which creates unnecessary complexity and tribalism that hurts the up-and-coming AI ecosystem as a whole. The documentation is also criticized for being bad and unhelpful.
- How to Build Ridiculously Complex LLM Pipelines with LangGraph! [17 Sep 2024]
  LangChain does too much, and as a consequence, it does many things badly. Scaling beyond the basic use cases with LangChain is a challenge that is often better served by building from scratch using the underlying APIs.
- LangChain [Oct 2022] | LlamaIndex [Nov 2022] | Microsoft Semantic Kernel [Feb 2023] | Microsoft guidance [Nov 2022] | Azure ML Prompt flow [Jun 2023] | DSPy [Jan 2023]
- Prompting Framework (PF): Prompting Frameworks for Large Language Models: A Survey git
- What Are Tools Anyway?: 1. For a small number (e.g., 5–10) of tools, LMs can directly select from contexts. However, with a larger number (e.g., hundreds), an additional retrieval step involving a retriever model is often necessary. 2. LM-used tools incl. Tool creation and reuse. Tool is not useful when machine translation, summarization, and sentiment analysis (among others). 3. Evaluation metrics [18 Mar 2024]
- Basically, LlamaIndex is a smart storage mechanism, while LangChain is a tool to bring multiple tools together. cite [14 Apr 2023]
- LangChain offers many features and focuses on using chains and agents to connect with external APIs. In contrast, LlamaIndex is more specialized and excels at indexing data and retrieving documents.
| LangChain | Semantic Kernel |
|---|---|
| Memory | Memory |
| Toolkit | Plugin (formerly Skill) |
| Tool | LLM prompts (semantic functions) or native C# or Python code (native function) |
| Agent | Planner |
| Chain | Steps, Pipeline |
| Tool | Connector |
- What's the difference between LangChain and Semantic Kernel?
  LangChain has many agents, tools, plugins, etc. out of the box. Moreover, LangChain has 10x more popularity, so it has about 10x more developer activity to improve it. On the other hand, Semantic Kernel's architecture and quality are better, which is quite promising for Semantic Kernel. ref [11 May 2023]
- What's the difference between Azure Machine Learning Prompt flow and Semantic Kernel?
  - Low/no code vs C#, Python, Java
  - Focused on prompt orchestration vs integrating LLMs into existing apps
- Prompt flow is not intended to replace chat conversation flow. Instead, it's an optimized solution for integrating search and open-source language models. By default, it supports Python, LLM, and the Prompt tool as its fundamental building blocks.
- Using Prompt flow with Semantic Kernel: ref [07 Sep 2023]
| | Handlebars.js | Jinja2 | Prompt Template |
|---|---|---|---|
| Conditions | {{#if user}} Hello {{user}}! {{else}} Hello Stranger! {{/if}} | {% if user %} Hello {{ user }}! {% else %} Hello Stranger! {% endif %} | Branching features such as "if", "for", and code blocks are not part of SK's template language. |
| Loop | {{#each items}} Hello {{this}} {{/each}} | {% for item in items %} Hello {{ item }} {% endfor %} | By using a simple language, the kernel can also avoid complex parsing and external dependencies. |
| LangChain Library | guidance, LangChain.js | LangChain, Azure ML prompt flow | Semantic Kernel |
| URL | ref | ref | ref |
- Semantic Kernel supports HandleBars and Jinja2. [Mar 2024]
- Zero-shot
  - Large Language Models are Zero-Shot Reasoners: [cnt]: Let's think step by step. [24 May 2022]
- Few-shot Learning
  - OpenAI: Language Models are Few-Shot Learners: [cnt] [28 May 2020]
- Chain of Thought (CoT): Chain-of-Thought Prompting Elicits Reasoning in Large Language Models [cnt]: ReAct and Self-Consistency also inherit the CoT concept. [28 Jan 2022]
  - Family of CoT: Self-Consistency (CoT-SC) > Tree of Thought (ToT) > Graph of Thoughts (GoT) > Iteration of Thought (IoT) [19 Sep 2024], Diagram of Thought (DoT) [16 Sep 2024] / To CoT or not to CoT?: Meta-analysis of 100+ papers shows CoT significantly improves performance in math and logic tasks. [18 Sep 2024]
- Self-Consistency (CoT-SC): The three steps in the self-consistency method: 1) prompt the language model using CoT prompting, 2) sample a diverse set of reasoning paths from the language model, and 3) marginalize out the reasoning paths to aggregate final answers and choose the most consistent answer (see the sketch below). [21 Mar 2022]
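A minimal self-consistency sketch, assuming the OpenAI SDK: sample several CoT completions at a high temperature and majority-vote on the parsed final answers (the model name and answer format are illustrative):

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()

def self_consistency(question: str, n: int = 5) -> str:
    answers = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            temperature=1.0,  # encourages diverse reasoning paths
            messages=[{"role": "user", "content":
                       f"{question}\nThink step by step, then give the final "
                       f"answer on the last line as 'Answer: <value>'."}],
        )
        text = resp.choices[0].message.content
        answers.append(text.rsplit("Answer:", 1)[-1].strip())
    # Marginalize out the reasoning paths: keep the most frequent final answer.
    return Counter(answers).most_common(1)[0][0]
```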
-
Recursively Criticizes and Improves (RCI): [cnt] [30 Mar 2023]
- Critique: Review your previous answer and find problems with your answer.
- Improve: Based on the problems you found, improve your answer.
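A sketch of the Critique/Improve loop above, assuming the OpenAI Python SDK (model name and prompt wording are illustrative):

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    r = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}])
    return r.choices[0].message.content

answer = ask("Q: ...your task here...")
critique = ask(f"Review your previous answer and find problems with it.\n\nAnswer: {answer}")
improved = ask("Based on the problems you found, improve your answer.\n\n"
               f"Answer: {answer}\nProblems: {critique}")
```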
-
ReAct: [cnt]: Grounding with external sources. (Reasoning and Acting): Combines reasoning and acting ref [6 Oct 2022]
-
Tree of Thought (ToT): [cnt]: Self-evaluate the progress intermediate thoughts make towards solving a problem [17 May 2023] git / Agora: Tree of Thoughts (ToT) git
-
tree-of-thought\forest_of_thought.py
: Forest of thought Decorator sample -
tree-of-thought\tree_of_thought.py
: Tree of thought Decorator sample -
tree-of-thought\react-prompt.py
: ReAct sample without LangChain
-
-
Graph of Thoughts (GoT): [cnt] Solving Elaborate Problems with Large Language Models git [18 Aug 2023]
-
Retrieval Augmented Generation (RAG): [cnt]: To address such knowledge-intensive tasks. RAG combines an information retrieval component with a text generator model. [22 May 2020]
-
Zero-shot, one-shot and few-shot cite [28 May 2020]
-
Prompt Engineering overview cite [10 Jul 2023]
-
Prompt Concept
- Question-Answering
- Role-play:
Act as a [ROLE] perform [TASK] in [FORMAT]
- Reasoning
- Prompt-Chain
-
-
Chain-of-Verification reduces Hallucination in LLMs: [cnt]: A four-step process that consists of generating a baseline response, planning verification questions, executing verification questions, and generating a final verified response based on the verification results. [20 Sep 2023]
-
Plan-and-Solve Prompting: Develop a plan, and then execute each step in that plan. [6 May 2023]
-
Reflexion: [cnt]: Language Agents with Verbal Reinforcement Learning. 1. Reflexion uses
verbal reinforcement
to help agents learn from prior failings. 2. Reflexion converts binary or scalar feedback from the environment into verbal feedback in the form of a textual summary, which is then added as additional context for the LLM agent in the next episode. 3. It is lightweight and doesn’t require finetuning the LLM. [20 Mar 2023] / git -
Large Language Models as Optimizers: [cnt]: Optimization by PROmpting (OPRO). Using the meta-prompt
Take a deep breath and work on this problem step-by-step.
improved accuracy. [7 Sep 2023] -
Promptist
-
Promptist: Microsoft's researchers trained an additional language model (LM) that optimizes text prompts for text-to-image generation.
- For example, instead of simply passing "Cats dancing in a space club" as a prompt, an engineered prompt might be "Cats dancing in a space club, digital painting, artstation, concept art, soft light, hdri, smooth, sharp focus, illustration, fantasy."
-
Power of Prompting
- GPT-4 with Medprompt: GPT-4, using a method called Medprompt that combines several prompting strategies, has surpassed MedPaLM 2 on the MedQA dataset without the need for fine-tuning. ref [28 Nov 2023]
- promptbase: Scripts demonstrating the Medprompt methodology [Dec 2023]
-
Adversarial Prompting
- Prompt Injection:
Ignore the above directions and ...
- Prompt Leaking:
Ignore the above instructions ... followed by a copy of the full prompt with exemplars:
- Jailbreaking: Bypassing a safety policy by eliciting unethical instructions when the request is contextualized in a clever way. ref
- Prompt Injection:
-
Prompt Principle for Instructions: 26 prompt principles: e.g.,
1) No need to be polite with LLM so there .. 16) Assign a role.. 17) Use Delimiters..
[26 Dec 2023] -
ChatGPT: “user”, “assistant”, and “system” messages.
To be specific, the ChatGPT API allows for differentiation between “user”, “assistant”, and “system” messages.
- The model always obeys "system" messages.
- All end-user input goes in the “user” messages.
- "assistant" messages hold previous chat responses from the assistant.
Presumably, the model is trained to treat the user messages as human messages, system messages as some system level configuration, and assistant messages as previous chat responses from the assistant. ref [2 Mar 2023]
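A minimal request showing the three roles, assuming the OpenAI Python SDK (the model name and message contents are illustrative):

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},   # system-level configuration
        {"role": "assistant", "content": "Hi! How can I help?"},         # a previous assistant turn
        {"role": "user", "content": "Summarize RAG in one sentence."},   # end-user input
    ],
)
print(resp.choices[0].message.content)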
-
Many-Shot In-Context Learning: Transitioning from few-shot to many-shot In-Context Learning (ICL) can lead to significant performance gains across a wide variety of generative and discriminative tasks [17 Apr 2024]
-
Skeleton Of Thought: Skeleton-of-Thought (SoT) reduces generation latency by first creating an answer's skeleton, then filling each skeleton point in parallel via API calls or batched decoding. [28 Jul 2023]
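A rough sketch of the SoT idea, assuming the OpenAI Python SDK; the prompts, model name, and thread-based parallelism are illustrative choices, not the paper's implementation:

```python
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    r = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}])
    return r.choices[0].message.content

question = "How can I reduce the latency of an LLM app?"
# 1) Generate a short skeleton of the answer
skeleton = ask(f"Give a short skeleton (3-5 numbered points, a few words each) answering: {question}")
points = [ln for ln in skeleton.splitlines() if ln.strip()]

# 2) Fill each skeleton point in parallel; parallel API calls cut end-to-end latency
with ThreadPoolExecutor() as pool:
    expanded = list(pool.map(lambda p: ask(f"Expand this point in 1-2 sentences: {p}"), points))
answer = "\n".join(expanded)
```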
-
NLEP (Natural Language Embedded Programs) for Hybrid Language Symbolic Reasoning: Use code as a scaffold for reasoning. NLEP achieves over 90% accuracy when prompting GPT-4. [19 Sep 2023]
-
A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications: a summary detailing the prompting methodology, its applications.🏆Taxonomy of prompt engineering techniques in LLMs. [5 Feb 2024]
-
Is the new norm for NLP papers "prompt engineering" papers?: "how can we make LLM 1 do this without training?" Is this the new norm? The CL section of arXiv is overwhelmed with papers like "how come LLaMA can't understand numbers?" [2 Aug 2024]
-
Re-Reading Improves Reasoning in Large Language Models: RE2 (Re-Reading), which involves re-reading the question as input to enhance the LLM's understanding of the problem.
Read the question again
[12 Sep 2023]
-
Expand
-
FireAct: [cnt]: Toward Language Agent Fine-tuning. 1. This work takes an initial step to show multiple advantages of fine-tuning LMs for agentic uses. 2. During fine-tuning, successful trajectories are converted into the ReAct format to fine-tune a smaller LM. 3. This work is an initial step toward language agent fine-tuning, and is constrained to a single type of task (QA) and a single tool (Google search). / git [9 Oct 2023]
-
RankPrompt: Self-ranking method. Direct Scoring independently assigns scores to each candidate, whereas RankPrompt ranks candidates through a systematic, step-by-step comparative evaluation. [19 Mar 2024]
-
Language Models as Compilers: With extensive experiments on seven algorithmic reasoning tasks, Think-and-Execute is effective. It enhances large language models’ reasoning by using task-level logic and pseudocode, outperforming instance-specific methods. [20 Mar 2023]
-
-
Automatic Prompt Engineer (APE): Automatically optimizing prompts. APE has discovered zero-shot Chain-of-Thought (CoT) prompts superior to human-designed prompts like “Let’s think through this step-by-step” (Kojima et al., 2022). The prompt “To get the correct answer, let’s think step-by-step.” triggers a chain of thought. Two approaches to generate high-quality candidates: forward mode and reverse mode generation. [3 Nov 2022] git / ref [Mar 2024]
-
Claude Prompt Engineer: Simply input a description of your task and some test cases, and the system will generate, test, and rank a multitude of prompts to find the ones that perform the best. [4 Jul 2023] / Anthropic Helper metaprompt ref / Claude Sonnet 3.5 for Coding
-
Cohere’s new Prompt Tuner: Automatically improve your prompts [31 Jul 2024]
- Prompt Engineering: Prompt Engineering, also known as In-Context Prompting ... [Mar 2023]
- Prompt Engineering Guide: 🏆Copyright © 2023 DAIR.AI
- Azure OpenAI Prompt engineering techniques
- OpenAI Prompt example
- OpenAI Best practices for prompt engineering
- Awesome ChatGPT Prompts [Dec 2022]
- Awesome Prompt Engineering [Feb 2023]
- Awesome-GPTs-Prompts [Jan 2024]
- Prompts for Education: Microsoft Prompts for Education [Jul 2023]
- DeepLearning.ai ChatGPT Prompt Engineering for Developers
- Leaked prompts of GPTs [Nov 2023] and Agents [Nov 2023]
- LLM Prompt Engineering Simplified [Feb 2024]
- Power Platform GPT Prompts [Mar 2024]
- Fabric: A modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere [Jan 2024]
- Anthropic Prompt Library: Anthropic released a Claude 3 AI prompt library [Mar 2024]
- Copilot prompts: Examples of prompts for Microsoft Copilot. [25 Apr 2024]
- In-The-Wild Jailbreak Prompts on LLMs: A dataset consists of 15,140 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak prompts). Collected from December 2022 to December 2023 [Aug 2023]
- LangChainHub: a collection of all artifacts useful for working with LangChain primitives such as prompts, chains and agents. [Jan 2023]
- Anthropic courses > Prompt engineering interactive tutorial: a comprehensive step-by-step guide to key prompting techniques / prompt evaluations [Aug 2024]
LLM Pre-training and Post-training Paradigms X-ref
PEFT: Parameter-Efficient Fine-Tuning (Youtube) [24 Apr 2023]
-
PEFT: Parameter-Efficient Fine-Tuning. PEFT is an approach to fine-tuning only a small number of parameters. [10 Feb 2023]
-
Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning: [cnt] [28 Mar 2023]
-
Category: Represent approach - Description - Pseudo Code ref [22 Sep 2023]
-
Adapters: Adapters - Additional Layers. Inference can be slower.

```python
def transformer_with_adapter(x):
    residual = x
    x = SelfAttention(x)
    x = FFN(x)   # adapter
    x = LN(x + residual)
    residual = x
    x = FFN(x)   # transformer FFN
    x = FFN(x)   # adapter
    x = LN(x + residual)
    return x
```
-
Soft Prompts: Prompt-Tuning - Learnable text prompts. Not always desired results.

```python
def soft_prompted_model(input_ids):
    x = Embed(input_ids)
    soft_prompt_embedding = SoftPromptEmbed(task_based_soft_prompt)
    x = concat([soft_prompt_embedding, x], dim=seq)
    return model(x)
```
-
Selective: BitFit - Update only the bias parameters. Fast but limited.

```python
params = (p for n, p in model.named_parameters() if "bias" in n)
optimizer = Optimizer(params)
```
-
Reparametrization: LoRA - Low-rank decomposition. Efficient, but complex to implement.

```python
def lora_linear(x):
    h = x @ W            # regular linear
    h += x @ W_A @ W_B   # low-rank update
    return scale * h
```
-
-
LoRA: Low-Rank Adaptation of Large Language Models: [cnt]: LoRA is one of PEFT technique. To represent the weight updates with two smaller matrices (called update matrices) through low-rank decomposition. git [17 Jun 2021]
Expand: LoRA Family
- LoRA+: Improves LoRA’s performance and fine-tuning speed by setting different learning rates for the LoRA adapter matrices. [19 Feb 2024]
- LoTR: Tensor decomposition for gradient update. [2 Feb 2024]
- The Expressive Power of Low-Rank Adaptation: Theoretically analyzes the expressive power of LoRA. [26 Oct 2023]
- DoRA: Weight-Decomposed Low-Rank Adaptation. Decomposes pre-trained weight into two components, magnitude and direction, for fine-tuning. [14 Feb 2024]
- LoRA Family ref [11 Mar 2024]
-
LoRA
introduces low-rank matrices A and B that are trained, while the pre-trained weight matrix W is frozen. -
LoRA+
suggests having a much higher learning rate for B than for A. -
VeRA
does not train A and B, but initializes them randomly and trains new vectors d and b on top. -
LoRA-FA
only trains matrix B. -
LoRA-drop
uses the output of B*A to determine which layers are worth training at all. -
AdaLoRA
adapts the ranks of A and B in different layers dynamically, allowing for a higher rank in these layers, where more contribution to the model’s performance is expected. -
DoRA
splits the LoRA adapter into two components, magnitude and direction, and allows them to be trained more independently. -
Delta-LoRA
changes the weights of W by the gradient of A*B.
-
- 5 Techniques of LoRA ref: LoRA, LoRA-FA, VeRA, Delta-LoRA, LoRA+ [May 2024]
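A minimal PyTorch sketch of the core idea shared by these variants: the pre-trained weight W is frozen and only a low-rank update B·A is trained (rank, alpha, and initialization below are illustrative defaults):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch of a LoRA layer: y = x W^T + scale * x A^T B^T, with W frozen."""
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False                 # freeze pre-trained W
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))  # B=0: no-op at init
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)
```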
-
LoRA learns less and forgets less: Compared to full training, LoRA has less learning but better retention of original knowledge. [15 May 2024]
-
Practical Tips for Finetuning LLMs Using LoRA (Low-Rank Adaptation) [19 Nov 2023]: a best-practices guide to LoRA.
- QLoRA saves 33% memory but increases runtime by 39%, useful if GPU memory is a constraint.
- Optimizer choice for LLM finetuning isn’t crucial. Adam optimizer’s memory-intensity doesn’t significantly impact LLM’s peak memory.
- Apply LoRA across all layers for maximum performance.
- Adjusting the LoRA rank is essential.
- Multi-epoch training on static datasets may lead to overfitting and deteriorate results.
-
QLoRA: Efficient Finetuning of Quantized LLMs: [cnt]: 4-bit quantized pre-trained language model into Low Rank Adapters (LoRA). git [23 May 2023]
-
Training language models to follow instructions with human feedback: [cnt] [4 Mar 2022]
-
Fine-tuning a GPT - LoRA: Comprehensive guide for LoRA doc [20 Jun 2023]
-
LIMA: Less Is More for Alignment: [cnt]: fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. LIMA demonstrates remarkably strong performance, either equivalent or strictly preferred to GPT-4 in 43% of cases. [18 May 2023]
-
Efficient Streaming Language Models with Attention Sinks: [cnt] 1. StreamingLLM, an efficient framework that enables LLMs trained with a finite length attention window to generalize to infinite sequence length without any fine-tuning. 2. We neither expand the LLMs' context window nor enhance their long-term memory. git [29 Sep 2023]
Expand: StreamingLLM
- Key-Value (KV) cache is an important component in the StreamingLLM framework.
- Window Attention: Only the most recent Key and Value states (KVs) are cached. This approach fails when the text length surpasses the cache size.
- Sliding Window Attention w/ Re-computation: Rebuilds the Key-Value (KV) states from the recent tokens for each new token. Evicts the oldest part of the cache.
- StreamingLLM: One of the techniques used is to add a placeholder token (yellow-colored) as a dedicated attention sink during pre-training. This attention sink attracts the model’s attention and helps it generalize to longer sequences. Outperforms the sliding window with re-computation baseline by up to a remarkable 22.2× speedup.
-
How to continue pretraining an LLM on new data:
Continued pretraining
can be as effective as retraining on combined datasets. [13 Mar 2024]
Expand: Continued pretraining
Three training methods were compared:
- Regular pretraining: A model is initialized with random weights and pretrained on dataset D1.
- Continued pretraining: The pretrained model from 1) is further pretrained on dataset D2.
- Retraining on combined dataset: A model is initialized with random weights and trained on the combined datasets D1 and D2.
Continued pretraining can be as effective as retraining on combined datasets. Key strategies for successful continued pretraining include:
- Re-warming: Increasing the learning rate at the start of continued pre-training.
- Re-decaying: Gradually reducing the learning rate afterwards.
- Data Mixing: Adding a small portion (e.g., 5%) of the original pretraining data (D1) to the new dataset (D2) to prevent catastrophic forgetting.
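A hedged sketch of re-warming and re-decaying as a PyTorch learning-rate schedule; the warmup/total step counts and the stand-in model are illustrative, not from the article:

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(16, 16)  # stand-in for the pretrained LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def rewarm_then_decay(step, warmup=1_000, total=100_000):
    if step < warmup:                     # re-warming: ramp the LR back up
        return step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return 0.5 * (1.0 + math.cos(math.pi * progress))  # re-decaying: cosine decay

scheduler = LambdaLR(optimizer, rewarm_then_decay)

# Data mixing: replay a small share (e.g., ~5%) of the original pretraining data (D1)
# alongside the new dataset (D2) to mitigate catastrophic forgetting.
```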
-
Expand: LongLoRA
- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models: [cnt]: A combination of sparse local attention and LoRA git [21 Sep 2023]
- Key Takeaways from LongLora
- The document states that LoRA alone is not sufficient for long context extension.
- Although dense global attention is needed during inference, fine-tuning the model can be done by sparse local attention, shift short attention (S2-Attn).
- S2-Attn can be implemented with only two lines of code in training.
-
A key difference between Llama 1: [cnt] [27 Feb 2023] and Llama 2: [cnt] [18 Jul 2023] is the architectural change of attention layer, in which Llama 2 takes advantage of Grouped Query Attention (GQA) mechanism to improve efficiency. x-ref
-
Multi-query attention (MQA): [cnt] [22 May 2023]
-
Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm Youtube / git [03 Sep 2023]
Expand: KV Cache, Grouped Query Attention, Rotary PE
Rotary PE
```python
def apply_rotary_embeddings(x: torch.Tensor, freqs_complex: torch.Tensor, device: str):
    # Separate the last dimension into pairs of values, representing the real and
    # imaginary parts of a complex number; two consecutive values become one complex number
    # (B, Seq_Len, H, Head_Dim) -> (B, Seq_Len, H, Head_Dim/2)
    x_complex = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    # Reshape freqs_complex to match x_complex by adding the batch and head dimensions
    # (Seq_Len, Head_Dim/2) --> (1, Seq_Len, 1, Head_Dim/2)
    freqs_complex = freqs_complex.unsqueeze(0).unsqueeze(2)
    # Multiply each complex number in x_complex by the corresponding complex number in
    # freqs_complex, which rotates the complex number as shown in Figure 1 of the paper
    # (B, Seq_Len, H, Head_Dim/2) * (1, Seq_Len, 1, Head_Dim/2) = (B, Seq_Len, H, Head_Dim/2)
    x_rotated = x_complex * freqs_complex
    # Convert the complex numbers back to real numbers
    # (B, Seq_Len, H, Head_Dim/2) -> (B, Seq_Len, H, Head_Dim/2, 2)
    x_out = torch.view_as_real(x_rotated)
    # (B, Seq_Len, H, Head_Dim/2, 2) -> (B, Seq_Len, H, Head_Dim)
    x_out = x_out.reshape(*x.shape)
    return x_out.type_as(x).to(device)
```
KV Cache, Grouped Query Attention
```python
# Replace the entry in the cache
self.cache_k[:batch_size, start_pos : start_pos + seq_len] = xk
self.cache_v[:batch_size, start_pos : start_pos + seq_len] = xv
# (B, Seq_Len_KV, H_KV, Head_Dim)
keys = self.cache_k[:batch_size, : start_pos + seq_len]
# (B, Seq_Len_KV, H_KV, Head_Dim)
values = self.cache_v[:batch_size, : start_pos + seq_len]
# Since every group of Q shares the same K and V heads,
# just repeat the K and V heads for every Q in the same group.
# (B, Seq_Len_KV, H_KV, Head_Dim) --> (B, Seq_Len_KV, H_Q, Head_Dim)
keys = repeat_kv(keys, self.n_rep)
# (B, Seq_Len_KV, H_KV, Head_Dim) --> (B, Seq_Len_KV, H_Q, Head_Dim)
values = repeat_kv(values, self.n_rep)
```
-
Comprehensive Guide for LLaMA with RLHF: StackLLaMA: A hands-on guide to train LLaMA with RLHF [5 Apr 2023]
-
Official LLama Recipes incl. Finetuning: git
-
Llama 2 ONNX git [Jul 2023]
- ONNX, or Open Neural Network Exchange, is an open standard for machine learning interoperability. It allows AI developers to use models across various frameworks, tools, runtimes, and compilers.
- A machine learning technique that trains a "reward model" directly from human feedback and uses the model as a reward function to optimize an agent's policy using reinforcement learning.
-
InstructGPT: Training language models to follow instructions with human feedback: [cnt] is a model trained by OpenAI to follow instructions using human feedback. [4 Mar 2022]
cite - Libraries: TRL, trlX, Argilla
TRL: from the Supervised Fine-tuning step (SFT), Reward Modeling step (RM) to the Proximal Policy Optimization (PPO) step
The three steps in the process: 1. pre-training on large web-scale data, 2. supervised fine-tuning on instruction data (instruction tuning), and 3. RLHF. ref [ⓒ 2023] -
Supervised Fine-Tuning (SFT)
fine-tuning a pre-trained model on a specific task or domain using labeled data. This can cause more significant shifts in the model’s behavior compared to RLHF.
-
Reinforcement Learning from Human Feedback (RLHF) is a process of pretraining and retraining a language model using human feedback to develop a scoring algorithm that can be reapplied at scale for future training and refinement. As the algorithm is refined to match the human-provided grading, direct human feedback is no longer needed, and the language model continues learning and improving using algorithmic grading alone. [18 Sep 2019] ref [9 Dec 2022]
-
Proximal Policy Optimization (PPO)
is a reinforcement learning method using first-order optimization. It modifies the objective function to penalize large policy changes, specifically those that move the probability ratio away from 1, aiming for TRPO (Trust Region Policy Optimization)-level performance without TRPO's complexity, which requires second-order optimization.
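A sketch of PPO's clipped surrogate objective described above; the inputs are assumed to be per-token log-probabilities and advantage estimates, and `eps=0.2` is the commonly cited default (not tied to a specific library):

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    ratio = torch.exp(logp_new - logp_old)          # probability ratio r(theta)
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)  # penalize ratios far from 1
    # Pessimistic bound: minimum of the unclipped and clipped objectives
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```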
-
-
Direct Preference Optimization (DPO): [cnt]: 1. RLHF can be complex because it requires fitting a reward model and performing significant hyperparameter tuning. On the other hand, DPO directly solves a classification problem on human preference data in just one stage of policy training. DPO is more stable, efficient, and computationally lighter than RLHF. 2.
Your Language Model Is Secretly a Reward Model
[29 May 2023]
- Direct Preference Optimization (DPO) uses two models: a trained model (or policy model) and a reference model (a copy of the trained model). The goal is to have the trained model output higher probabilities for preferred answers and lower probabilities for rejected answers compared to the reference model. ref: RLHF vs DPO [Jan 2, 2024] / ref [1 Jul 2023]
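A sketch of the DPO loss under that setup, assuming you already have per-sequence log-probabilities of the chosen and rejected answers under the policy and the frozen reference model (`beta=0.1` is an illustrative default):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-ratios of policy vs. reference for preferred and rejected answers
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between the two ratios (a binary classification loss)
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```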
-
ORPO (odds ratio preference optimization): Monolithic Preference Optimization without Reference Model. New method that
combines supervised fine-tuning and preference alignment into one process
git [12 Mar 2024] Fine-tune Llama 3 with ORPO [Apr 2024]
- Reinforcement Learning from AI Feedback (RLAIF): [cnt]: Uses AI feedback to generate instructions for the model. TLDR: CoT (Chain-of-Thought, Improved), Few-shot (Not improved). Only explores the task of summarization. After training on a few thousand examples, performance is close to training on the full dataset. RLAIF vs RLHF: In many cases, the two policies produced similar summaries. [1 Sep 2023]
- OpenAI Spinning Up in Deep RL!: An educational resource to help anyone learn deep reinforcement learning. git [Nov 2018]
- Preference optimization techniques: ref [13 Aug 2024]
-
DPO (Direct preference optimization)
removes the need for a reward model. -
IPO (Identity Preference Optimization)
: A change in the objective, which is simpler and less prone to overfitting. -
KTO (Kahneman-Tversky Optimization)
: Scales more data by replacing the pairs of accepted and rejected generations with a binary label. -
ORPO (Odds Ratio Preference Optimization)
: Combines instruction tuning and preference optimization into one training process, which is cheaper and faster.
-
- A Survey on Model Compression for Large Language Models ref [15 Aug 2023]
-
Quantization-aware training (QAT): The model is further trained with quantization in mind after being initially trained in floating-point precision.
-
Post-training quantization (PTQ): The model is quantized after it has been trained without further optimization during the quantization process.
| Method | Pros | Cons |
|---|---|---|
| Post-training quantization | Easy to use, no need to retrain the model | May result in accuracy loss |
| Quantization-aware training | Can achieve higher accuracy than post-training quantization | Requires retraining the model, can be more complex to implement |
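A minimal PTQ sketch, assuming symmetric per-tensor int8 quantization (the names are illustrative, not from a specific library):

```python
import torch

def quantize_int8(w: torch.Tensor):
    # Map float weights onto the int8 grid with one scale per tensor
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.float() * scale

w = torch.randn(4, 4)
q, s = quantize_int8(w)
print((w - dequantize(q, s)).abs().max())  # worst-case quantization error
```
-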
bitsandbytes: 8-bit optimizers git [Oct 2021]
-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits. BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. [27 Feb 2024]
-
Pruning: The process of removing some of the neurons or layers from a neural network. This can be done by identifying and eliminating neurons or layers that have little or no impact on the network's output.
-
Sparsification: A technique used to reduce the size of large language models by removing redundant parameters.
-
Wanda Pruning: [cnt]: A Simple and Effective Pruning Approach for Large Language Models [20 Jun 2023] ref
-
phi-series: cost-effective small language models (SLMs) ref
Expand
-
Phi-3.5-MoE-instruct: ref [Aug 2024]
-
phi-3: Phi-3-mini, with 3.8 billion parameters, supports 4K and 128K context, instruction tuning, and hardware optimization. [Apr 2024] ref
- phi-3-vision (multimodal), phi-3-small, phi-3 (7b), phi-sillica (Copilot+PC designed for NPUs)
-
phi-2: open source, and 50% better at mathematical reasoning. git [Dec 2023]
-
phi-1.5: [cnt]: Textbooks Are All You Need II. Phi 1.5 is trained solely on synthetic data. Despite having a mere 1 billion parameters compared to Llama 7B's much larger model size, Phi 1.5 often performs better in benchmark tests. [11 Sep 2023]
-
phi-1: [cnt]: Despite being small in size, phi-1 attained 50.6% on HumanEval and 55.5% on MBPP. Textbooks Are All You Need. ref [20 Jun 2023]
-
-
Orca 2: [cnt]: Orca learns from rich signals from GPT 4 including explanation traces; step-by-step thought processes; and other complex instructions, guided by teacher assistance from ChatGPT. ref [18 Nov 2023]
-
Distilled Supervised Fine-Tuning (dSFT)
- Zephyr 7B: [cnt] Zephyr-7B-β is the second model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1 that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO). ref [25 Oct 2023]
- Mistral 7B: [cnt]: Outperforms Llama 2 13B on all benchmarks. Uses Grouped-query attention (GQA) for faster inference. Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost. ref [10 Oct 2023]
-
Transformer cache key-value tensors of context tokens into GPU memory to facilitate fast generation of the next token. However, these caches occupy significant GPU memory. The unpredictable nature of cache size, due to the variability in the length of each request, exacerbates the issue, resulting in significant memory fragmentation in the absence of a suitable memory management mechanism.
-
To alleviate this issue, PagedAttention was proposed to store the KV cache in non-contiguous memory spaces. It partitions the KV cache of each sequence into multiple blocks, with each block containing the keys and values for a fixed number of tokens.
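A conceptual sketch of that block-table bookkeeping (this is not vLLM's actual API; the block size, pool, and names are illustrative):

```python
# Map each sequence's logical KV-cache positions to non-contiguous physical blocks.
BLOCK_SIZE = 16                          # tokens per block
free_blocks = list(range(1024))          # pool of physical blocks in GPU memory
block_table: dict[int, list[int]] = {}   # seq_id -> physical block ids

def kv_slot(seq_id: int, token_index: int):
    if token_index % BLOCK_SIZE == 0:    # current block is full: allocate a new one
        block_table.setdefault(seq_id, []).append(free_blocks.pop())
    block = block_table[seq_id][token_index // BLOCK_SIZE]
    offset = token_index % BLOCK_SIZE
    return block, offset                 # where this token's K/V get written
```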
-
PagedAttention : vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention, 24x Faster LLM Inference doc. ref [12 Sep 2023]
- PagedAttention for a prompt “the cat is sleeping in the kitchen and the dog is”. Key-Value pairs of tensors for attention computation are stored in virtual contiguous blocks mapped to non-contiguous blocks in the GPU memory.
-
TokenAttention an attention mechanism that manages key and value caching at the token level. git [Jul 2023]
-
Flash Attention: [cnt] [27 May 2022] / FlashAttention-2: [cnt] [17 Jul 2023]: A method that reorders the attention computation and leverages classical techniques (tiling, recomputation). Instead of storing each intermediate result, it uses kernel fusion and runs every operation in a single kernel to avoid memory read/write overhead. git -> Compared to a standard attention implementation in PyTorch, FlashAttention-2 can be up to 9x faster / FlashAttention-3 [11 Jul 2024]
-
CPU vs GPU vs TPU: The threads are grouped into thread blocks. Each of the thread blocks has access to a fast shared memory (SRAM). All the thread blocks can also share a large global memory. (high-bandwidth memories (HBM).
HBM Bandwidth: 1.5-2.0TB/s vs SRAM Bandwidth: 19TB/s ~ 10x HBM
[27 May 2024]
- LLM patterns: 🏆From data to user, from defensive to offensive doc
- What We’ve Learned From A Year of Building with LLMs: A practical guide to building successful LLM products, covering the tactical, operational, and strategic. [8 June 2024]
- Large Transformer Model Inference Optimization: Besides the increasing size of SoTA models, there are two main factors contributing to the inference challenge ... [10 Jan 2023]
- Mixture of experts models: Mixtral 8x7B: Sparse mixture of experts models (SMoE) magnet [Dec 2023]
- Huggingface Mixture of Experts Explained: Mixture of Experts, or MoEs for short [Dec 2023]
- Simplifying Transformer Blocks: Simplified Transformer. Removed several block components, including skip connections, projection/value matrices, sequential sub-blocks and normalisation layers, without loss of training speed. [3 Nov 2023]
-
Model merging: A technique that combines two or more large language models (LLMs) into a single model, using methods such as SLERP, TIES, DARE, and passthrough. [Jan 2024] git: mergekit
| Method | Pros | Cons |
|---|---|---|
| SLERP | Preserves geometric properties, popular method | Can only merge two models, may decrease magnitude |
| TIES | Can merge multiple models, eliminates redundant parameters | Requires a base model, may discard useful parameters |
| DARE | Reduces overfitting, keeps expectations unchanged | May introduce noise, may not work well with large differences |
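A sketch of SLERP applied to a pair of weight tensors, one of the merge methods above (the `eps` guard and the linear-interpolation fallback are illustrative details):

```python
import torch

def slerp(w1: torch.Tensor, w2: torch.Tensor, t: float = 0.5, eps: float = 1e-8):
    """Spherical linear interpolation between two flattened weight tensors."""
    v1, v2 = w1.flatten().float(), w2.flatten().float()
    omega = torch.acos(torch.clamp(
        torch.dot(v1, v2) / (v1.norm() * v2.norm() + eps), -1.0, 1.0))
    so = torch.sin(omega)
    if so.abs() < eps:            # nearly parallel vectors: fall back to lerp
        merged = (1 - t) * v1 + t * v2
    else:                         # interpolate along the great circle
        merged = (torch.sin((1 - t) * omega) / so) * v1 + (torch.sin(t * omega) / so) * v2
    return merged.reshape(w1.shape)
```
-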
Mamba: Linear-Time Sequence Modeling with Selective State Spaces [1 Dec 2023] git: 1. Structured State Space (S4) - Class of sequence models, encompassing traits from RNNs, CNNs, and classical state space models. 2. Hardware-aware (Optimized for GPU) 3. Integrating selective SSMs and eliminating attention and MLP blocks ref / A Visual Guide to Mamba and State Space Models ref [19 Feb 2024]
- Mamba-2: 2-8X faster [31 May 2024]
- Sakana.ai: Evolutionary Optimization of Model Merging Recipes.: A Method to Combine 500,000 OSS Models. git [19 Mar 2024]
- Mixture-of-Depths: All tokens should not require the same effort to compute. The idea is to make token passage through a block optional. Each block selects the top-k tokens for processing, and the rest skip it. ref [2 Apr 2024]
- Kolmogorov-Arnold Networks (KANs): KANs use activation functions on connections instead of nodes like Multi-Layer Perceptrons (MLPs) do. Each weight in KANs is replaced by a learnable 1D spline function. KANs’ nodes simply sum incoming signals without applying any non-linearities. git [30 Apr 2024] / ref: A Beginner-friendly Introduction to Kolmogorov Arnold Networks (KAN) [19 May 2024]
- Better & Faster Large Language Models via Multi-token Prediction: Suggest that training language models to predict multiple future tokens at once [30 Apr 2024]
- Lamini Memory Tuning: Mixture of Millions of Memory Experts (MoME). 95% LLM Accuracy, 10x Fewer Hallucinations. ref [Jun 2024]
- Scaling Synthetic Data Creation with 1,000,000,000 Personas A persona-driven data synthesis methodology using Text-to-Persona and Persona-to-Persona. [28 Jun 2024]
- RouteLLM: a framework for serving and evaluating LLM routers. [Jun 2024]
- KAN or MLP: A Fairer Comparison: In machine learning, computer vision, audio processing, natural language processing, and symbolic formula representation (except for symbolic formula representation tasks), MLP generally outperforms KAN. [23 Jul 2024]
- What is Visual prompting: Similarly to what has happened in NLP, large pre-trained vision transformers have made it possible for us to implement Visual Prompting. doc [26 Apr 2023]
- Visual Prompting [21 Nov 2022]
- Andrew Ng’s Visual Prompting Livestream [24 Apr 2023]
- What is Visual Grounding: Visual Grounding (VG) aims to locate the most relevant object or region in an image, based on a natural language query.
- Screen AI: ScreenAI, a model designed for understanding and interacting with user interfaces (UIs) and infographics. Refer to Generated Annotation image. [Mar 2024]
- Humanloop Interview 2023 : doc [29 May 2023]
- OpenAI’s CEO Says the Age of Giant AI Models Is Already Over ref [17 Apr 2023]
- Q* (pronounced as Q-Star): The model, called Q* was able to solve basic maths problems it had not seen before, according to the tech news site the Information. ref [23 Nov 2023]
- Sam Altman reveals in an interview with Bill Gates what's coming up in GPT-4.5 (or GPT-5): Potential integration with other modes of information beyond text, better logic and analysis capabilities, and consistency in performance over the next two years. ref [12 Jan 2024]
- Model Spec: Desired behavior for the models in the OpenAI API and ChatGPT ref [8 May 2024] ref: takeaway
- A new series of reasoning models: The complex reasoning-specialized model, OpenAI o1 series, excels in math, coding, and science, outperforming GPT-4o on key benchmarks. [12 Sep 2024] / ref: Awesome LLM Strawberry (OpenAI o1)
- GPT-4V(ision) system card: ref [25 Sep 2023] / ref
- The Dawn of LMMs: [cnt]: Preliminary Explorations with GPT-4V(ision) [29 Sep 2023]
- GPT-4 details leaked
- GPT-4 is a language model with approximately 1.8 trillion parameters across 120 layers, 10x larger than GPT-3. It uses a Mixture of Experts (MoE) model with 16 experts, each having about 111 billion parameters. Utilizing MoE allows for more efficient use of resources during inference, needing only about 280 billion parameters and 560 TFLOPs, compared to the 1.8 trillion parameters and 3,700 TFLOPs required for a purely dense model.
- The model is trained on approximately 13 trillion tokens from various sources, including internet data, books, and research papers. To reduce training costs, OpenAI employs tensor and pipeline parallelism, and a large batch size of 60 million. The estimated training cost for GPT-4 is around $63 million. ref [Jul 2023]
- OpenAI DevDay 2023: GPT-4 Turbo with 128K context, Assistants API (Code interpreter, Retrieval, and function calling), GPTs (Custom versions of ChatGPT: ref), Copyright Shield, Parallel Function Calling, JSON Mode, Reproducible outputs [6 Nov 2023]
- ChatGPT can now see, hear, and speak: It has recently been updated to support multimodal capabilities, including voice and image. [25 Sep 2023] Whisper / CLIP
- GPT-3.5 Turbo Fine-tuning Fine-tuning for GPT-3.5 Turbo is now available, with fine-tuning for GPT-4 coming this fall. [22 Aug 2023]
- DALL·E 3 : In September 2023, OpenAI announced their latest image model, DALL-E 3 git [Sep 2023]
- Open AI Enterprise: Removes GPT-4 usage caps, and performs up to two times faster ref [28 Aug 2023]
- ChatGPT Plugin [23 Mar 2023]
-
ChatGPT Function calling [Jun 2023]
- Azure OpenAI start to support function calling. ref
- Custom instructions: In a nutshell, the Custom Instructions feature is a cross-session memory that allows ChatGPT to retain key instructions across chat sessions. [20 Jul 2023]
- Introducing the GPT Store: Roll out the GPT Store to ChatGPT Plus, Team and Enterprise users GPTs [10 Jan 2024]
-
New embedding models: text-embedding-3-small (embedding sizes: 512, 1536) and text-embedding-3-large (embedding sizes: 256, 1024, 3072) [25 Jan 2024]
- Sora: Text-to-video model. Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt. [15 Feb 2024]
-
ChatGPT Memory: Remembering things you discuss
across all chats
saves you from having to repeat information and makes future conversations more helpful. [Apr 2024]
- CriticGPT: a version of GPT-4 fine-tuned to critique code generated by ChatGPT [27 Jun 2024]
- SearchGPT: AI search [25 Jul 2024]
- Structured Outputs in the API: a new feature designed to ensure model-generated outputs will exactly match JSON Schemas provided by developers. [6 Aug 2024]
- GPT 1: Decoder-only model. 117 million parameters. [Jun 2018] git
- GPT 2: Increased model size and parameters. 1.5 billion. [14 Feb 2019] git
- GPT 3: Introduced few-shot learning. 175B. [11 Jun 2020] git
- GPT 3.5: 3 variants each with 1.3B, 6B, and 175B parameters. [15 Mar 2022] The embedding size of OpenAI's gpt-3.5-turbo is estimated to be about 4,096
- ChatGPT: GPT-3 fine-tuned with RLHF. 20B or 175B.
unverified
ref [30 Nov 2022]
- GPT 4: Mixture of Experts (MoE). 8 models with 220 billion parameters each, for a total of about 1.76 trillion parameters.
unverified
ref [14 Mar 2023]
- GPT-4o: o stands for Omni. 50% cheaper. 2x faster. Multimodal input and output capabilities (text, audio, vision). Supports 50 languages. [13 May 2024] / GPT-4o mini: 15 cents per million input tokens, 60 cents per million output tokens, MMLU of 82%, and fast. [18 Jul 2024]
- Introducing 100K Context Windows: hundreds of pages, Around 75,000 words; [11 May 2023] demo Anthropic Claude
-
“Needle in a Haystack” Analysis [21 Nov 2023]: Context Window Benchmarks; Claude 2.1 (200K Context Window) vs GPT-4; Long context prompting for Claude 2.1
adding just one sentence, “Here is the most relevant sentence in the context:”, to the prompt resulted in near complete fidelity throughout Claude 2.1’s 200K context window.
[6 Dec 2023] -
Rotary Positional Embedding (RoPE): [cnt] / ref / doc [20 Apr 2021]
- How is this different from the sinusoidal embeddings used in "Attention is All You Need"?
- Sinusoidal embeddings apply to each coordinate individually, while rotary embeddings mix pairs of coordinates
- Sinusoidal embeddings add a
cos
orsin
term, while rotary embeddings use a multiplicative factor. - Rotary embeddings are applied to positional encoding to K and V, not to the input embeddings.
- How is this different from the sinusoidal embeddings used in "Attention is All You Need"?
-
Lost in the Middle: How Language Models Use Long Contexts: [cnt] [6 Jul 2023]
- Best performance when relevant information is at the beginning
- Too many retrieved documents will harm performance
- Performance decreases with an increase in context
-
Structured Prompting: Scaling In-Context Learning to 1,000 Examples: [cnt] [13 Dec 2022]
- Microsoft's Structured Prompting allows thousands of examples, by first concatenating examples into groups, then inputting each group into the LM. The hidden key and value vectors of the LM's attention modules are cached. Finally, when the user's unaltered input prompt is passed to the LM, the cached attention vectors are injected into the hidden layers of the LM.
- This approach wouldn't work with OpenAI's closed models because it needs access to the [keys] and [values] in the transformer internals, which they do not expose. You could implement it yourself on OSS models. cite [07 Feb 2023]
- Ring Attention: [cnt]: 1. Ring Attention, which leverages blockwise computation of self-attention to distribute long sequences across multiple devices while overlapping the communication of key-value blocks with the computation of blockwise attention. 2. Ring Attention can reduce the memory requirements of Transformers, enabling training of sequences more than 500 times longer than prior memory-efficient state-of-the-art methods and the training of sequences that exceed 100 million tokens in length without approximations to attention. 3. We propose an enhancement to the blockwise parallel transformers (BPT) framework. git [3 Oct 2023]
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning. With only four lines of code modification, the proposed method can effortlessly extend existing LLMs' context window without any fine-tuning. [2 Jan 2024]
- Giraffe: Adventures in Expanding Context Lengths in LLMs. A new truncation strategy for modifying the basis for the position encoding. ref [2 Jan 2024]
-
Leave No Context Behind: Efficient
Infinite Context
Transformers with Infini-attention. Infini-attention incorporates a compressive memory into the vanilla attention mechanism, integrating both local and global attention. [10 Apr 2024]
- Open AI Tokenizer: GPT-3, Codex Token counting
- tiktoken: BPE tokeniser for use with OpenAI's models. Token counting. [Dec 2022]
- What are tokens and how to count them?: OpenAI Articles
- 5 Approaches To Solve LLM Token Limits : doc [2023]
- Byte-Pair Encoding (BPE): Proposed in 2015, the most widely used tokenization algorithm for text today. BPE adds an end-of-word token to words, splits them into characters, and merges frequent byte pairs iteratively until a stop criterion. The final tokens form the vocabulary for encoding and decoding new data. [31 Aug 2015] / ref [13 Aug 2021]
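A minimal sketch of the BPE merge loop just described, following the classic toy example from the 2015 paper (the vocabulary and merge count are illustrative):

```python
import re
from collections import Counter

def get_pairs(vocab):
    # Count adjacent symbol pairs, weighted by word frequency
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge(pair, vocab):
    # Replace every standalone occurrence of the pair with the merged symbol
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), w): f for w, f in vocab.items()}

# Words split into characters, with an end-of-word token </w>
vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for _ in range(10):                       # merge the most frequent pair iteratively
    pairs = get_pairs(vocab)
    if not pairs:
        break
    vocab = merge(pairs.most_common(1)[0][0], vocab)
print(vocab)                              # merged tokens form the vocabulary
```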
- Tokencost: Token price estimates for 400+ LLMs [Dec 2023]
-
Numbers every LLM Developer should know [18 May 2023]
- NeMo Guardrails: Building Trustworthy, Safe and Secure LLM Conversational Systems [Apr 2023]
- Trustworthy LLMs: [cnt]: Comprehensive overview for assessing LLM trustworthiness; Reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. [10 Aug 2023]
-
Political biases of LLMs: [cnt]: From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models. [15 May 2023]
- Red Teaming: The term red teaming has historically described systematic adversarial attacks for testing security vulnerabilities. LLM red teamers should be a mix of people with diverse social and professional backgrounds, demographic groups, and interdisciplinary expertise that fits the deployment context of your AI system. ref
- The Foundation Model Transparency Index: [cnt]: A comprehensive assessment of the transparency of foundation model developers ref [19 Oct 2023]
- Hallucinations: [cnt]: A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions [9 Nov 2023]
- Hallucination Leaderboard: Evaluate how often an LLM introduces hallucinations when summarizing a document. [Nov 2023]
- OpenAI Weak-to-strong generalization: In the superalignment problem, humans must supervise models that are much smarter than them. The paper discusses supervising a GPT-4 or 3.5-level model using a GPT-2-level model. It finds that while strong models supervised by weak models can outperform the weak models, they still don’t perform as well as when supervised by ground truth. git [14 Dec 2023]
- A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models: A comprehensive survey of over thirty-two techniques developed to mitigate hallucination in LLMs [2 Jan 2024]
- Anthropic Many-shot jailbreaking: simple long-context attack, Bypassing safety guardrails by bombarding them with unsafe or harmful questions and answers. [3 Apr 2024]
-
FactTune: A procedure that enhances the factuality of LLMs without the need for human feedback. The process involves the fine-tuning of a separated LLM using methods such as DPO and RLAIF, guided by preferences generated by FActScore. [14 Nov 2023]
FActScore
works by breaking down a generation into a series of atomic facts and then computing the percentage of atomic facts supported by a reliable knowledge source.
- Mapping the Mind of a Large Language Model: Anthrophic, A technique called "dictionary learning" can help understand model behavior by identifying which features respond to a particular input, thus providing insight into the model's "reasoning." ref [21 May 2024]
- Frontier Safety Framework: Google DeepMind, Frontier Safety Framework, a set of protocols designed to identify and mitigate potential harms from future AI systems. [17 May 2024]
- Extracting Concepts from GPT-4: Sparse Autoencoders identify key features, enhancing the interpretability of language models like GPT-4. They extract 16 million interpretable features using GPT-4's outputs as input for training. [6 Jun 2024]
- NIST AI Risk Management Framework: NIST released the first complete version of the NIST AI RMF Playbook on March 30, 2023
- Guardrails Hub: Guardrails for common LLM validation use cases
- AI models collapse when trained on recursively generated data: Model Collapse. We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear. [24 Jul 2024]
- LLMs Will Always Hallucinate, and We Need to Live With This: LLMs cannot completely eliminate hallucinations through architectural improvements, dataset enhancements, or fact-checking mechanisms due to fundamental mathematical and logical limitations. [9 Sep 2024]
- Emergent Abilities of Large Language Models: [cnt]: Large language models can develop emergent abilities, which are not explicitly trained but appear at scale and are not present in smaller models. These abilities can be enhanced using few-shot and augmented prompting techniques. ref [15 Jun 2022]
- Multitask Prompted Training Enables Zero-Shot Task Generalization: [cnt]: A language model trained on various tasks using prompts can learn and generalize to new tasks in a zero-shot manner. [15 Oct 2021]
- Language Modeling Is Compression: [cnt]: Lossless data compression, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%). [19 Sep 2023]
- LLMs Represent Space and Time: [cnt]: Large language models learn world models of space and time from text-only training. [3 Oct 2023]
- Improving mathematical reasoning with process supervision [31 May 2023]
- Math-solving optimized LLM WizardMath: [cnt]: Developed by adapting Evol-Instruct and Reinforcement Learning techniques, these models excel in math-related instructions like GSM8k and MATH. git [18 Aug 2023] / Math-solving Plugin: Wolfram Alpha
- Large Language Models for Software Engineering: [cnt]: Survey and Open Problems, Large Language Models (LLMs) for Software Engineering (SE) applications, such as code generation, testing, repair, and documentation. [5 Oct 2023]
- LLMs for Chip Design: Domain-Adapted LLMs for Chip Design [31 Oct 2023]
-
Design2Code: How Far Are We From Automating Front-End Engineering?
64% of cases GPT-4V generated webpages are considered better than the original reference webpages
[5 Mar 2024] - Testing theory of mind in large language models and humans: Some large language models (LLMs) perform as well as, and in some cases better than, humans when presented with tasks designed to test the ability to track people’s mental states, known as “theory of mind.” cite [20 May 2024]
- A Survey on Employing Large Language Models for Text-to-SQL Tasks: a comprehensive overview of LLMs in text-to-SQL tasks [21 Jul 2024]
- Can LLMs Generate Novel Research Ideas?: A Large-Scale Human Study with 100+ NLP Researchers. We find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas. However, the study revealed a lack of diversity in AI-generated ideas. [6 Sep 2024]
- Change in perspective is necessary because some abilities only emerge at a certain scale. Some conclusions from the past are invalidated and we need to constantly unlearn intuitions built on top of such ideas.
- From first-principles, scaling up the Transformer amounts to efficiently doing matrix multiplications with many, many machines.
- Further scaling (think 10000x GPT-4 scale). It entails finding the inductive bias that is the bottleneck in further scaling.
- Twitter / Video / Slides [6 Oct 2023]
- LLMprices.dev: Compare prices for models like GPT-4, Claude Sonnet 3.5, Llama 3.1 405b and many more.
- AI Model Review: Compare 75 AI Models on 200+ Prompts Side By Side.
- Artificial Analysis: Independent analysis of AI models and API providers.
- Inside language models (from GPT to Olympus)
-
LLM Pre-training and Post-training Paradigms [17 Aug 2024]
-
Evolutionary Graph of LLaMA Family
-
LLM evolutionary tree
-
A Survey of Large Language Models: [cnt] /git [31 Mar 2023] contd.
-
LLM evolutionary tree: [cnt]: A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers) git [26 Apr 2023]
-
An overview of different fields of study and recent developments in NLP. doc ref [24 Sep 2023]
“Exploring the Landscape of Natural Language Processing Research” ref [20 Jul 2023]
NLP taxonomy
Distribution of the number of papers by most popular fields of study from 2002 to 2022
- The LLM Index: A list of large language models (LLMs)
- Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings
- LLM Collection: promptingguide.ai
- Huggingface Open LLM Leaderboard
- ollama: ollama-supported models
- KoAlpaca: Alpaca for Korean [Mar 2023]
- Pythia: How do large language models (LLMs) develop and evolve over the course of training and change as models scale? A suite of decoder-only autoregressive language models ranging from 70M to 12B parameters git [Apr 2023]
- OLMo: Truly open language model and framework to build, study, and advance LMs, along with the training data, training and evaluation code, intermediate model checkpoints, and training logs. git [Feb 2024] / OLMoE: fully-open LLM leverages sparse Mixture-of-Experts [Sep 2024]
- Open-Sora: Democratizing Efficient Video Production for All [Mar 2024]
- Jamba: AI21's SSM-Transformer Model. Mamba + Transformer + MoE [28 Mar 2024]
- Meta (aka. Facebook)
- Most OSS LLMs have been built on Llama / ref / git
- Llama 2: 1) 40% more data than Llama 1. 2) 7B, 13B, and 70B. 3) Trained on over 1 million human annotations. 4) Double the context length of Llama 1, to 4K. 5) Grouped Query Attention, KV Cache, and Rotary Positional Embedding were introduced in Llama 2 [18 Jul 2023] demo
- Llama 3: 1) 7X more data than Llama 2. 2) 8B, 70B, and 400B. 3) 8K context length [18 Apr 2024]
- MEGALODON: Long Sequence Model. Unlimited context length. Outperforms Llama 2 model. [Apr 2024]
- Llama 3.1: 405B, context length to 128K, adds support across eight languages. The first OSS model to outperform GPT-4o. [23 Jul 2024] / Llama 3.2: Multimodal 11B and 90B models support image reasoning. Lightweight 1B and 3B models. [25 Sep 2024]
- Google
- Gemma: Open weights LLM from Google DeepMind. git / Pytorch git [Feb 2024]
- Gemma 2 2B, 9B, 27B ref: releases [Jun 2024]
- DataGemma [12 Sep 2024] / NotebookLM: LLM-powered notebook. free to use, not open-source. [12 Jul 2023]
- Qualcomm
- Qualcomm’s on-device AI models: Bring generative AI to mobile devices [Feb 2024]
- xAI
- xAI is an American AI company founded by Elon Musk in March 2023
- Grok: 314B parameter Mixture-of-Experts (MoE) model. Released under the Apache 2.0 license. Training code not included. Developed with JAX git [17 Mar 2024]
- Grok-2 and Grok-2 mini [13 Aug 2024]
- Databricks
- Apple
- OpenELM: Apple released a Transformer-based language model. Four sizes of the model: 270M, 450M, 1.1B, and 3B parameters. [April 2024]
- Apple Intelligence Foundation Language Models: 1. A 3B on-device model used for language tasks like summarization and Writing Tools. 2. A large Server model used for language tasks too complex to do on-device. [10 Jun 2024]
- Microsoft
- phi-series: cost-effective small language models (SLMs) X-ref
- NVIDIA
- Nemotron-4 340B: Synthetic Data Generation for Training Large Language Models [14 Jun 2024]
- Mistral
- Founded in April 2023. French tech.
- Groq
- Founded in 2016. low-latency AI inference H/W. American tech.
- Llama-3-Groq-Tool-Use: a model optimized for function calling [Jul 2024]
- Alibaba
- Qwen series > Qwen2: 29 languages. 5 sizes, including Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B. [Feb 2024]
- Cohere
- Founded in 2019. Canadian multinational tech.
- Command R+: The performant model for RAG capabilities, multilingual support, and tool use. [Aug 2024]
- Deepseek
- Founded in 2023, is a Chinese company dedicated to AGI.
- A list of models: git
- GPT for Domain Specific X-ref
- MLLM (multimodal large language model) X-ref
- Large Language Models (in 2023) X-ref
Expand: Llama variants emerged in 2023
- Upstage's 70B Language Model Outperforms GPT-3.5: ref [1 Aug 2023]
- Falcon LLM Apache 2.0 license [Mar 2023]
- StableVicuna First Open Source RLHF LLM Chatbot [Apr 2023]
- Alpaca: Fine-tuned from the LLaMA 7B model [Mar 2023]
- vicuna: 90% ChatGPT Quality [Mar 2023]
- Koala: Focus on dialogue data gathered from the web. [Apr 2023]
- dolly: Databricks [Mar 2023]
- Cerebras-GPT: 7 GPT models ranging from 111M to 13B parameters. [Mar 2023]
- TimeGPT: The First Foundation Model for Time Series Forecasting git [Mar 2023]
- BioGPT: [cnt]: Generative Pre-trained Transformer for Biomedical Text Generation and Mining git [19 Oct 2022]
- MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers [27 Nov 2023]
- BloombergGPT: A Large Language Model for Finance [30 Mar 2023]
- Galactica: A Large Language Model for Science [16 Nov 2022]
- EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain [30 Jan 2024]
- SaulLM-7B: A pioneering Large Language Model for Law [6 Mar 2024]
- Huggingface StarCoder: A State-of-the-Art LLM for Code: git [May 2023]
- Code Llama: Built on top of Llama 2, free for research and commercial use. ref / git [24 Aug 2023]
- Devin AI: Devin is an AI software engineer developed by Cognition AI [12 Mar 2024]
- OpenDevin: an open-source project aiming to replicate Devin [Mar 2024]
- FrugalGPT: LLM with budget constraints, requests are cascaded from low-cost to high-cost LLMs. git [9 May 2023]
- DeepSeek-Coder-V2: Open-source Mixture-of-Experts (MoE) code language model [17 Jun 2024]
- Qwen2-Math: math-specific LLM / Qwen2-Audio: large-scale audio-language model [Aug 2024] / Qwen 2.5-Coder [18 Sep 2024]
- Chai-1: a multi-modal foundation model for molecular structure prediction [Sep 2024]
- Prithvi WxC: In collaboration with NASA, IBM is releasing an open-source foundation model for Weather and Climate ref [20 Sep 2024]
- AlphaChip: Reinforcement learning-based model for designing physical chip layouts. [26 Sep 2024]
-
Multimodal Foundation Models: From Specialists to General-Purpose Assistants: [cnt]: A comprehensive survey of the taxonomy and evolution of multimodal foundation models that demonstrate vision and vision-language capabilities. Specific-Purpose 1. Visual understanding tasks 2. Visual generation tasks General-Purpose 3. General-purpose interface. [18 Sep 2023]
-
Awesome Multimodal Large Language Models: Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation. [Jun 2023]
-
CLIP: [cnt]: CLIP (Contrastive Language-Image Pretraining), Trained on a large number of internet text-image pairs and can be applied to a wide range of tasks with zero-shot learning. git [26 Feb 2021]
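A zero-shot classification sketch using the Hugging Face transformers CLIP classes (the checkpoint name and image URL are illustrative):

```python
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # illustrative image
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(text=["a photo of a cat", "a photo of a dog"],
                   images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # zero-shot label probabilities
print(probs)
```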
-
LLaVa: [cnt]: Large Language-and-Vision Assistant git [17 Apr 2023]
- Simple linear layer to connect image features into the word embedding space. A trainable projection matrix W is applied to the visual features Zv, transforming them into visual embedding tokens Hv. These tokens are then concatenated with the language embedding sequence Hq to form a single sequence. Note that Hv and Hq are not multiplied or added but concatenated; both have the same dimensionality. See the toy sketch after this list.
- LLaVA-1.5: [cnt]: is out! git: Changing from a linear projection to an MLP cross-modal. [5 Oct 2023]
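A toy sketch of the connector described above: a trainable linear projection maps visual features Zv into visual tokens Hv, which are concatenated with the language embeddings Hq (the dimensions are illustrative):

```python
import torch
import torch.nn as nn

d_vision, d_model = 1024, 4096
W = nn.Linear(d_vision, d_model)        # trainable projection (LLaVA-1 uses a single linear layer)

Zv = torch.randn(1, 256, d_vision)      # frozen vision-encoder patch features
Hq = torch.randn(1, 32, d_model)        # language instruction embeddings
Hv = W(Zv)                              # visual embedding tokens
sequence = torch.cat([Hv, Hq], dim=1)   # concatenated; same dimensionality, no add/multiply
```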
-
Video-ChatGPT: [cnt]: a video conversation model capable of generating meaningful conversation about videos. / git [8 Jun 2023]
-
MiniGPT-4 & MiniGPT-v2: [cnt]: Enhancing Vision-language Understanding with Advanced Large Language Models git [20 Apr 2023]
-
TaskMatrix, aka VisualChatGPT: [cnt]: Microsoft TaskMatrix git; GroundingDINO + SAM git [8 Mar 2023]
-
GroundingDINO: [cnt]: DINO with Grounded Pre-Training for Open-Set Object Detection git [9 Mar 2023]
-
BLIP-2 [30 Jan 2023]: [cnt]: Salesforce Research, Querying Transformer (Q-Former) / git / ref / Youtube / BLIP: [cnt]: git [28 Jan 2022]
-
Q-Former (Querying Transformer)
: A transformer model that consists of two submodules that share the same self-attention layers: an image transformer that interacts with a frozen image encoder for visual feature extraction, and a text transformer that can function as both a text encoder and a text decoder. - Q-Former is a lightweight transformer which employs a set of learnable query vectors to extract visual features from the frozen image encoder. It acts as an information bottleneck between the frozen image encoder and the frozen LLM.
-
-
MiniCPM-V: MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone [Jan 2024]
-
Vision capability to a LLM ref [22 Aug 2023]
-
Meta (aka. Facebook)
- facebookresearch/ImageBind: [cnt]: ImageBind One Embedding Space to Bind Them All git [9 May 2023]
- facebookresearch/segment-anything(SAM): [cnt]: The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model. git [5 Apr 2023]
- facebookresearch/SeamlessM4T: [cnt]: SeamlessM4T is the first all-in-one multilingual multimodal AI translation and transcription model. This single model can perform speech-to-text, speech-to-speech, text-to-speech, and text-to-text translations for up to 100 languages depending on the task. ref [22 Aug 2023]
- Chameleon: Early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. The unified approach uses fully token-based representations for both image and textual modalities. [16 May 2024]
- Models and libraries
-
Microsoft
- Language Is Not All You Need: Aligning Perception with Language Models Kosmos-1: [cnt] [27 Feb 2023]
- Kosmos-2: [cnt]: Grounding Multimodal Large Language Models to the World [26 Jun 2023]
- Kosmos-2.5: [cnt]: A Multimodal Literate Model [20 Sep 2023]
- BEiT-3: [cnt]: Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks [22 Aug 2022]
- TaskMatrix.AI: [cnt]: TaskMatrix connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting. [29 Mar 2023]
- Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks. ref [10 Nov 2023]
- Google
- Gemini 1.5: 1 million token context window, 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. [Feb 2024]
- Foundation Models: Gemini, Veo, Gemma etc.
- Anthropic
- Claude 3 Opus, the largest version of the new LLM, outperforms rivals GPT-4 and Google’s Gemini 1.0 Ultra. Three variants: Opus, Sonnet, and Haiku. [Mar 2024]
- Apple
- 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities. [13 Jun 2024]
- Benchmarking Multimodal LLMs
- LLaVA-1.5 achieves SoTA on a broad range of 11 tasks incl. SEED-Bench.
- SEED-Bench: [cnt]: Benchmarking Multimodal LLMs git [30 Jul 2023]
- Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models ref [25 Sep 2024]
- Optimizing Memory Usage for Training LLMs and Vision Transformers: When applying 10 techniques to a vision transformer, we reduced the memory consumption 20x on a single GPU. ref / git [2 Jul 2023]
- The Generative AI Revolution: Exploring the Current Landscape : doc [28 Jun 2023]
- Diffusion Models vs. GANs vs. VAEs: Comparison of Deep Generative Models [12 May 2023]
| Model | Description | Strengths | Weaknesses |
| --- | --- | --- | --- |
| GANs | Two neural networks, a generator and a discriminator, work together. The generator creates synthetic samples, and the discriminator distinguishes between real and generated samples. | Unsupervised learning; able to mimic data distributions without labeled data; versatile in applications like image synthesis, super-resolution, and style transfer. | Known for potentially unstable training and less diversity in generation. |
| VAEs | Consist of an encoder and a decoder. The encoder maps input data into a low-dimensional representation, and the decoder reconstructs the original input data from this representation. e.g., DALL-E | Efficient at learning latent representations; can be used for tasks like data denoising and anomaly detection, in addition to data generation. | Dependent on an approximate loss function. |
| Diffusion Models | Consist of forward and reverse diffusion processes. Forward diffusion adds noise to input data until white noise is obtained. The reverse diffusion process removes the noise to recover the original data. e.g., Stable Diffusion | Capable of producing high-quality, step-by-step samples. | Multi-step (often 1000 steps) generation process. |
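To make the diffusion row concrete, here is a minimal sketch of the DDPM-style forward (noising) process that the reverse process learns to undo; the linear beta schedule and toy data are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal-retention factor

def q_sample(x0, t):
    """Closed-form forward diffusion: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = rng.normal(size=(4,))    # toy "image"
print(q_sample(x0, t=0))      # barely noised
print(q_sample(x0, t=T - 1))  # approximately white noise
```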
- Papers were picked out by [cited by count], using [survey] as a search keyword. Papers on a specific topic are included even if they have few citations.
- A Survey of LLMs
- Large Language Models: A Survey [9 Feb 2024]: 🏆Well organized visuals and contents
- A Survey of Transformers:[cnt] [8 Jun 2021]
- A Survey of Large Language Models:[cnt] [v1: 31 Mar 2023 - v13: 24 Nov 2023]
- A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT:[cnt] [7 Mar 2023]
- Summary of ChatGPT/GPT-4 Research and Perspective Towards the Future of Large Language Models:[cnt] [4 Apr 2023]
- A Survey on Language Models for Code:[cnt] [14 Nov 2023]
- ChatGPT’s One-year Anniversary: Are Open-Source Large Language Models Catching up? > Evaluation benchmark: Benchmarks and Performance of LLMs [28 Nov 2023]
- From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape:[cnt] [18 Dec 2023]
- Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems: The survey aims to provide a comprehensive understanding of the current state and future directions in efficient LLM serving [23 Dec 2023]
- A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?: [9 Aug 2024] git
- What is the Role of Small Models in the LLM Era: A Survey [10 Sep 2024]
- State of AI
  - Retool: State of AI: A Report on AI in Production 2023 -> 2024
- The State of Generative AI in the Enterprise [ⓒ2023]
  - 1. 96% of AI spend is on inference, not training. 2. Only 10% of enterprises pre-trained their own models. 3. 85% of models in use are closed-source. 4. 60% of enterprises use multiple models.
- Stanford AI Index Annual Report
- Google AI Research Recap
- Gemini [06 Dec 2023] Three different sizes: Ultra, Pro, Nano. With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU ref
- Google AI Research Recap (2022 Edition)
- Themes from 2021 and Beyond
- Looking Back at 2020, and Forward to 2021
- Microsoft Research Recap
- Research at Microsoft 2023: A year of groundbreaking AI advances and discoveries
- Data Management For Large Language Models: A Survey [4 Dec 2023]
- Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond:[cnt] [26 Apr 2023]
- A Cookbook of Self-Supervised Learning:[cnt] [24 Apr 2023]
- A Survey on In-context Learning:[cnt] [31 Dec 2022]
- A Survey on Evaluation of Large Language Models:[cnt] [6 Jul 2023]
- Mitigating Hallucination in LLMs: Summarizes 32 techniques to mitigate hallucination in LLMs [cnt] [2 Jan 2024]
- Retrieval-Augmented Generation for Large Language Models: A Survey [cnt] [18 Dec 2023]
- A Survey on Multimodal Large Language Models:[cnt] [23 Jun 2023]
- SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension: [cnt] [30 Jul 2023]
- Survey of Hallucination in Natural Language Generation:[cnt] [8 Feb 2022]
- Hallucination in LLMs:[cnt] [9 Nov 2023]
- Evaluating Large Language Models: A Comprehensive Survey:[cnt] [30 Oct 2023]
- A Survey of Techniques for Optimizing Transformer Inference:[cnt] [16 Jul 2023]
- An Overview on Language Models: Recent Developments and Outlook:[cnt] [10 Mar 2023]
- Efficient Guided Generation for Large Language Models:[cnt] [19 Jul 2023]
- Challenges & Application of LLMs:[cnt] [11 Jun 2023]
- A Survey on LLM-based Autonomous Agents:[cnt] [22 Aug 2023]
- A Survey on Efficient Training of Transformers:[cnt] [2 Feb 2023]
- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback:[cnt] [27 Jul 2023]
- Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning:[cnt] [28 Mar 2023]
- Survey of Aligned LLMs:[cnt] [24 Jul 2023]
- Survey on Instruction Tuning for LLMs:[cnt] [21 Aug 2023]
- A Survey on Transformers in Reinforcement Learning:[cnt] [8 Jan 2023]
- Model Compression for LLMs:[cnt] [15 Aug 2023]
- Foundation Models in Vision:[cnt] [25 Jul 2023]
- Multimodal Deep Learning:[cnt] [12 Jan 2023]
- Trustworthy LLMs:[cnt] [10 Aug 2023]
- Universal and Transferable Adversarial Attacks on Aligned Language Models:[cnt] [27 Jul 2023]
- A Survey of LLMs for Healthcare:[cnt] [9 Oct 2023]
- Overview of Factuality in LLMs:[cnt] [11 Oct 2023]
- A Comprehensive Survey of Compression Algorithms for Language Models [27 Jan 2024]
- An unnecessarily tiny implementation of GPT-2 in NumPy. picoGPT: Transformer Decoder [Jan 2023]
```python
q = x @ w_q  # [n_seq, n_embd] @ [n_embd, n_embd] -> [n_seq, n_embd]
k = x @ w_k  # [n_seq, n_embd] @ [n_embd, n_embd] -> [n_seq, n_embd]
v = x @ w_v  # [n_seq, n_embd] @ [n_embd, n_embd] -> [n_seq, n_embd]
# In picoGPT, w_q, w_k and w_v are combined into a single matrix w_fc
x = x @ w_fc  # [n_seq, n_embd] @ [n_embd, 3*n_embd] -> [n_seq, 3*n_embd]
```
- lit-gpt: Hackable implementation of state-of-the-art open-source LLMs based on nanoGPT. Supports flash attention, 4-bit and 8-bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed. git [Mar 2023]
- pix2code: Generating Code from a Graphical User Interface Screenshot. Trained on pairs of screenshots and a simplified intermediate script for HTML, using a CNN for image embeddings and an LSTM for text embeddings in an encoder-decoder model. An early adoption of image-to-code. [May 2017] -> Screenshot to code: Turning Design Mockups Into Code With Deep Learning [Oct 2017] ref
- Build a Large Language Model (From Scratch): 🏆Implementing a ChatGPT-like LLM from scratch, step by step
- Spreadsheets-are-all-you-need: Implements the forward pass of GPT-2 entirely in Excel using standard spreadsheet functions. [Sep 2023]
- llm.c: LLM training in simple, raw C/CUDA [Apr 2024]
- Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20 ref
- llama3-from-scratch: Implementing Llama3 from scratch [May 2024]
- Umar Jamil github: Model explanation / building a model from scratch youtube
- youtube: Andrej Karpathy: Reproduce the GPT-2 (124M) from scratch. [June 2024] / SebastianRaschka: Developing an LLM: Building, Training, Finetuning [June 2024]
- Transformer Explainer: an open-source interactive tool to learn about the inner workings of a Transformer model (GPT-2) git [8 Aug 2024]
- Beam Search [1977] in Transformers is an inference algorithm that maintains the `beam_size` most probable sequences until the end token appears or the maximum sequence length is reached. If `beam_size` (k) is 1, it's a `Greedy Search`; if k equals the total vocabulary size, it's an `Exhaustive Search`. ref [Mar 2022] (see the sketch below)
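A minimal sketch of the idea; `logprob_fn` is a hypothetical stand-in for a real language model, and with `beam_size=1` the loop degenerates to greedy search:

```python
import numpy as np

def beam_search(logprob_fn, bos, eos, beam_size=3, max_len=10):
    """Maintain the beam_size most probable sequences until EOS or max_len."""
    beams, finished = [([bos], 0.0)], []  # (token sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            logprobs = logprob_fn(seq)  # log P(next token | seq), shape [vocab]
            for tok in np.argsort(logprobs)[-beam_size:]:
                candidates.append((seq + [int(tok)], score + float(logprobs[tok])))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:  # keep only the top beam_size
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])

# toy "model": a fixed random next-token distribution over a 5-token vocabulary (token 4 = EOS)
rng = np.random.default_rng(0)
table = np.log(rng.dirichlet(np.ones(5), size=5))
print(beam_search(lambda seq: table[seq[-1]], bos=0, eos=4, beam_size=2))
```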
- ref: Must-Read Starter Guide to Mastering Attention Mechanisms in Machine Learning [12 Jun 2023]
- Encoder-Decoder Attention:
- Soft Attention: assigns continuous weights to input elements, allowing the model to attend to multiple elements simultaneously. Used in neural machine translation.
- Hard Attention: selects a subset of input elements to focus on while ignoring the rest. Used in image captioning.
- Global Attention: focuses on all elements of the input sequence when computing attention weights. Captures long-range dependencies and global context.
- Local Attention: focuses on a smaller, localized region of the input sequence when computing attention weights. Reduces computational complexity. Used in time series analysis.
- Extended Forms of Attention: Only one Decoder component (only an Input Sequence, no Target Sequence)
- Self Attention: attends to different parts of the input sequence itself, rather than another sequence or modality. Captures long-range dependencies and contextual information. Used in transformer models.
- Multi-head Self-Attention: performs self-attention multiple times in parallel, allowing the model to jointly attend to information from different representation subspaces.
- Other Types of Attention:
- Sparse Attention: reduces computation by focusing on a limited selection of similarity scores in a sequence, resulting in a sparse matrix. It includes implementations of “strided” and “fixed” attention. ref [23 Oct 2020]
- Cross-Attention: mixes two different embedding sequences, allowing the model to attend to information from both. In a Transformer, the attention that passes information from the encoder to the decoder is known as cross-attention (see the sketch after this list). ref / ref [9 Feb 2023]
- Sliding Window Attention (SWA): A technique used in Longformer. It uses a fixed-size window of attention around each token, which allows the model to scale efficiently to long inputs. Each token attends to half the window-size tokens on each side. ref
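A minimal NumPy sketch of scaled dot-product cross-attention, with illustrative sizes and random weights: queries come from the decoder sequence, while keys and values come from the encoder sequence (passing the same sequence for both would make it self-attention):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(dec_h, enc_h, Wq, Wk, Wv):
    """Queries from the decoder; keys/values from the encoder."""
    Q, K, V = dec_h @ Wq, enc_h @ Wk, enc_h @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # [n_dec, n_enc] similarity scores
    return softmax(scores) @ V               # each decoder position mixes encoder values

rng = np.random.default_rng(0)
d, n_enc, n_dec = 8, 6, 4
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = cross_attention(rng.normal(size=(n_dec, d)), rng.normal(size=(n_enc, d)), Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```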
- LLM 研究プロジェクト: ブログ記事一覧: LLM research projects: a list of blog posts [27 Jul 2023]
- ブレインパッド社員が投稿した Qiita 記事まとめ: Summary of Qiita articles posted by BrainPad employees [Jul 2023]
- rinna: rinna の 36 億パラメータの日本語 GPT 言語モデル: 3.6 billion parameter Japanese GPT language model [17 May 2023]
- rinna: bilingual-gpt-neox-4b: Japanese-English bilingual large language model [17 May 2023]
- 法律:生成 AI の利用ガイドライン: Legal: Guidelines for the Use of Generative AI
- New Era of Computing - ChatGPT がもたらした新時代: The new era brought about by ChatGPT [May 2023]
- 大規模言語モデルで変わる ML システム開発: ML system development that changes with large-scale language models [Mar 2023]
- GPT-4 登場以降に出てきた ChatGPT/LLM に関する論文や技術の振り返り: Review of ChatGPT/LLM papers and technologies that have emerged since the advent of GPT-4 [Jun 2023]
- LLM を制御するには何をするべきか?: How to control LLM [Jun 2023]
- 1. 生成 AI のマルチモーダルモデルでできること: What can be done with multimodal models of generative AI 2. 生成 AI のマルチモーダリティに関する技術調査: Technical survey on the multimodality of generative AI [Jun 2023]
- LLM の推論を効率化する量子化技術調査: Survey of quantization techniques to improve efficiency of LLM reasoning [Sep 2023]
- LLM の出力制御や新モデルについて: About LLM output control and new models [Sep 2023]
- Azure OpenAI を活用したアプリケーション実装のリファレンス: 日本マイクロソフト リファレンスアーキテクチャ [Jun 2023]
- 生成 AI・LLM のツール拡張に関する論文の動向調査: Survey of trends in papers on tool extensions for generative AI and LLM [Sep 2023]
- LLM の学習・推論の効率化・高速化に関する技術調査: Technical survey on improving the efficiency and speed of LLM learning and inference [Sep 2023]
- 日本語LLMまとめ - Overview of Japanese LLMs: Summary of publicly available Japanese LLMs (LLMs trained mainly on Japanese) and Japanese LLM evaluation benchmarks [Jul 2023]
- Azure OpenAI Service で始める ChatGPT/LLM システム構築入門: Introduction to building ChatGPT/LLM systems with Azure OpenAI Service: sample programs [Aug 2023]
- Matsuo Lab: 人工知能・深層学習を学ぶためのロードマップ: Roadmap for learning AI and deep learning ref / doc [Dec 2023]
- AI事業者ガイドライン: Guidelines for AI Business Operators [Apr 2024]
- LLMにまつわる"評価"を整理する: Organizing the "evaluation" of LLMs [06 Jun 2024]
- コード生成を伴う LLM エージェント: LLM agents with code generation [18 Jul 2024]
- Machine Learning Study 혼자 해보기: Machine learning self-study [Sep 2018]
- LangChain 한국어 튜토리얼: LangChain Korean tutorial [Feb 2024]
- AI 데이터 분석가 ‘물어보새’ 등장 – RAG와 Text-To-SQL 활용: Introducing ‘물어보새’, an AI data analyst using RAG and Text-to-SQL [Jul 2024]
- LLM, 더 저렴하게, 더 빠르게, 더 똑똑하게: LLMs: cheaper, faster, smarter [09 Sep 2024]
- Attention Is All You Need: [cnt]: 🏆 The Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. [12 Jun 2017] Illustrated transformer
- Must read: the 100 most cited AI papers in 2022 : doc [8 Mar 2023]
- The Best Machine Learning Resources : doc [20 Aug 2017]
- What are the most influential current AI Papers?: NLLG Quarterly arXiv Report 06/23 git [31 Jul 2023]
- OpenAI Cookbook Examples and guides for using the OpenAI API
- gpt4free for educational purposes only [Mar 2023]
- Comparing Adobe Firefly, Dalle-2, OpenJourney, Stable Diffusion, and Midjourney: Generative AI for images [20 Jun 2023]
- Open Problem and Limitation of RLHF: [cnt]: Provides an overview of open problems and the limitations of RLHF [27 Jul 2023]
- IbrahimSobh/llms: Language models introduction with simple code. [Jun 2023]
- DeepLearning.ai Short courses: DeepLearning.ai Short courses [2023]
- DAIR.AI: Machine learning & NLP research (omarsar github)
- ML Papers of The Week [Jan 2023]
- Deep Learning cheatsheets for Stanford's CS 230: Super VIP Cheatsheet: Deep Learning [Nov 2019]
- LLM Visualization: A 3D animated visualization of an LLM with a walkthrough
- Best-of Machine Learning with Python:🏆A ranked list of awesome machine learning Python libraries. [Nov 2020]
- Large Language Models: Application through Production: A course on edX & Databricks Academy
- Large Language Model Course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks. [Jun 2023]
- CNN Explainer: Learning Convolutional Neural Networks with Interactive Visualization [Apr 2020]
- Foundational concepts like Transformers, Attention, and Vector Database [Feb 2024]
- LLM FineTuning Projects and notes on common practical techniques [Oct 2023]
- But what is a GPT?🏆3blue1brown: Visual intro to transformers [Apr 2024]
- Daily Dose of Data Science [Dec 2022]
- Machine learning algorithms: ML algorithms implemented from scratch [Oct 2016]
- 900 most popular open source AI tools:🏆What I learned from looking at 900 most popular open source AI tools list [Mar 2024]
- Open100: Top 100 Open Source achievements.
- Awesome LLM Apps: A curated collection of awesome LLM apps built with RAG and AI agents. [Apr 2024]
- GenAI Agents:🏆Tutorials and implementations for various Generative AI Agent techniques, from basic to advanced. [Sep 2024]
- LLM Training/Build
- Pytorch: PyTorch is the favorite library among researchers. Papers with Code Trends [Sep 2016]
- huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. (github.com)
- jax: JAX combines Autograd (automatic differentiation of native Python & NumPy) and XLA (compiling and running NumPy on accelerators)
- fairseq: a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling [Sep 2017]
- Weights & Biases: Visualizing and tracking your machine learning experiments wandb.ai doc: `deeplearning.ai/wandb` [Jan 2020]
- mosaicml/llm-foundry: LLM training code for MosaicML foundation models [Jun 2022]
- string2string: An open-source library that offers a comprehensive suite of efficient algorithms for a broad range of string-to-string problems. [Mar 2023]
- Sentence Transformers: Python framework for state-of-the-art sentence, text and image embeddings. Useful for semantic textual similarity, semantic search, or paraphrase mining. git [27 Aug 2019]
- fastText: A library for efficient learning of word representations and sentence classification [Aug 2016]
- GPT4All: Open-source large language models that run locally on your CPU [Mar 2023]
- ollama: Run large language models locally [Jun 2023]
- unsloth: Finetune Mistral, Gemma, Llama 2-5x faster with 70% less memory! QLoRA & LoRA finetuning [Nov 2023]
- LLaMA-Factory: Unify Efficient Fine-Tuning of 100+ LLMs [May 2023]
- Visual Blocks: Google visual programming framework that lets you create ML pipelines in a no-code graph editor. [Mar 2023]
- LM Studio: UI to discover, download, and run local LLMs [2023]
- YaFSDP: Yet another Fully Sharded Data Parallel (FSDP): enhanced for distributed training. YaFSDP vs DeepSpeed. [May 2024]
- vLLM: Easy-to-use library for LLM inference and serving. [Feb 2023]
- litellm: Python SDK to call 100+ LLM APIs in OpenAI format [Jul 2023]
- exo: Run your own AI cluster at home with everyday devices [Jun 2024]
- LLM Application
- BIG-AGI (FKA nextjs-chatgpt-app) [Mar 2023]
- GPT Researcher: Autonomous agent designed for comprehensive online research [Jul 2023] / GPT Newspaper: Autonomous agent designed to create personalized newspapers [Jan 2024]
- notesGPT: Record voice notes & transcribe, summarize, and get tasks [Nov 2023]
- screenshot-to-code: Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue) [Nov 2023]
- pyspark-ai: Takes English instructions and compiles them into PySpark objects like DataFrames. [Apr 2023]
- LlamaFS: Automatically renames and organizes your files based on their contents [May 2024]
- code2prompt: a command-line tool (CLI) that converts your codebase into a single LLM prompt with a source tree [Mar 2024]
- vanna: Chat with your SQL database [May 2023]
- Mem0: A self-improving memory layer for personalized AI experiences. [Jun 2023]
- PDF2Audio: an open-source alternative to NotebookLM for podcast creation [Sep 2024]
- Llama Stack: building blocks for Large Language Model (LLM) development [Jun 2024]
- RAG: X-ref
- UI/UX
- Gradio: Build Machine Learning Web Apps - in Python [Mar 2023]
- Text generation web UI: A Gradio web UI for running local LLMs [Mar 2023]
- Open AI Chat Mockup: An open source ChatGPT UI. mckaywrigley/chatbot-ui [Mar 2023]
- chainlit: Build production-ready Conversational AI applications in minutes. [Mar 2023]
- CopilotKit: Built-in React UI components [Jun 2023]
- Open-source GPT Wrappers 1. ChatGPT-Next-Web 2. FastGPT 3. Lobe Chat [Jan 2024]
- anything-llm: All-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more. [Jun 2023]
- langfun: leverages PyGlove to integrate LLMs and programming. [Aug 2023]
- Data Processing and Management
- PostgresML: The GPU-powered AI application database. [Apr 2022]
- Azure AI Document Intelligence (FKA Azure Form Recognizer): ref: Table and metadata extraction from documents
- Table to Markdown: LLM can recognize Markdown-formatted tables more effectively than raw table formats.
- Instructor: Structured outputs for LLMs, easily map LLM outputs to structured data. [Jun 2023]
- unstructured: Open-Source Pre-Processing Tools for Unstructured Data [Sep 2022]
- Math formula OCR: MathPix, OSS LaTeX-OCR [Jan 2021]
- Nougat: Neural Optical Understanding for Academic Documents: The academic document PDF parser that understands LaTeX math and tables. git [25 Aug 2023]
- activeloopai/deeplake: AI Vector Database for LLMs/LangChain. Doubles as a Data Lake for Deep Learning. Store, query, version, & visualize any data. Stream data in real-time to PyTorch/TensorFlow. ref [Jun 2021]
- Camelot a Python library that can help you extract tables from PDFs! ref: Comparison with other PDF Table Extraction libraries [Jul 2016]
- Marker: converts PDF to markdown [Oct 2023]
- firecrawl: Scrape entire websites into LLM-ready markdown or structured data. [Apr 2024]
- Trafilatura: Gather text from the web and convert raw HTML into structured, meaningful data. [Apr 2019]
- Crawl4AI: Open-source, LLM-friendly web crawler & scraper [May 2024]
- Tools, Plugins, Development Tools, and Use Cases
- Streaming with Azure OpenAI SSE [May 2023]
- Opencopilot: Build and embed open-source AI Copilots into your product with ease. [Aug 2023]
- Azure OpenAI Proxy: OpenAI API requests converting into Azure OpenAI API requests [Mar 2023]
- Generative AI Design Patterns: A Comprehensive Guide: 9 architecture patterns for working with LLMs. [Feb 2024]
- TaxyAI/browser-extension: Browser automation via the Chrome debugger API and prompting > `src/helpers/determineNextAction.ts` [Mar 2023]
- Spring AI: Developing AI applications for Java. [Jul 2023]
- Tiktoken Alternative in C#: microsoft/Tokenizer: .NET and Typescript implementation of BPE tokenizer for OpenAI LLMs. [Mar 2023]
- openai/shap-e Generate 3D objects conditioned on text or images [3 May 2023] git
- Drag Your GAN: [cnt]: Interactive Point-based Manipulation on the Generative Image Manifold git [18 May 2023]
- pdfGPT: Embeddings do not use OpenAI and can be computed locally. [Mar 2023]
- MemGPT: Virtual context management to extend the limited context window of LLM. A tiered memory system and a set of functions that allow it to manage its own memory. ref [12 Oct 2023]
- Very Simple LangChain example using Open AI: langchain-ask-pdf [Apr 2023]
- marvin: a lightweight AI toolkit for building natural language interfaces. [Mar 2023]
- langfuse: Traces, evals, prompt management and metrics to debug and improve your LLM application. [May 2023]
- mindsdb: The open-source virtual database for building AI from enterprise data. It supports SQL syntax for development and deployment, with over 70 technology and data integrations. [Aug 2018]
- Agent Applications & LLMOps
- The Rise and Potential of Large Language Model Based Agents: A Survey: The papers list for LLM-based agents [cnt] / git [14 Sep 2023]
- AgentBench Evaluating LLMs as Agents: Assess LLM-as Agent’s reasoning and decision-making abilities. [7 Aug 2023]
- Agentic Design Patterns ref [Mar 2024]
- Reflection: LLM self-evaluates to improve.
- Self-Refine [30 Mar 2023]
- Reflexion [20 Mar 2023]
- CRITIC [19 May 2023]
- Tool use: LLM uses tools for information gathering, action, or data processing.
- Gorilla [24 May 2023]
- MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action [20 Mar 2023]
- Efficient Tool Use with Chain-of-Abstraction Reasoning [30 Jan 2024]
- Planning: LLM devises and executes multistep plans to reach goals.
- Multi-agent collaboration: Multiple AI agents collaborate for better solutions.
- Communicative Agents for Software Development [16 Jul 2023]
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation [16 Aug 2023]
- MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework [1 Aug 2023]
- Framework: Autogen / LangGraph / crewAI
- Reflection: LLM self-evaluates to improve.
- Generate the code ref [Jun 2024]
- AI Agents That Matter: AI agent evaluations for optimizing both accuracy and cost. Focusing solely on accuracy can lead to overfitting and high costs. `retry, warming, escalation` [1 Jul 2024]
- Generative AI Design Patterns for Agentic AI Systems: Design patterns for Agentic solutions in Azure [May 2023]
- Automated Design of Agentic Systems: Automated Design of Agentic Systems (ADAS), which aims to automatically create powerful agentic system designs, including inventing novel building blocks and/or combining them. [15 Aug 2024]
- Berkeley Function-Calling Leaderboard V2 [Aug 2024]
- Gorilla: An API store for LLMs: [cnt]: Gorilla: Large Language Model Connected with Massive APIs git [24 May 2023]
- Used GPT-4 to generate a dataset of instruction-api pairs for fine-tuning Gorilla.
  - Used the abstract syntax tree (AST) of the generated code to match against the APIs in the database and the test set for evaluation purposes (see the sketch below).
  - Another user asked how Gorilla compared to LangChain. Patil replied: LangChain is a terrific project that tries to teach agents how to use tools using prompting. Our take on this is that prompting is not scalable if you want to pick between 1000s of APIs. So Gorilla is a LLM that can pick and write the semantically and syntactically correct API for you to call! A drop-in replacement into LangChain! cite [04 Jul 2023]
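A minimal sketch of AST-based API matching in the spirit of that evaluation; the `api_db` entries and the generated snippet are hypothetical examples:

```python
import ast

def called_apis(code):
    """Walk the AST of generated code and collect dotted call names."""
    names = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call):
            parts, f = [], node.func
            while isinstance(f, ast.Attribute):  # unwind attribute chains like a.b.c(...)
                parts.append(f.attr)
                f = f.value
            if isinstance(f, ast.Name):
                parts.append(f.id)
                names.add(".".join(reversed(parts)))
    return names

api_db = {"torch.hub.load", "transformers.pipeline"}  # hypothetical API database
generated = "model = torch.hub.load('pytorch/vision', 'resnet50')"
print(called_apis(generated) & api_db)  # {'torch.hub.load'} -> a matched API
```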
- Meta: Toolformer: [cnt]: Language Models That Can Use Tools, by MetaAI git [9 Feb 2023]
- ToolLLM: [cnt]: Facilitating Large Language Models to Master 16000+ Real-world APIs git [31 Jul 2023]
- APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets [26 Jun 2024]
- Agent Framework
- Open AI Assistant
- Autogen: Customizable and conversable agents framework
- MetaGPT: Multi-Agent Framework. Assign different roles to GPTs to form a collaborative entity for complex tasks. e.g., Data Interpreter [Jun 2023]
- crewAI: Framework for orchestrating role-playing, autonomous AI agents. [Oct 2023]
- LangGraph: Built on top of LangChain
- composio: Integration of Agents with 100+ Tools [Feb 2024]
- phidata: Build AI Assistants with memory, knowledge and tools [May 2022]
- Qwen-Agent: Agent framework built upon Qwen1.5, featuring Function Calling, Code Interpreter, RAG, and Chrome extension. Qwen series released by Alibaba Group [Sep 2023]
- OpenAgents: three distinct agents: Data Agent for data analysis, Plugins Agent for plugin integration, and Web Agent for autonomous web browsing. [Aug 2023]
- maestro: A Framework for Claude Opus, GPT and local LLMs to Orchestrate Subagents [Mar 2024]
- Microsoft Agent Frameworks X-ref
- Agent Application
- Auto-GPT: Most popular [Mar 2023]
- babyagi: The simplest implementation - coworking of 4 agents [Apr 2023]
- SuperAGI: GUI for agent settings [May 2023]
- lightaime/camel: 🐫 CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society [Mar 2023] / 1:1 conversation between two AI agents: Hugging Face (camel-agents)
- ChatDev: Virtual software company. Create Customized Software using LLM-powered Multi-Agent Collaboration [Sep 2023]
- GPT Pilot: The first real AI developer. Dev tool that writes scalable apps from scratch while the developer oversees the implementation [Jul 2023]
- SeeAct: GPT-4V(ision) is a Generalist Web Agent, if Grounded [Jan 2024]
- skyvern: Automate browser-based workflows with LLMs and Computer Vision [Feb 2024]
- LaVague: Automate automation with Large Action Model framework. Generate Selenium code. [Feb 2024]
- Project Astra: Google DeepMind, A universal AI agent that is helpful in everyday life [14 May 2024]
- KHOJ: Open-source, personal AI agents. Cloud or Self-Host, Multiple Interfaces. Python Django based [Aug 2021]
- PR-Agent: Efficient code review and pull request handling, providing AI feedback and suggestions [Jan 2023]
- SakanaAI AI-Scientist: Towards Fully Automated Open-Ended Scientific Discovery [Aug 2024]
- aider: AI pair programming in your terminal [Jan 2023]
- Zed: AI code editor from the creators of Atom and Tree-sitter [Sep 2024]
- Proprietary Software: AI Code Editor: Replit Agent [09 Sep 2024] / Cursor [Mar 2023]
- OpenAI Code Interpreter: Integration with a sandboxed Python execution environment [23 Mar 2023]
- We provide our models with a working Python interpreter in a sandboxed, firewalled execution environment, along with some ephemeral disk space.
- OSS Code Interpreter A LangChain implementation of the ChatGPT Code Interpreter. [Jul 2023]
- gpt-code-ui An open source implementation of OpenAI's ChatGPT Code interpreter. [May 2023]
- Open Interpreter: Let language models run code on your computer. [Jul 2023]
- SlashGPT The tool integrated with "jupyter" agent [Apr 2023]
- Caching: A technique to store data that has been previously retrieved or computed, so that future requests for the same data can be served faster.
- To reduce latency, cost, and LLM requests by serving pre-computed or previously served responses.
- Strategies for caching: Caching can be based on item IDs, pairs of item IDs, constrained input, or pre-computation. Caching can also leverage embedding-based retrieval, approximate nearest neighbor search, and LLM-based evaluation (see the sketch after this list). ref
- GPTCache: Semantic cache for LLMs. Fully integrated with LangChain and llama_index. git [Mar 2023]
- Prompt caching with Claude: Reducing costs by up to 90% and latency by up to 85% for long prompts. [15 Aug 2024]
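A minimal sketch of the embedding-based-retrieval strategy: serve a cached response when a new prompt's embedding is close enough to a previously seen one. `embed_fn` and `llm_call` are hypothetical stand-ins, and embeddings are assumed to be L2-normalized:

```python
import numpy as np

class SemanticCache:
    """Toy semantic cache keyed by prompt embeddings (cosine similarity)."""
    def __init__(self, embed_fn, threshold=0.9):
        self.embed_fn, self.threshold = embed_fn, threshold
        self.keys, self.values = [], []

    def get(self, prompt):
        if not self.keys:
            return None
        q = self.embed_fn(prompt)
        sims = np.stack(self.keys) @ q  # cosine similarity for normalized vectors
        i = int(np.argmax(sims))
        return self.values[i] if sims[i] >= self.threshold else None

    def put(self, prompt, response):
        self.keys.append(self.embed_fn(prompt))
        self.values.append(response)

def answer(prompt, cache, llm_call):
    hit = cache.get(prompt)
    if hit is not None:
        return hit               # cache hit: no LLM request, lower latency and cost
    response = llm_call(prompt)  # cache miss: call the LLM once, then store the result
    cache.put(prompt, response)
    return response
```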
- Defensive UX: A design strategy that aims to prevent and handle errors in user interactions with machine learning or LLM-based products.
- Why defensive UX?: Machine learning and LLMs can produce inaccurate or inconsistent output, which can affect user trust and satisfaction. Defensive UX can help by increasing accessibility, trust, and UX quality.
- Guidelines for Human-AI Interaction: Microsoft: Based on a survey of 168 potential guidelines from various sources, they narrowed it down to 18 action rules organized by user interaction stages.
- People + AI Guidebook: Google: Based on Google’s product teams and academic research, they provide 23 patterns grouped by common questions during the product development process.
- Human Interface Guidelines for Machine Learning: Apple: Based on practitioner knowledge and experience, emphasizing aspects of UI rather than model functionality.
- PromptCraft-Robotics: Robotics and a robot simulator with ChatGPT integration git [Feb 2023]
- ChatGPT-Robot-Manipulation-Prompts: A set of prompts for Communication between humans and robots for executing tasks. git [Apr 2023]
- Siemens Industrial Copilot ref [31 Oct 2023]
- Mobile ALOHA: Stanford’s mobile ALOHA robot learns from humans to cook, clean, do laundry. Mobile ALOHA extends the original ALOHA system by mounting it on a wheeled base ref [4 Jan 2024] / ALOHA: A Low-cost Open-source Hardware System for Bimanual Teleoperation.
- Figure 01 + OpenAI: Humanoid Robots Powered by OpenAI ChatGPT youtube [Mar 2024]
- LeRobot: Hugging Face. LeRobot aims to provide models, datasets, and tools for real-world robotics in PyTorch. git [Jan 2024]
- FRVR Official Teaser: Prompt to Game: AI-powered end-to-end game creation [16 Jun 2023]
- rewind.ai: Rewind captures everything you’ve seen on your Mac and iPhone [Nov 2023]
- Mobile ALOHA: A day of Mobile ALOHA [4 Jan 2024]
- groq: An LPU Inference Engine, the LPU is reported to be 10 times faster than NVIDIA’s GPU performance ref [Jan 2024]
- Sora: Introducing Sora — OpenAI’s text-to-video model [Feb 2024]
- Vercel announced V0.dev: Make a snake game with chat [Oct 2023]
- The leader: http://openai.com
- The runner-up: http://bard.google.com -> https://gemini.google.com
- Open source: http://huggingface.co/chat
- Searching web: http://perplexity.ai
- Content writing: http://jasper.ai/chat / cite
- Oceans of AI - All AI Tools https://play.google.com/store/apps/details?id=in.blueplanetapps.oceansofai&hl=en_US
- Newsletters & Tool Database: https://www.therundown.ai/
- allAIstartups: https://www.allaistartups.com/ai-tools
- Future Tools: https://www.futuretools.io/
- AI Tools: https://aitoolmall.com/
- Edge and Chrome Extension & Plugin
- MaxAI.me
- BetterChatGPT
- ChatHub All-in-one chatbot client Webpage
- ChatGPT Retrieval Plugin
- Vercel AI Vercel AI Playground / Vercel AI SDK git [May 2023]
- Quora Poe A chatbot service that gives access to GPT-4, gpt-3.5-turbo, Claude from Anthropic, and a variety of other bots. [Feb 2023]
- Product Hunt > AI
- LLM-generated datasets:
- Self-Instruct: [cnt]: Seed task pool with a set of human-written instructions. [20 Dec 2022]
- Self-Alignment with Instruction Backtranslation: [cnt]: Without human seeding, use LLM to produce instruction-response pairs. The process involves two steps: self-augmentation and self-curation. [11 Aug 2023]
- LLMDataHub: Awesome Datasets for LLM Training: A quick guide (especially) for trending instruction finetuning datasets
- Open LLMs and Datasets: A list of open LLMs available for commercial use.
- SQuAD: The Stanford Question Answering Dataset (SQuAD), a set of Wikipedia articles, 100,000+ question-answer pairs on 500+ articles. [16 Jun 2016]
- RedPajama: LLaMA training dataset of over 1.2 trillion tokens git [17 Apr 2023]
- FineWeb: HuggingFace: crawled 15 trillion tokens of high-quality web data from the summer of 2013 to March 2024. [Apr 2024]
- MS MARCO Web Search: A large-scale information-rich web dataset, featuring millions of real clicked query-document labels [Apr 2024]
Pretrain for a base model

```json
{
    "text": ...,
    "meta": {"url": "...", "timestamp": "...", "source": "...", "language": "...", ...},
    "red_pajama_subset": "common_crawl" | "c4" | "github" | "books" | "arxiv" | "wikipedia" | "stackexchange"
}
```

databricks-dolly-15k: Instruction-Tuned git: SFT training - QA pairs or Dialog

```json
{
    "prompt": "What is the capital of France?",
    "response": "The capital of France is Paris."
},
{
    "prompt": "Can you give me a recipe for chocolate chip cookies?",
    "response": "Sure! ..."
}
```

Anthropic human-feedback: RLHF training - Chosen and Rejected pairs

```json
{
    "chosen": "I'm sorry to hear that. Is there anything I can do to help?",
    "rejected": "That's too bad. You should just get over it."
}
```
- 大規模言語モデルのデータセットまとめ: Summary of datasets for large language models [Apr 2023]
- Dataset example
|   | Category | Instruction | Context | Response |
| --- | --- | --- | --- | --- |
| 0 | Open QA | How do I get rid of mosquitos in my house? | | You can get rid of mosquitos in your house by ... |
| 1 | Classification | Classify each country as "African" or "European" | | Nigeria: African<br>Rwanda: African<br>Portugal: European |
| 2 | Information Extraction | Extract the unique names of composers from the text. | To some extent, European and the US traditions... | Pierre Boulez, Luigi Nono, Karlheinz Stockhausen |
| 3 | General QA | Should investors time the market? | | Timing the market is based on predictions of t... |

| Instruction | Chosen Response | Rejected Response |
| --- | --- | --- |
| What is Depreciation | Depreciation is the drop in value of an asset ... | What is Depreciation – 10 Important Facts to K... |
| What do you know about the city of Aberdeen in Scotland? | Aberdeen is a city located in the North East of Scotland. It is known for its granite architecture and its offshore oil industry. | As an AI language model, I don't have personal knowledge or experiences about Aberdeen. |
| Describe thunderstorm season in the United States and Canada. | Thunderstorm season in the United States and Canada typically occurs during the spring and summer months, when warm, moist air collides with cooler, drier air, creating the conditions for thunderstorms to form. | Describe thunderstorm season in the United States and Canada. |
- Awesome LLMs Evaluation Papers: Evaluating Large Language Models: A Comprehensive Survey git
- Evaluation of Large Language Models: A Survey on Evaluation of Large Language Models: [cnt] [6 Jul 2023]
- ChatGPT’s One-year Anniversary: Are Open-Source Large Language Models Catching up?: Open-Source LLMs vs. ChatGPT; Benchmarks and Performance of LLMs [28 Nov 2023]
- Evaluation Papers for ChatGPT [28 Feb 2023]
- MMLU (Massive Multi-task Language Understanding): LLM performance across 57 tasks including elementary mathematics, US history, computer science, law, and more. [7 Sep 2020]
- BIG-bench: Consists of 204 evaluations, contributed by over 450 authors, that span a range of topics from science to social reasoning. The bottom-up approach; anyone can submit an evaluation task. git [9 Jun 2022]
- HELM: Evaluation scenarios like reasoning and disinformation using standardized metrics like accuracy, calibration, robustness, and fairness. The top-down approach; experts curate and decide what tasks to evaluate models on. git [16 Nov 2022]
- HumanEval: Hand-Written Evaluation Set for Code Generation Benchmark. 164 human-written programming problems. ref / git [7 Jul 2021]
- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models: [cnt]: We utilize the FEEDBACK COLLECTION, a novel dataset, to train PROMETHEUS, an open-source large language model with 13 billion parameters, designed specifically for evaluation tasks. [12 Oct 2023]
- LLM Model Evals vs LLM Task Evals: `Model Evals` are really for people who are building or fine-tuning an LLM, whereas the best LLM application builders use `Task Evals`: a tool to help builders build. [Feb 2024]
- LLMPerf Leaderboard: Evaluating the performance of LLM APIs. [Dec 2023]
- Artificial Analysis LLM Performance Leaderboard: Performance benchmarks & pricing across API providers of LLMs
- LLM-as-a-Judge: LLM-as-a-Judge offers a quick, cost-effective way to develop models aligned with human preferences and is easy to implement with just a prompt, but should be complemented by human evaluation to address biases. [Jul 2024]
- Can Large Language Models Be an Alternative to Human Evaluations? [3 May 2023]
- Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge): Key considerations and Use cases when using LLM-evaluators [Aug 2024]
- LightEval: a lightweight LLM evaluation suite that Hugging Face has been using internally [Jan 2024]
- MMLU (Massive Multitask Language Understanding): Over 15,000 questions across 57 diverse tasks. [Published in 2021]
- TruthfulQA: Truthfulness. [Published in 2022]
- BigBench: 204 tasks. Predicting future potential [Published in 2023]
- GLUE & SuperGLUE: GLUE (General Language Understanding Evaluation)
- HumanEval: Challenges coding skills. [Published in 2021]
- CodeXGLUE: Programming tasks.
- SWE-bench: Software Engineering Benchmark. Real-world software issues sourced from GitHub.
- MBPP: Mostly Basic Python Programming. [Published in 2021]
- Chatbot Arena: Human-ranked Elo rating (see the rating-update sketch after this list).
- MT Bench: Multi-turn open-ended questions - Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena [9 Jun 2023]
- HellaSwag: Commonsense reasoning. [Published in 2019]
- ARC (AI2 Reasoning Challenge): Measures general fluid intelligence.
- DROP: Evaluates discrete reasoning.
- LogiQA: Evaluates logical reasoning skills.
- WMT: Evaluates translation skills.
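A minimal sketch of one Elo-style rating update from a pairwise human vote, as used by arena-style leaderboards (the K-factor of 32 is an assumption):

```python
def elo_update(r_a, r_b, score_a, k=32):
    """score_a: 1.0 if model A wins, 0.0 if it loses, 0.5 for a tie."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))  # predicted win probability of A
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# two models start at 1000; model A wins one pairwise comparison
print(elo_update(1000, 1000, 1.0))  # (1016.0, 984.0)
```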
- Automated evaluation of LLMs
- n-gram based metrics: Evaluates the model using n-gram statistics and F1 score. ROUGE, BLEU, and METEOR are used for summarization and translation tasks.
- Probabilistic model evaluation metrics: Evaluates the model using the predictive performance of probability models, such as Perplexity.
- Embedding based metrics: Evaluates the model using the semantic similarity of embeddings. Ada Similarity and BERTScore are used.
- ROUGE (Recall-Oriented Understudy for Gisting Evaluation): Compares an automatically produced summary or translation against a reference (or a set of references) summary or translation produced by humans. It includes several measures such as:
- ROUGE-N: Overlap of n-grams between the system and reference summaries.
- ROUGE-L: Longest Common Subsequence (LCS) based statistics.
- ROUGE-W: Weighted LCS-based statistics that favor consecutive LCSes.
- ROUGE-S: Skip-bigram based co-occurrence statistics.
- ROUGE-SU: Skip-bigram plus unigram-based co-occurrence statistics.
- n-gram: An n-gram is a contiguous sequence of n items from a given sample of text or speech. For example, in the sentence “I love AI”, the unigrams (1-gram) are “I”, “love”, “AI”; the bigrams (2-gram) are “I love”, “love AI”; and the trigram (3-gram) is “I love AI”.
- BLEU: An algorithm for evaluating the quality of machine-translated text. BLEU’s output is always a number between 0 and 1; the closer a machine translation is to a professional human translation, the higher the score.
- BERTScore: A metric that leverages pre-trained contextual embeddings from BERT for text generation tasks. It combines precision and recall values.
- Perplexity: A measure of a model's predictive performance, with lower values indicating better prediction.
- METEOR: An n-gram based metric for machine translation, considering precision, recall, and semantic similarity. (See the metric sketch below.)
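A minimal sketch of an n-gram overlap metric (ROUGE-N-style precision/recall/F1) and of perplexity computed from per-token log-probabilities; both are simplified relative to the official implementations:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """Clipped n-gram overlap between a candidate and a reference text."""
    c, r = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((c & r).values())
    precision = overlap / max(sum(c.values()), 1)
    recall = overlap / max(sum(r.values()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return precision, recall, f1

def perplexity(token_logprobs):
    """exp of the average negative log-likelihood per token (lower is better)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

print(rouge_n("the cat sat on the mat", "the cat lay on the mat"))  # ~(0.83, 0.83, 0.83)
print(perplexity([-0.1, -0.5, -2.3]))  # ~2.63
```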
- Human evaluation of LLMs (possibly automated by LLM-based metrics): Evaluate the model’s performance on NLU and NLG tasks. It includes evaluations of relevance, fluency, coherence, and groundedness.
- Built-in evaluation methods in Prompt flow: ref [Aug 2023] / ref
- OpenAI Evals: A framework for evaluating large language models (LLMs) [Mar 2023]
- promptfoo: Test your prompts. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality. [Apr 2023]
- PromptTools: Open-source tools for prompt testing git [Jun 2023]
- TruLens: Instrumentation and evaluation tools for large language model (LLM) based applications. [Nov 2020]
- Pezzo: Open-source, developer-first LLMOps platform [May 2023]
- Giskard: The testing framework for ML models, from tabular to LLMs [Mar 2022]
- Azure Machine Learning studio Model Data Collector: Collect production data, analyze key safety and quality evaluation metrics on a recurring basis, receive timely alerts about critical issues, and visualize the results. ref
- Azure ML Prompt flow: A set of LLMOps tools designed to facilitate the creation of LLM-based AI applications [Sep 2023] > How to Evaluate & Upgrade Model Versions in the Azure OpenAI Service [14 Aug 2024]
- Ragas: Evaluation framework for your Retrieval Augmented Generation (RAG) [May 2023]
- DeepEval: LLM evaluation framework. similar to Pytest but specialized for unit testing LLM outputs. [Aug 2023]
- traceloop openllmetry: Quality monitoring for your LLM applications. [Sep 2023]
- Language Model Evaluation Harness: Over 60 standard academic benchmarks for LLMs. A framework for few-shot evaluation. [Aug 2020]
- Pretraining on the Test Set Is All You Need: [cnt]
- On that note, in the satirical Pretraining on the Test Set Is All You Need paper, the author trains a small 1M parameter LLM that outperforms all other models, including the 1.3B phi-1.5 model. This is achieved by training the model on all downstream academic benchmarks. It appears to be a subtle criticism underlining how easily benchmarks can be "cheated" intentionally or unintentionally (due to data contamination). cite [13 Sep 2023]
- Challenges in evaluating AI systems: The challenges and limitations of various methods for evaluating AI systems, such as multiple-choice tests, human evaluations, red teaming, model-generated evaluations, and third-party audits. doc [4 Oct 2023]
- Your AI Product Needs Evals [29 Mar 2024] / How to Evaluate LLM Applications: The Complete Guide [7 Nov 2023]
ⓒ https://github.com/kimtth all rights reserved.