ai_summer

Summary repository for AI Summer 2024

Stars: 59


AI Summer is a repository focused on providing workshops and resources for developing foundational skills in generative AI models and transformer models. The repository offers practical applications for inferencing and training, with a specific emphasis on understanding and utilizing advanced AI chat models like BingGPT. Participants are encouraged to engage in interactive programming environments, decide on projects to work on, and actively participate in discussions and breakout rooms. The workshops cover topics such as generative AI models, retrieval-augmented generation, building AI solutions, and fine-tuning models. The goal is to equip individuals with the necessary skills to work with AI technologies effectively and securely, both locally and in the cloud.

README:

AI Summer

Summary repository for AI Summer 2024. Introduction to generative AI, with practical applications to inferencing and training.

Presented by Vanderbilt Data Science Institute data scientists:

Dr. Jesse Spencer-Smith, Chief Data Scientist
Dr. Charreau Bell, Senior Data Scientist
Myranda Shirk, Senior Data Scientist
Umang Chaudhry, Data Scientist
Dr. Abigail Petulante, DSI Postdoctoral Fellow
Dr. Joshua Su, DSI Postdoctoral Fellow

Overview

The objective of these workshops is to develop foundational skills in understanding, inferencing and training generative AI models and other transformer models.

Getting Ready for AI Summer

Learn or Improve Your Python

Practice your Python skills using the documents below. Choose either a Google Colab for an interactive programming environment, or read through the Google Doc.

Getting Accounts

You’ll want to use the most advanced AI chat model that you can get access to. Microsoft just opened access to BingGPT through Bing Chat, which is based on an early version of GPT4, currently the most advanced AI chat model available to the public. You’ll need to install the Edge browser (https://www.microsoft.com/edge/download) and go to bing.com. Click on “Chat”.

Decide on a Project (recommended)

Think about any data you might want to bring to the workshop. Also begin thinking about any projects you might want to accomplish during our month. We’ll have office hours for you to work with us to get your first project off the ground!

Workshop Schedule

Sessions will run live from 9am-11am, with an office hour from 11am to noon (all times Central).
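For participants outside the Central time zone, the session times above can be converted with a few lines of Python. This is a minimal sketch, assuming the May sessions fall under Central Daylight Time (UTC-5); the helper name `session_in_utc` is hypothetical and not part of the workshop materials.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the workshops run in May, when US Central time is CDT (UTC-5).
CENTRAL_DAYLIGHT = timezone(timedelta(hours=-5), name="CDT")


def session_in_utc(year: int, month: int, day: int, hour: int) -> datetime:
    """Return a Central-time session hour as a timezone-aware UTC datetime."""
    local = datetime(year, month, day, hour, tzinfo=CENTRAL_DAYLIGHT)
    return local.astimezone(timezone.utc)


# First session: Monday 5/6/2024, 9am Central.
start = session_in_utc(2024, 5, 6, 9)
print(start.isoformat())  # 2024-05-06T14:00:00+00:00
```

From the UTC value, `astimezone()` with your own offset gives the local start time.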

Week 1, 5/6 - 5/8: Introduction to Generative AI Models, Transformers, and Prompt Engineering

Monday: https://vanderbilt.zoom.us/rec/share/17V0GyZJoD9PPmenE2stE9AAPGgT0x8KQcEis-VHMn4n8GdsUrjbmu61Tb8r4iv3.FZdNC_03OR0tiDSI?startTime=1715004136000

Wednesday: https://vanderbilt.zoom.us/rec/share/jmUNeAlIF-vBLDWmFuV_HWlI8c944oM3YB94Q_9H36pgKtwTvQlzQAyGASEM3gaw.OKGh4pXYucHTLKAv

No class Friday (Vanderbilt Commencement)

Week 2, 5/13 - 5/17: Retrieval-Augmented Generation (RAG), Assistants, Agents, and Intro to Diffusion Models

Monday Part 1: https://vanderbilt.zoom.us/rec/share/DFJ_bDybROtlo0qUseTFRo8QvntugqIsAlSVud1WKoEVlnO17ksxpz7uPgGgYN1r.1o_oPGd_07POUKux?startTime=1715608503000

Monday Part 2: https://vanderbilt.zoom.us/rec/share/DFJ_bDybROtlo0qUseTFRo8QvntugqIsAlSVud1WKoEVlnO17ksxpz7uPgGgYN1r.1o_oPGd_07POUKux?startTime=1715613842000

Wednesday: https://vanderbilt.zoom.us/rec/share/D_asoLsFb3tHZ4AB6qHaODUJvnwNyGQA6bQsG4pSMqvI1zQ-cQkNXsxcuPi1wqYv.vcKQy17Jc8Y0xGbM?startTime=1715781305000

Friday: https://vanderbilt.zoom.us/rec/share/y8paBkSmKLcUJRnHtwo8ZYSmNWCCG8nX9mtxfaOWBq_HbGSdMyq_VLi_E6wQZG2l.09Mt6t9VgpsR_ruj?startTime=1715954101000

Week 3, 5/20 - 5/24: Building AI Solutions, Running AI Securely Locally or in the Cloud, Introduction to Training Models

Monday:

Recording: https://vanderbilt.zoom.us/rec/share/LGj-kG8kW22LzCacO6P2GdAtHRL0Tq3cn0O2NV36TlNQzeQKHDuZJkVqHraKcqaR.ScCzM5Gqp82li_6q
Homework: Watch the following videos: General Backprop and math-centric backprop (https://youtu.be/tIeHLnjs5U8?si=mnT36GTL7YqU8qBO)

Wednesday: Recording: https://vanderbilt.zoom.us/rec/share/fswTlpFMlqAVgxRDDBza920i9brAuxaSiteHpDNUwpm9YQzedJa5g_2oZSSr2Eq1.wF73yKYGD5eY3cyY?startTime=1716392393000

Friday: Recording: https://vanderbilt.zoom.us/rec/share/plozihJcLFBIfjPxQ8Bsv9IdqHh39qFinkVUChsYtuiuiGAc8O2TcvTEbTE5cAUW.3XYBPJfbdZJ1GzAS?startTime=1716558902000

Week 4, 5/29 - 5/31: Fine-tuning Models, Multi-modal Models, Applied AI

No class Monday (Memorial Day)

Wednesday:

Papers/Blogs discussed:

https://arxiv.org/pdf/2405.17247

https://proceedings.mlr.press/v139/radford21a/radford21a.pdf

https://openaccess.thecvf.com/content/CVPR2022/papers/Singh_FLAVA_A_Foundational_Language_and_Vision_Alignment_Model_CVPR_2022_paper.pdf

https://arxiv.org/pdf/2405.09818

https://sh-tsang.medium.com/review-flamingo-a-visual-language-model-for-few-shot-learning-ec477d47e7bf

https://arxiv.org/pdf/2304.10592

https://arxiv.org/pdf/2310.03744

https://huggingface.co/papers/2311.05437

https://arxiv.org/pdf/2311.05437

https://llava-vl.github.io/blog/2024-01-30-llava-next/

https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/

https://llava-vl.github.io/blog/2024-04-30-llava-next-video/

https://arxiv.org/abs/2310.02239

https://wandb.ai/byyoung3/ml-news/reports/How-to-Fine-Tune-LLaVA-on-a-Custom-Dataset--Vmlldzo2NjUwNTc1

Breakout Rooms

How to Breakout

Remember we are all learning and exploring

Please share your video upon entering the room and unmute
Share your screens--someone volunteer to share their screen upon entering, and everyone be ready to share your screen to show what you’ve found
Make notes of what you’ve discussed in the Response Reports below
Everyone be ready to report out (random)
Make some friends

Breakout Rooms Worksheets

Report Documents

Google Docs has a limit of 100 people viewing/editing a document at one time.

Special Breakout Room Groups

Please be sure your display name is set in Zoom. If you are in one of the following special groups, please prepend your name with one of the following qualifiers.

Data Science for Social Good: DSSG
Center for AI in Protein Dynamics: Protein
If you are in a lab and would like your own breakout room: Labname (keep it short, please!)
If you are faculty and would like to be in a breakout room with other faculty: Faculty

For example, I might be DSSG-Jesse Spencer-Smith
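The naming convention above can be sketched in Python. This is only an illustration: the dictionary `SPECIAL_GROUPS` and the function `zoom_display_name` are hypothetical names, and the hyphen separator is taken from the example display name above.

```python
# Qualifiers listed above; a short lab name may also be used as a prefix.
SPECIAL_GROUPS = {
    "Data Science for Social Good": "DSSG",
    "Center for AI in Protein Dynamics": "Protein",
    "Faculty": "Faculty",
}


def zoom_display_name(name: str, qualifier: str) -> str:
    """Prepend a breakout-room qualifier to a Zoom display name."""
    return f"{qualifier.strip()}-{name.strip()}"


qualifier = SPECIAL_GROUPS["Data Science for Social Good"]
print(zoom_display_name("Jesse Spencer-Smith", qualifier))
# DSSG-Jesse Spencer-Smith
```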

Workshop Video Recordings

Video recordings of these workshops can be found on our YouTube channel AI Summer playlist

Looking for the code resources for Summer 2023? View the 2023 repo version here.

Course Resources

Prompt Engineering paper https://arxiv.org/abs/2302.11382
Prompt Engineering Coursera Course: https://www.coursera.org/learn/prompt-engineering
Visual overview of Generative AI from 3Blue1Brown: https://www.youtube.com/watch?v=wjZofJX0v4M
Semester-long course on transformer models, DS 5690. Graduate students and advanced undergraduates can register by contacting me. I welcome auditing by a select number of postdoctoral fellows, and drop-ins from faculty!

Other Resources

Compute Grants for Vanderbilt Faculty and Students

DGX A100 Compute Grant: https://forms.gle/2mGfEy9DB4JU2GpZ8

Python

Transformers

Natural Language Processing with Transformers by Lewis Tunstall, Leandro von Werra and Thomas Wolf. If you are affiliated with Vanderbilt University, you can access this pre-print book (and any book by O’Reilly) free by logging into O'Reilly Media using your Vanderbilt email address. Vanderbilt licenses all content from O’Reilly. The book covers Transformers for purposes beyond text.

Getting the Most out of this Course

To get the most out of this workshop:

Open Colab (workbook) notebooks and actively write code along with the instructor
Actively participate in discussions
Actively participate in breakout rooms
Work on homework assignments before coming to class
Relax your mind and ask questions

Accessing Bing Chat

1. Open the Edge browser (yes, Edge) and navigate to www.bing.com
2. Select "chat". A new window should open saying you need the new Bing.
3. Select "Start chatting" at the bottom of this window. This should prompt you to sign in to a Microsoft account. Do not use an organizational/school email (such as Vanderbilt). Instead, select "No account? Create a new one" and create one with your personal email. Note: if you get stuck in the "use the new Bing" window, go back to Bing.com and select "Sign in" instead. Follow instructions for Step 3.
4. Sign in to your personal Microsoft account.
5. Select "chat" - it should work now!

For Tasks:

develop ai solutions, fine-tune models, work on projects, engage in discussions, participate in workshops

For Jobs:

data scientist, machine learning engineer, ai researcher, ai consultant, research scientist

Alternative AI tools for ai_summer

Similar Open Source Tools

intro-to-intelligent-apps

This repository introduces and helps organizations get started with building AI Apps and incorporating Large Language Models (LLMs) into them. The workshop covers topics such as prompt engineering, AI orchestration, and deploying AI apps. Participants will learn how to use Azure OpenAI, Langchain/Semantic Kernel, Qdrant, and Azure AI Search to build intelligent applications.

GitHub stars: 146

llmops-duke-aipi

LLMOps Duke AIPI is a course focused on operationalizing Large Language Models, teaching methodologies for developing applications using software development best practices with large language models. The course covers various topics such as generative AI concepts, setting up development environments, interacting with large language models, using local large language models, applied solutions with LLMs, extensibility using plugins and functions, retrieval augmented generation, introduction to Python web frameworks for APIs, DevOps principles, deploying machine learning APIs, LLM platforms, and final presentations. Students will learn to build, share, and present portfolios using GitHub, YouTube, and LinkedIn, as well as develop non-linear life-long learning skills. Prerequisites include basic Linux and programming skills, with coursework available in Python or Rust. Additional resources and references are provided for further learning and exploration.

GitHub stars: 73

agentUniverse

agentUniverse is a framework for developing multi-agent applications based on large language models. It provides essential components for building single agents and a multi-agent collaboration mechanism for customizing collaboration patterns. Developers can easily construct multi-agent applications and share pattern practices from different fields. The framework includes pre-installed collaboration patterns like PEER and DOE for complex task breakdown and data-intensive tasks.

GitHub stars: 787

llm-on-openshift

This repository provides resources, demos, and recipes for working with Large Language Models (LLMs) on OpenShift using OpenShift AI or Open Data Hub. It includes instructions for deploying inference servers for LLMs, such as vLLM, Hugging Face TGI, Caikit-TGIS-Serving, and Ollama. Additionally, it offers guidance on deploying serving runtimes, such as vLLM Serving Runtime and Hugging Face Text Generation Inference, in the Single-Model Serving stack of Open Data Hub or OpenShift AI. The repository also covers vector databases that can be used as a Vector Store for Retrieval Augmented Generation (RAG) applications, including Milvus, PostgreSQL+pgvector, and Redis. Furthermore, it provides examples of inference and application usage, such as Caikit, Langchain, Langflow, and UI examples.

GitHub stars: 112

kai

GitHub stars: 54

ai2apps

AI2Apps is a visual IDE for building LLM-based AI agent applications, enabling developers to efficiently create AI agents through drag-and-drop, with features like design-to-development for rapid prototyping, direct packaging of agents into apps, powerful debugging capabilities, enhanced user interaction, efficient team collaboration, flexible deployment, multilingual support, simplified product maintenance, and extensibility through plugins.

GitHub stars: 278

ThereForYou

ThereForYou is a groundbreaking solution aimed at enhancing public safety, particularly focusing on mental health support and suicide prevention. Leveraging cutting-edge technologies such as artificial intelligence (AI), machine learning (ML), natural language processing (NLP), and blockchain, the project offers accessible and empathetic assistance to individuals facing mental health challenges.

GitHub stars: 86

studio-b3

Studio B3 (B-3 Bomber) is a sophisticated editor designed for content creation, catering to various formats such as blogs, articles, user stories, and more. It provides an immersive content generation experience with local AI capabilities for intelligent search and recommendation functions. Users can define custom actions and variables for flexible content generation. The editor includes interactive tools like Bubble Menu, Slash Command, and Quick Insert for enhanced user experience in editing, searching, and navigation. The design principles focus on intelligent embedding of AI, local optimization for efficient writing experience, and context flexibility for better control over AI-generated content.

GitHub stars: 53

kdbai-samples

KDB.AI is a time-based vector database that allows developers to build scalable, reliable, and real-time applications by providing advanced search, recommendation, and personalization for Generative AI applications. It supports multiple index types, distance metrics, top-N and metadata filtered retrieval, as well as Python and REST interfaces. The repository contains samples demonstrating various use-cases such as temporal similarity search, document search, image search, recommendation systems, sentiment analysis, and more. KDB.AI integrates with platforms like ChatGPT, Langchain, and LlamaIndex. The setup steps require Unix terminal, Python 3.8+, and pip installed. Users can install necessary Python packages and run Jupyter notebooks to interact with the samples.

GitHub stars: 95

EngAce

EngAce is a cutting-edge, generative AI-powered application revolutionizing Vietnamese English learning. It offers personalized learning experiences combining AI with comprehensive features. The repository contains source code, documentation, and resources for the app.

GitHub stars: 82

arch

Arch is an intelligent Layer 7 gateway designed to protect, observe, and personalize LLM applications with APIs. It handles tasks like detecting and rejecting jailbreak attempts, calling backend APIs, disaster recovery, and observability. Built on Envoy Proxy, it offers features like function calling, prompt guardrails, traffic management, and standards-based observability. Arch aims to improve the speed, security, and personalization of generative AI applications.

GitHub stars: 90

ianvs

Ianvs is a distributed synergy AI benchmarking project incubated in KubeEdge SIG AI. It aims to test the performance of distributed synergy AI solutions following recognized standards, providing end-to-end benchmark toolkits, test environment management tools, test case control tools, and benchmark presentation tools. It also collaborates with other organizations to establish comprehensive benchmarks and related applications. The architecture includes critical components like Test Environment Manager, Test Case Controller, Generation Assistant, Simulation Controller, and Story Manager. Ianvs documentation covers quick start, guides, dataset descriptions, algorithms, user interfaces, stories, and roadmap.

GitHub stars: 111

AgentConnect

AgentConnect is an open-source implementation of the Agent Network Protocol (ANP) aiming to define how agents connect with each other and build an open, secure, and efficient collaboration network for billions of agents. It addresses challenges like interconnectivity, native interfaces, and efficient collaboration by providing authentication, end-to-end encryption, meta-protocol handling, and application layer protocol integration. The project focuses on performance and multi-platform support, with plans to rewrite core components in Rust and support Mac, Linux, Windows, mobile platforms, and browsers. AgentConnect aims to establish ANP as an industry standard through protocol development and forming a standardization committee.

GitHub stars: 66

openvino_build_deploy

The OpenVINO Build and Deploy repository provides pre-built components and code samples to accelerate the development and deployment of production-grade AI applications across various industries. With the OpenVINO Toolkit from Intel, users can enhance the capabilities of both Intel and non-Intel hardware to meet specific needs. The repository includes AI reference kits, interactive demos, workshops, and step-by-step instructions for building AI applications. Additional resources such as Jupyter notebooks and a Medium blog are also available. The repository is maintained by the AI Evangelist team at Intel, who provide guidance on real-world use cases for the OpenVINO toolkit.

GitHub stars: 136

advisingapp

Advising App™ is a software solution created by Canyon GBS™ that includes a robust personal assistant designed to support student service professionals in their day-to-day roles. The assistant can help with research tasks, draft communication, language translation, content creation, student profile analysis, project planning, ideation, and much more. The software also includes a student service CRM designed to support the management of prospective and enrolled students. Key features of the CRM include record management, email and SMS, service management, caseload management, task management, interaction tracking, files and documents, and much more.

GitHub stars: 237

For similar tasks

CS7320-AI

CS7320-AI is a repository containing lecture materials, simple Python code examples, and assignments for the course CS 5/7320 Artificial Intelligence. The code examples cover various chapters of the textbook 'Artificial Intelligence: A Modern Approach' by Russell and Norvig. The repository focuses on basic AI concepts rather than advanced implementation techniques. It includes HOWTO guides for installing Python, working on assignments, and using AI with Python.

GitHub stars: 69

dynamiq

Dynamiq is an orchestration framework designed to streamline the development of AI-powered applications, specializing in orchestrating retrieval-augmented generation (RAG) and large language model (LLM) agents. It provides an all-in-one Gen AI framework for agentic AI and LLM applications, offering tools for multi-agent orchestration, document indexing, and retrieval flows. With Dynamiq, users can easily build and deploy AI solutions for various tasks.

GitHub stars: 780

craftium

Craftium is an open-source platform based on the Minetest voxel game engine and the Gymnasium and PettingZoo APIs, designed for creating fast, rich, and diverse single and multi-agent environments. It allows for connecting to Craftium's Python process, executing actions as keyboard and mouse controls, extending the Lua API for creating RL environments and tasks, and supporting client/server synchronization for slow agents. Craftium is fully extensible, extensively documented, modern RL API compatible, fully open source, and eliminates the need for Java. It offers a variety of environments for research and development in reinforcement learning.

GitHub stars: 64

turing

Viglet Turing ES is an open source solution with Semantic Navigation and Chat bot features. It indexes all content in Solr as a search engine.

GitHub stars: 57

LLM-Learn-PK

LLM-Learn-PK is a repository for running various LLM and RAG tests. It serves as a learning platform where the creator experiments with different tests and learns in the process.

GitHub stars: 86

hume-python-sdk

The Hume AI Python SDK allows users to integrate Hume APIs directly into their Python applications. Users can access complete documentation, quickstart guides, and example notebooks to get started. The SDK is designed to provide support for Hume's expressive communication platform built on scientific research. Users are encouraged to create an account at beta.hume.ai and stay updated on changes through Discord. The SDK may undergo breaking changes to improve tooling and ensure reliable releases in the future.

GitHub stars: 79

allAI

allAI is a toolbox for AI-related discussions and resources. It provides a platform for sharing knowledge, tutorials, and addressing common AI-related queries. The repository aims to foster a community for AI enthusiasts to engage in meaningful conversations and collaborations. Users can access Quark Cloud for downloads and instructional videos. Additionally, the repository encourages contributions and prohibits the dissemination of spam, advertisements, or unsolicited promotions. The project is supported by Pinokio and offers users the freedom to utilize, modify, and distribute the software within the specified conditions.

GitHub stars: 125

For similar jobs

NanoLLM

NanoLLM is a tool designed for optimized local inference for Large Language Models (LLMs) using HuggingFace-like APIs. It supports quantization, vision/language models, multimodal agents, speech, vector DB, and RAG. The tool aims to provide efficient and effective processing for LLMs on local devices, enhancing performance and usability for various AI applications.

GitHub stars: 156

mslearn-ai-fundamentals

This repository contains materials for the Microsoft Learn AI Fundamentals module. It covers the basics of artificial intelligence, machine learning, and data science. The content includes hands-on labs, interactive learning modules, and assessments to help learners understand key concepts and techniques in AI. Whether you are new to AI or looking to expand your knowledge, this module provides a comprehensive introduction to the fundamentals of AI.

GitHub stars: 91

awesome-ai-tools

Awesome AI Tools is a curated list of popular tools and resources for artificial intelligence enthusiasts. It includes a wide range of tools such as machine learning libraries, deep learning frameworks, data visualization tools, and natural language processing resources. Whether you are a beginner or an experienced AI practitioner, this repository aims to provide you with a comprehensive collection of tools to enhance your AI projects and research. Explore the list to discover new tools, stay updated with the latest advancements in AI technology, and find the right resources to support your AI endeavors.

GitHub stars: 1.6k

go2coding.github.io

The go2coding.github.io repository is a collection of resources for AI enthusiasts, providing information on AI products, open-source projects, AI learning websites, and AI learning frameworks. It aims to help users stay updated on industry trends, learn from community projects, access learning resources, and understand and choose AI frameworks. The repository also includes instructions for local and external deployment of the project as a static website, with details on domain registration, hosting services, uploading static web pages, configuring domain resolution, and a visual guide to the AI tool navigation website. Additionally, it offers a platform for AI knowledge exchange through a QQ group and promotes AI tools through a WeChat public account.

GitHub stars: 201

AI-Notes

AI-Notes is a repository dedicated to practical applications of artificial intelligence and deep learning. It covers concepts such as data mining, machine learning, natural language processing, and AI. The repository contains Jupyter Notebook examples for hands-on learning and experimentation. It explores the development stages of AI, from narrow artificial intelligence to general artificial intelligence and superintelligence. The content delves into machine learning algorithms, deep learning techniques, and the impact of AI on various industries like autonomous driving and healthcare. The repository aims to provide a comprehensive understanding of AI technologies and their real-world applications.

GitHub stars: 755

promptpanel

Prompt Panel is a tool designed to accelerate the adoption of AI agents by providing a platform where users can run large language models across any inference provider, create custom agent plugins, and use their own data safely. The tool allows users to break free from walled-gardens and have full control over their models, conversations, and logic. With Prompt Panel, users can pair their data with any language model, online or offline, and customize the system to meet their unique business needs without any restrictions.

GitHub stars: 53

ai-demos

The 'ai-demos' repository is a collection of example code from presentations focusing on building with AI and LLMs. It serves as a resource for developers looking to explore practical applications of artificial intelligence in their projects. The code snippets showcase various techniques and approaches to leverage AI technologies effectively. The repository aims to inspire and educate developers on integrating AI solutions into their applications.

GitHub stars: 163
