
ai_summer
Summary repository for AI Summer 2024
Stars: 59

AI Summer is a repository focused on providing workshops and resources for developing foundational skills in generative AI models and transformer models. The repository offers practical applications for inferencing and training, with a specific emphasis on understanding and utilizing advanced AI chat models like BingGPT. Participants are encouraged to engage in interactive programming environments, decide on projects to work on, and actively participate in discussions and breakout rooms. The workshops cover topics such as generative AI models, retrieval-augmented generation, building AI solutions, and fine-tuning models. The goal is to equip individuals with the necessary skills to work with AI technologies effectively and securely, both locally and in the cloud.
README:
Summary repository for AI Summer 2024. Introduction to generative AI, with practical applications to inferencing and training
Presented by Vanderbilt Data Science Institute data scientists:
- Dr. Jesse Spencer-Smith, Chief Data Scientist
- Dr. Charreau Bell, Senior Data Scientist
- Myranda Shirk, Senior Data Scientist
- Umang Chaudhry, Data Scientist
- Dr. Abigail Petulante, DSI Postdoctoral Fellow
- Dr. Joshua Su, DSI Postdoctoral Fellow
The objective of these workshops is to develop foundational skills in understanding, inferencing and training generative AI models and other transformer models.
Practice your Python skills using the documents below. Choose either the Google Colab for an interactive programming environment, or read through the Google Doc.
You’ll want to use the most advanced AI chat model that you can get access to. Microsoft just opened access to BingGPT through Bing Chat, which is based on an early version of GPT-4, currently the most advanced AI chat model available to the public. You’ll need to install the Edge browser (https://www.microsoft.com/edge/download) and go to bing.com. Click on “Chat”.
Think about any data you might want to bring to the workshop. Also begin thinking about any projects you might want to accomplish during our month. We’ll have office hours for you to work with us to get your first project off the ground!
Sessions will run live from 9am to 11am, with an office hour from 11am to noon (all times Central).
No class Friday (Vanderbilt Commencement)
Week 2, 5/13 - 5/17: Retrieval-Augmented Generation (RAG), Assistants, Agents, and Intro to Diffusion Models
Week 3, 5/20 - 5/24: Building AI Solutions, Running AI Securely Locally or in the Cloud, Introduction to Training Models
Monday:
-
Homework: Watch the following videos: General Backprop and [math-centric backprop](https://youtu.be/tIeHLnjs5U8?si=mnT36GTL7YqU8qBO)
Wednesday: Recording: (https://vanderbilt.zoom.us/rec/share/fswTlpFMlqAVgxRDDBza920i9brAuxaSiteHpDNUwpm9YQzedJa5g_2oZSSr2Eq1.wF73yKYGD5eY3cyY?startTime=1716392393000)
Friday: Recording: (https://vanderbilt.zoom.us/rec/share/plozihJcLFBIfjPxQ8Bsv9IdqHh39qFinkVUChsYtuiuiGAc8O2TcvTEbTE5cAUW.3XYBPJfbdZJ1GzAS?startTime=1716558902000)
No class Monday (Memorial Day)
Wednesday:
Papers/Blogs discussed:
https://arxiv.org/pdf/2405.17247
https://proceedings.mlr.press/v139/radford21a/radford21a.pdf
https://arxiv.org/pdf/2405.09818
https://arxiv.org/pdf/2304.10592
https://arxiv.org/pdf/2310.03744
https://huggingface.co/papers/2311.05437
https://arxiv.org/pdf/2311.05437
https://llava-vl.github.io/blog/2024-01-30-llava-next/
https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/
https://llava-vl.github.io/blog/2024-04-30-llava-next-video/
https://arxiv.org/abs/2310.02239
Remember we are all learning and exploring
- Please share your video upon entering the room and unmute
- Share your screens--someone volunteer to share their screen upon entering, and everyone be ready to share your screen to show what you’ve found
- Make notes of what you’ve discussed in the Response Reports below
- Everyone be ready to report out (random)
- Make some friends
- Breakout Rooms Worksheets
Google Docs has a limit of 100 people viewing/editing a document at one time.
Please be sure your display name is set in Zoom. If you are in one of the following special groups, please prepend the matching qualifier to your name.
- Data Science for Social Good: DSSG
- Center for AI in Protein Dynamics: Protein
- If you are in a lab and would like your own breakout room: Labname (keep it short, please!)
- If you are faculty and would like to be in a breakout room with other faculty: Faculty
For example, I might be DSSG-Jesse Spencer-Smith
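The naming convention above can be sketched as a small helper. This is only an illustration: the function name `zoom_display_name` is hypothetical, and the hyphen separator is an assumption inferred from the single example "DSSG-Jesse Spencer-Smith".

```python
def zoom_display_name(name: str, qualifier: str = "") -> str:
    """Build a Zoom display name, prefixed with a group qualifier if one is given.

    Hypothetical helper: the "QUALIFIER-Name" shape is inferred from the
    example "DSSG-Jesse Spencer-Smith" and is an assumption, not an official rule.
    """
    name = name.strip()
    qualifier = qualifier.strip()
    # Participants outside the special groups keep their plain name.
    if not qualifier:
        return name
    return f"{qualifier}-{name}"


print(zoom_display_name("Jesse Spencer-Smith", "DSSG"))
```

Participants who are not in a special group would call it with no qualifier and get their name back unchanged.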
Video recordings of these workshops can be found on our YouTube channel AI Summer playlist
Looking for the code resources for Summer 2023? View the 2023 repo version here.
- Prompt Engineering paper https://arxiv.org/abs/2302.11382
- Prompt Engineering Coursera Course: https://www.coursera.org/learn/prompt-engineering
- Visual overview of Generative AI from 3Blue1Brown: https://www.youtube.com/watch?v=wjZofJX0v4M
- Semester-long course on transformer models, DS 5690. Graduate students and advanced undergraduates can register by contacting me. I welcome auditing by a select number of postdoctoral fellows, and drop-ins from faculty!
DGX A100 Compute Grant: https://forms.gle/2mGfEy9DB4JU2GpZ8
- Natural Language Processing with Transformers by Lewis Tunstall, Leandro von Werra and Thomas Wolf. If you are affiliated with Vanderbilt University, you can access this pre-print book (and any book by O’Reilly) free by logging into O'Reilly Media using your Vanderbilt email address. Vanderbilt licenses all content from O’Reilly. The book covers Transformers for purposes beyond text.
To get the most out of this workshop:
- Open Colab (workbook) notebooks and actively write code along with the instructor
- Actively participate in discussions
- Actively participate in breakout rooms
- Work on homework assignments before coming to class
- Relax your mind and ask questions
1. Open the Edge browser (yes, Edge) and navigate to www.bing.com
2. Select "chat". A new window should open saying you need the new Bing.
3. Select "Start chatting" at the bottom of this window. This should prompt you to sign in to a Microsoft account. Do not use an organizational/school email (such as Vanderbilt). Instead, select "No account? Create a new one" and create one with your personal email. Note: if you get stuck in the "use the new Bing" window, go back to Bing.com and select "Sign in" instead. Follow instructions for Step 3.
Alternative AI tools for ai_summer
Similar Open Source Tools


intro-to-intelligent-apps
This repository introduces and helps organizations get started with building AI Apps and incorporating Large Language Models (LLMs) into them. The workshop covers topics such as prompt engineering, AI orchestration, and deploying AI apps. Participants will learn how to use Azure OpenAI, LangChain/Semantic Kernel, Qdrant, and Azure AI Search to build intelligent applications.

learn-agentic-ai
Learn Agentic AI is a repository that is part of the Panaversity Certified Agentic and Robotic AI Engineer program. It covers AI-201 and AI-202 courses, providing fundamentals and advanced knowledge in Agentic AI. The repository includes video playlists, projects, and project submission guidelines for students to enhance their understanding and skills in the field of AI engineering.

llmops-duke-aipi
LLMOps Duke AIPI is a course on operationalizing Large Language Models. It teaches methodologies for developing applications with large language models using software development best practices. The course covers generative AI concepts, setting up development environments, interacting with large language models, using local large language models, applied solutions with LLMs, extensibility using plugins and functions, retrieval augmented generation, an introduction to Python web frameworks for APIs, DevOps principles, deploying machine learning APIs, LLM platforms, and final presentations. Students will learn to build, share, and present portfolios using GitHub, YouTube, and LinkedIn, as well as develop non-linear life-long learning skills. Prerequisites include basic Linux and programming skills, with coursework available in Python or Rust. Additional resources and references are provided for further learning and exploration.

ThereForYou
ThereForYou is a groundbreaking solution aimed at enhancing public safety, particularly focusing on mental health support and suicide prevention. Leveraging cutting-edge technologies such as artificial intelligence (AI), machine learning (ML), natural language processing (NLP), and blockchain, the project offers accessible and empathetic assistance to individuals facing mental health challenges.

SuperKnowa
SuperKnowa is a fast framework to build Enterprise RAG (Retriever Augmented Generation) Pipelines at Scale, powered by watsonx. It accelerates Enterprise Generative AI applications to get prod-ready solutions quickly on private data. The framework provides pluggable components for tackling various Generative AI use cases using Large Language Models (LLMs), allowing users to assemble building blocks to address challenges in AI-driven text generation. SuperKnowa is battle-tested from 1M to 200M private knowledge base & scaled to billions of retriever tokens.

Conversation-Knowledge-Mining-Solution-Accelerator
The Conversation Knowledge Mining Solution Accelerator enables customers to leverage intelligence to uncover insights, relationships, and patterns from conversational data. It empowers users to gain valuable knowledge and drive targeted business impact by utilizing Azure AI Foundry, Azure OpenAI, Microsoft Fabric, and Azure Search for topic modeling, key phrase extraction, speech-to-text transcription, and interactive chat experiences.

twinny
Twinny is a free and open-source AI code completion plugin for Visual Studio Code and compatible editors. It integrates with various tools and frameworks, including Ollama, llama.cpp, oobabooga/text-generation-webui, LM Studio, LiteLLM, and Open WebUI. Twinny offers features such as fill-in-the-middle code completion, chat with AI about your code, customizable API endpoints, and support for single or multiline fill-in-middle completions. It is easy to install via the Visual Studio Code extensions marketplace and provides a range of customization options. Twinny supports both online and offline operation and conforms to the OpenAI API standard.

advisingapp
**Advising App™** is a software solution created by Canyon GBS™ that includes a robust personal assistant designed to support student service professionals in their day-to-day roles. The assistant can help with research tasks, draft communication, language translation, content creation, student profile analysis, project planning, ideation, and much more. The software also includes a student service CRM designed to support the management of prospective and enrolled students. Key features of the CRM include record management, email and SMS, service management, caseload management, task management, interaction tracking, files and documents, and much more.

twinny
Twinny is a free and private AI extension for Visual Studio Code that offers AI-based code completion and code discussion features. It provides real-time code suggestions, function explanations, test generation, refactoring requests, and more. Twinny operates both online and offline, supports customizable API endpoints, conforms to OpenAI API standards, and offers various customization options for prompt templates, API providers, model names, and more. It is compatible with multiple APIs and allows users to accept code solutions directly in the editor, create new documents from code blocks, and copy generated code solution blocks. Twinny is open-source under the MIT license and welcomes contributions from the community.

LLM-Assistant
LLM-Assistant is a browser interface based on Gradio that interfaces with local LLMs to call functions and act as a general assistant. It works with any instruct-finetuned LLM, can search for information (RAG), knows when to call functions, has realtime mode for working across the system, and answers questions from PDF files. The tool aims to provide voice access and more functions in the future. Current bugs include rare crashes. Setup involves cloning the repo to a virtual environment, installing requirements, downloading and placing LLM model in the model folder, and running main.py. Usage includes Assistant mode for general chat and calling functions like playing music, as well as Realtime mode for editing documents or replying to emails in real-time.

AIStudyAssistant
AI Study Assistant is an app designed to enhance the learning experience and boost academic performance. It serves as a personal tutor, lecture summarizer, writer, and question generator powered by Google PaLM 2. Features include interacting with an AI chatbot, summarizing lectures, generating essays, and creating practice questions. The app is built using 100% Kotlin, Jetpack Compose, Clean Architecture, and the MVVM design pattern, with technologies like Ktor, Room DB, Hilt, and Kotlin coroutines. AI Study Assistant aims to provide comprehensive AI-powered assistance for students in various academic tasks.

langwatch
LangWatch is a monitoring and analytics platform designed to track, visualize, and analyze interactions with Large Language Models (LLMs). It offers real-time telemetry to optimize LLM cost and latency, a user-friendly interface for deep insights into LLM behavior, user analytics for engagement metrics, detailed debugging capabilities, and guardrails to monitor LLM outputs for issues like PII leaks and toxic language. The platform supports OpenAI and LangChain integrations, simplifying the process of tracing LLM calls and generating API keys for usage. LangWatch also provides documentation for easy integration and self-hosting options for interested users.

miyagi
Project Miyagi showcases Microsoft's Copilot Stack in an envisioning workshop aimed at designing, developing, and deploying enterprise-grade intelligent apps. By exploring both generative and traditional ML use cases, Miyagi offers an experiential approach to developing AI-infused product experiences that enhance productivity and enable hyper-personalization. Additionally, the workshop introduces traditional software engineers to emerging design patterns in prompt engineering, such as chain-of-thought and retrieval-augmentation, as well as to techniques like vectorization for long-term memory, fine-tuning of OSS models, agent-like orchestration, and plugins or tools for augmenting and grounding LLMs.

obsidian-systemsculpt-ai
SystemSculpt AI is a comprehensive AI-powered plugin for Obsidian, integrating advanced AI capabilities into note-taking, task management, knowledge organization, and content creation. It offers modules for brain integration, chat conversations, audio recording and transcription, note templates, and task generation and management. Users can customize settings, utilize AI services like OpenAI and Groq, and access documentation for detailed guidance. The plugin prioritizes data privacy by storing sensitive information locally and offering the option to use local AI models for enhanced privacy.

data-formulator
Data Formulator is an AI-powered tool developed by Microsoft Research to help data analysts create rich visualizations iteratively. It combines user interface interactions with natural language inputs to simplify the process of describing chart designs while delegating data transformation to AI. Users can utilize features like blended UI and NL inputs, data threads for history navigation, and code inspection to create impressive visualizations. The tool supports local installation for customization and Codespaces for quick setup. Developers can build new data analysis tools on top of Data Formulator, and research papers are available for further reading.
For similar tasks


CS7320-AI
CS7320-AI is a repository containing lecture materials, simple Python code examples, and assignments for the course CS 5/7320 Artificial Intelligence. The code examples cover various chapters of the textbook 'Artificial Intelligence: A Modern Approach' by Russell and Norvig. The repository focuses on basic AI concepts rather than advanced implementation techniques. It includes HOWTO guides for installing Python, working on assignments, and using AI with Python.

dynamiq
Dynamiq is an orchestration framework designed to streamline the development of AI-powered applications, specializing in orchestrating retrieval-augmented generation (RAG) and large language model (LLM) agents. It provides an all-in-one Gen AI framework for agentic AI and LLM applications, offering tools for multi-agent orchestration, document indexing, and retrieval flows. With Dynamiq, users can easily build and deploy AI solutions for various tasks.

craftium
Craftium is an open-source platform based on the Minetest voxel game engine and the Gymnasium and PettingZoo APIs, designed for creating fast, rich, and diverse single and multi-agent environments. It allows for connecting to Craftium's Python process, executing actions as keyboard and mouse controls, extending the Lua API for creating RL environments and tasks, and supporting client/server synchronization for slow agents. Craftium is fully extensible, extensively documented, modern RL API compatible, fully open source, and eliminates the need for Java. It offers a variety of environments for research and development in reinforcement learning.

hume-python-sdk
The Hume AI Python SDK allows users to integrate Hume APIs directly into their Python applications. Users can access complete documentation, quickstart guides, and example notebooks to get started. The SDK is designed to provide support for Hume's expressive communication platform built on scientific research. Users are encouraged to create an account at beta.hume.ai and stay updated on changes through Discord. The SDK may undergo breaking changes to improve tooling and ensure reliable releases in the future.

allAI
allAI is a toolbox for AI-related discussions and resources. It provides a platform for sharing knowledge, tutorials, and addressing common AI-related queries. The repository aims to foster a community for AI enthusiasts to engage in meaningful conversations and collaborations. Users can access Quark Cloud for downloads and instructional videos. Additionally, the repository encourages contributions and prohibits the dissemination of spam, advertisements, or unsolicited promotions. The project is supported by Pinokio and offers users the freedom to utilize, modify, and distribute the software within the specified conditions.

mindsdb
MindsDB is a platform for customizing AI from enterprise data. You can create, serve, and fine-tune models in real-time from your database, vector store, and application data. MindsDB "enhances" SQL syntax with AI capabilities to make it accessible for developers worldwide. With MindsDB’s nearly 200 integrations, any developer can create AI customized for their purpose, faster and more securely. Their AI systems will constantly improve themselves — using companies’ own data, in real-time.

training-operator
Kubeflow Training Operator is a Kubernetes-native project for fine-tuning and scalable distributed training of machine learning (ML) models created with various ML frameworks such as PyTorch, TensorFlow, XGBoost, MPI, Paddle and others. Training Operator allows you to use Kubernetes workloads to effectively train your large models via Kubernetes Custom Resources APIs or using Training Operator Python SDK. Note: Before v1.2 release, Kubeflow Training Operator only supports TFJob on Kubernetes. For a complete reference of the custom resource definitions, please refer to the API Definition (TensorFlow, PyTorch, Apache MXNet, XGBoost, MPI, and PaddlePaddle API Definitions). For details of all-in-one operator design, please refer to the All-in-one Kubeflow Training Operator. For details on its observability, please refer to the monitoring design doc.
For similar jobs

NanoLLM
NanoLLM is a tool designed for optimized local inference for Large Language Models (LLMs) using HuggingFace-like APIs. It supports quantization, vision/language models, multimodal agents, speech, vector DB, and RAG. The tool aims to provide efficient and effective processing for LLMs on local devices, enhancing performance and usability for various AI applications.

mslearn-ai-fundamentals
This repository contains materials for the Microsoft Learn AI Fundamentals module. It covers the basics of artificial intelligence, machine learning, and data science. The content includes hands-on labs, interactive learning modules, and assessments to help learners understand key concepts and techniques in AI. Whether you are new to AI or looking to expand your knowledge, this module provides a comprehensive introduction to the fundamentals of AI.

awesome-ai-tools
Awesome AI Tools is a curated list of popular tools and resources for artificial intelligence enthusiasts. It includes a wide range of tools such as machine learning libraries, deep learning frameworks, data visualization tools, and natural language processing resources. Whether you are a beginner or an experienced AI practitioner, this repository aims to provide you with a comprehensive collection of tools to enhance your AI projects and research. Explore the list to discover new tools, stay updated with the latest advancements in AI technology, and find the right resources to support your AI endeavors.

go2coding.github.io
The go2coding.github.io repository is a collection of resources for AI enthusiasts, providing information on AI products, open-source projects, AI learning websites, and AI learning frameworks. It aims to help users stay updated on industry trends, learn from community projects, access learning resources, and understand and choose AI frameworks. The repository also includes instructions for local and external deployment of the project as a static website, with details on domain registration, hosting services, uploading static web pages, configuring domain resolution, and a visual guide to the AI tool navigation website. Additionally, it offers a platform for AI knowledge exchange through a QQ group and promotes AI tools through a WeChat public account.

AI-Notes
AI-Notes is a repository dedicated to practical applications of artificial intelligence and deep learning. It covers concepts such as data mining, machine learning, natural language processing, and AI. The repository contains Jupyter Notebook examples for hands-on learning and experimentation. It explores the development stages of AI, from narrow artificial intelligence to general artificial intelligence and superintelligence. The content delves into machine learning algorithms, deep learning techniques, and the impact of AI on various industries like autonomous driving and healthcare. The repository aims to provide a comprehensive understanding of AI technologies and their real-world applications.

promptpanel
Prompt Panel is a tool designed to accelerate the adoption of AI agents by providing a platform where users can run large language models across any inference provider, create custom agent plugins, and use their own data safely. The tool allows users to break free from walled-gardens and have full control over their models, conversations, and logic. With Prompt Panel, users can pair their data with any language model, online or offline, and customize the system to meet their unique business needs without any restrictions.

ai-demos
The 'ai-demos' repository is a collection of example code from presentations focusing on building with AI and LLMs. It serves as a resource for developers looking to explore practical applications of artificial intelligence in their projects. The code snippets showcase various techniques and approaches to leverage AI technologies effectively. The repository aims to inspire and educate developers on integrating AI solutions into their applications.
