 
                MediaAI
Aalto University's Intelligent Computational Media course (AI & ML for media, art & design)
Stars: 61
 
    MediaAI is a repository containing lectures and materials for Aalto University's AI for Media, Art & Design course. The course is a hands-on, project-based crash course focusing on deep learning and AI techniques for artists and designers. It covers common AI algorithms & tools, their applications in art, media, and design, and provides hands-on practice in designing, implementing, and using these tools. The course includes lectures, exercises, and a final project based on students' interests. Students can complete the course without programming by creatively utilizing existing tools like ChatGPT and DALL-E. The course emphasizes collaboration, peer-to-peer tutoring, and project-based learning. It covers topics such as text generation, image generation, optimization, and game AI.
README:
This repository contains the lectures and materials of Aalto University's AI for Media, Art & Design course. Scroll down for lecture slides and exercises.
Follow the course's Twitter feed for links and resources.
This is a hands-on, project-based crash course for deep learning and other AI techniques for people with as little technical prerequisites as possible. The focus is on media processing and games, which makes this particularly suitable for artists and designers.
The 2024 edition of this course is taught during Aalto University's Period 3 (six weeks) by Prof Perttu Hämäläinen (Twitter) and Nam Hee Gordon Kim. (Twitter). Registration through Aalto's Mycourses system.
Learning Goals
The goal is for students to:
- Understand how common AI algorithms & tools work,
- Understand what the tools can be used for in context of art, media, and design, and
- Get hands-on practice of designing, implementing and/or using the tools.
Pedagogical approach
The course is taught through:
- Lectures
- Exercises that require you to either practice using existing AI tools, programmatically utilize the tools (e.g., to automate tedious manual prompting), or build new systems. We always try to provide both easy and advanced exercises to cater for different skill levels.
- Final project on topics based on each student's interests. This can also be done in pairs.
The exercises and project work are designed to scale for a broad range of skill levels and starting points.
Outside lectures and exercises, we use Teams for sharing results and peer-to-peer tutoring and guidance (Teams invitation will be send to registered students).
Student Prerequisites
Although many of the exercises do require some Python programming and math skills, one can complete the course without programming, by focusing on creative utilization of existing tools such as ChatGPT and DALL-E.
Grading / Project Work
You pass the course by submitting a report of your project in MyCourses. The grading is pass / fail, as numerical grading is not feasible on a course where students typically come from very different backgrounds. To get your project accepted, the main requirement is that a you make an effort and advance from your individual starting point.
It is also recommended to make the project publicly available, e.g., as a Colab notebook or Github repository. Instead of a project report, you may simply submit a link to the notebook or repository, if they contain the needed documentation.
Students can choose their project topics based on their own interests and learning goals. Projects are agreed on with the teachers. You could create something in Colab or Unity Machine Learning agents, or if you'd rather not write any code, experimenting with artist-friendly tools like GIMP-ML or RunwayML to create something new and/or interesting is also ok. For example, one could generate song lyrics using a text generator such as GPT-3, and use them to compose and record a song.
Before the first lecture, you should:
- 
Prepare to add one slide to a shared Google Slides document (link provided at the first lecture), including 1) your name and photo, 2) your background and skillset, 3) and what kind of projects you want to work on. This will be useful for finding other students with similar interests and/or complementary skills. If you do not yet know what to work on, browse the Course Twitter for inspiration. 
- 
Make yourself a Google account if you don't have one. This is needed for Google Colab, which we use for many demos and programming exercises. Important security notice: Colab notebooks run on Google servers and by default cannot access files on your computer. However, some notebooks might contain malicious code. Thus, do not input any passwords or let a Colab notebook connect with your Google Drive unless you trust the notebook. The notebooks in this repository should be safe but the lecture slides also link to 3rd party notebooks that one can never be sure about. 
- 
For using OpenAI tools, it's also good to get an OpenAI account. When creating the account, you get some free quota for generating text and images. 
- 
If you plan to try the programming exercises of the course, these Python learning resources might come handy. However, if you know some other programming language, you should be able to learn while going through the exercises. 
- 
To grasp the fundamentals of what neural networks are doing, watch episodes 1-3 of 3Blue1Brown's neural network series 
- 
Learn about Colab notebooks (used for course exercises) by watching these videos: Video 1, Video 2 and reading through this Tutorial notebook. Feel free to also take a peek at this course's exercise notebooks such as GAN image generation. To test the notebook, select "run all" from the "runtime" menu. This runs all the code cells in sequence and should generate and display some images at the bottom of the notebook. In general, when opening a notebook, it's good to use "run all" first, before starting to modify individual code cells, because the cells often depend on variables initialized or packages imported in the preceding cells. 
We will spend roughly one week per topic, devoting the last sessions to individual project work.
Overview and Motivation:
- Lecture: Overview and motivation. Why one should rather co-create than compete with AI technology.
- Exercise: Each student adds their slide to the shared Google Slides. We will then go through the slides and briefly discuss the topics and who might benefit from collaborating with others.
Optional programming exercises for those with at least some programming background:
- Exercise: If you didn't already do it, go through the Colab learning links in the course prerequisites.
- Exercise: Introduction to tensors, numpy and matplotlib through processing images and audio. [Open in Colab], [Solutions]
- Related to the above, see also: https://numpy.org/devdocs/user/absolute_beginners.html
- Exercise: Continuing the Numpy introduction, now for a simple data science project. [Open in Colab], [Solutions].
- Exercise: Training a very simple neural network using a Kaggle dataset of human height and weight. [Open in Colab], [Solutions]
Text Generation & Co-writing with AI:
- 
Lecture: Co-writing with AI. Introduction to Large Language Models (LLMs), some history, examples of different types of texts and applications. 
- 
Lecture: Deeper into transformers. Understanding transformer attention and emergent abilities of LLMs. 
- 
Exercises: Generating game ideas, automating manual prompting using Python, Retrieval-Augmented Generation 
- 
Tutorial: Few-shot Prompting using OpenAI API 
Image Generation
- 
Lecture: Image generation 
- 
Setup: If you have 2060 GPU or better, consider installing Stable Diffusion locally - no subscription fees, no banned topics or keywords. It's best to install Stable Diffusion with a UI such as ComfyUI or Foocus, which provide advanced features like ControlNet that allows you to input a rough image sketch to control the layout of the generated images. 
- 
Prompting Exercise: Prompt images with different art styles, cameras, lighting... For reference, see The DALL-E 2 prompt book Use your preferred text-to-image tool such as DALL-E, MidJourney or Stable Diffusion. You can install Stable Diffusion locally (see above), and it is also available through Colab: Huggingface basic notebook, Notebook with Automatic1111 WebUI, Notebook with Foocus UI. If you prefer a mobile app, some free options are Microsoft Copilot for ChatGPT and DALL-E on iOS or Android, or Draw Things for Stable Diffusion 
- 
Prompting Exercise: Pick an interesting reference image and try to come up with a text prompt that produces an image as close to the reference as possible. This helps you hone your descriptive English writing skills. 
- 
Prompting Exercise: Practice using both text and image prompts using StableDiffusion and ControlNet. This can be done either in Colab, installing Stable Diffusion locally (see above), or using a mobile app such as Draw Things 
- 
Prompting Exercise: Make your own version of making a bunny happier, progressively exaggerating some other aspect of some initial prompt. 
- 
Colab Exercise: Using a Pre-Trained Generative Adversarial Network (GAN) to generate and interpolate images. [Open in Colab], [Solutions] 
- 
Colab Exercise: Interpolate between prompts using Stable Diffusion 
- 
Colab Exercise: Finetune StableDiffusion using your own images. There are multiple options, although all of them seem to require at least 24GB of GPU memory => you'll most likely need a paid Colab account. Some options: Huggingface Diffusers official tutorial, Joe Penna's DreamBooth, TheLastBen 
Generating other media, real-life workflows
- Lecture: Generating other media
Optimization
- Lecture: Optimization. Mathematical optimization is at the heart of almost all AI and ML. We've already applied optimization when training neural networks; now it's the time to get a bit wider and deeper understanding. We'll cover a number of common techniques such as Deep Reinforcement Learning (DRL) and Covariance Matrix Adaptation Evolution Strategy (CMA-ES).
- Exercise: Experiment with abstract art generation using CLIPDraw and StyleCLIPDraw. First, follow the notebook instructions to get the code to generate something. Then try different text prompts and different drawing parameters.
- Exercise (hard, optional): Modify CLIPDraw or StyleCLIPDraw to use CMA-ES instead of Adam. This should allow more robust results if you use high abstraction (only a few drawing primitives), which tends to make Adam more probable to get stuck in a bad local optimum. For reference, you can see this old course exercise on Generating abstract adversarial art using CMA-ES. Note: You can also combine CMA-ES and Adam by first finding an approximate solution with CMA-ES and then finetuning with Adam.
- Unity exercise (optional): Discovering billiards trick shots in Unity. Download the project folder and test it in Unity.
Game AI
- Lecture: Game AI What is game AI? Game AI Research in industry / academia. Core areas of videogame AI. Deep Dive: State-of-the-art AI playtesting (Roohi et al., 2021): Combining deep reinforcement learning (DRL), Monte-Carlo tree search (MCTS) and a player population simulation to estimate player engagement and difficulty in a match-3 game.
- Exercise: Deep Reinforcement Learning for General Game-Playing [Open in Colab]. [Open in Colab with Solutions].
a.k.a. Heroes of Creative AI and ML coding
Here are some people who are mixing AI, machine learning, art, and design with awesome results:
- http://otoro.net/ml/
- http://genekogan.com/
- http://quasimondo.com/
- http://zach.li/
- http://memo.tv
- https://www.enist.org
The lecture slides have more extensive links to resources on each covered topic. Here, we only list some general resources:
- ml5js & p5js, if you prefer Javascript to Python, this toolset may provides the fastest way to creative AI coding in a browser-based editor, without installing anything. Works even on mobile browsers! This example uses a deep neural network to track your nose and draw on the webcam view. This one utilizes similar PoseNet tracking to control procedural audio synthesis.
- Machine Learning for Artists (ml4a), including many cool demos, many of them built using p5js and ml5js.
- Unity Machine Learning Agents, a framework for using deep reinforcement learning for Unity. Includes code examples and blog posts.
- Two Minute Papers, a YouTube channel with short and accessible explanations of AI and deep learning research papers.
- 3Blue1Brown, a YouTube channel with excellent visual explanations on math, including neural networks and linear algebra.
- Elements of AI, an online course by University of Helsinki and Reaktor. Aalto students can also get 2 credits for this course. This is a course about the basic concepts, societal implications etc., no coding.
- Game AI Book by Togelius and Yannakakis. PDF available.
- Understanding Deep Learning by Simon J.D. Prince. Published in December 2023, this is currently the most up-to-date textbook for deep learning, praised for its clear explanations and helpful visualizations. An excellent resource for digging deeper, for those that can handle some linear algebra, probability, and statistics. PDF available.
The field is changing rapidly and we are constantly collecting new teaching material.
Follow the course's Twitter feed to stay updated. The twitter works as a public backlog of material that is used when updating the lecture slides.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for MediaAI
Similar Open Source Tools
 
            
            MediaAI
MediaAI is a repository containing lectures and materials for Aalto University's AI for Media, Art & Design course. The course is a hands-on, project-based crash course focusing on deep learning and AI techniques for artists and designers. It covers common AI algorithms & tools, their applications in art, media, and design, and provides hands-on practice in designing, implementing, and using these tools. The course includes lectures, exercises, and a final project based on students' interests. Students can complete the course without programming by creatively utilizing existing tools like ChatGPT and DALL-E. The course emphasizes collaboration, peer-to-peer tutoring, and project-based learning. It covers topics such as text generation, image generation, optimization, and game AI.
 
            
            start-machine-learning
Start Machine Learning in 2024 is a comprehensive guide for beginners to advance in machine learning and artificial intelligence without any prior background. The guide covers various resources such as free online courses, articles, books, and practical tips to become an expert in the field. It emphasizes self-paced learning and provides recommendations for learning paths, including videos, podcasts, and online communities. The guide also includes information on building language models and applications, practicing through Kaggle competitions, and staying updated with the latest news and developments in AI. The goal is to empower individuals with the knowledge and resources to excel in machine learning and AI.
 
            
            start-llms
This repository is a comprehensive guide for individuals looking to start and improve their skills in Large Language Models (LLMs) without an advanced background in the field. It provides free resources, online courses, books, articles, and practical tips to become an expert in machine learning. The guide covers topics such as terminology, transformers, prompting, retrieval augmented generation (RAG), and more. It also includes recommendations for podcasts, YouTube videos, and communities to stay updated with the latest news in AI and LLMs.
 
            
            ai_gallery
AI Gallery is a showcase site built using React and Nextjs for static site generation, featuring interactive visualizations of classic algorithms, classic games implementation, and various interesting widgets. The project utilizes AI assistance from Claude 3.5 and GPT-4 to create components and enhance the development process. It aims to continually add more components with AI assistance, providing a platform for contributors to leverage AI in frontend development.
 
            
            motleycrew
Motleycrew is an ultimate framework for building multi-agent AI systems, allowing users to mix and match AI agents and tools from popular frameworks, design advanced workflows, and leverage dynamic knowledge graphs with simplicity and elegance. It acts as a conductor orchestrating a symphony of AI agents and tools, providing building blocks for creating AI systems and enabling users to focus on high-level design while taking care of the rest. The framework offers integration with various tools, flexibility in providing agents with tools or other agents, advanced flow design capabilities, and built-in observability and caching features.
 
            
            dstoolkit-text2sql-and-imageprocessing
This repository provides sample code for improving RAG applications with rich data sources including SQL Warehouses and documents analysed with Azure Document Intelligence. It includes components for Text2SQL generation and querying, linking Azure Document Intelligence with AI Search for processing complex documents, and deploying AI search indexes. The plugins and skills aim to enhance response quality in RAG applications by accessing and pulling data from SQL tables, drawing insights from complex charts and images, and intelligently grouping similar sentences.
 
            
            knowledge
Knowledge is a tool for saving, searching, accessing, exploring and chatting with all of your favorite websites, documents and files. Dive into a more interactive learning experience with Knowledge's new Chat feature! Engage in dynamic conversations with your Projects and Sources, leveraging the power of Large Language Models. The Chat feature is designed to transform the way you interact with your data, offering a more engaging and exploratory approach to learning. Unleash the power of context with the built-in Chromium browser. Transform your browsing into knowledge gathering effortlessly.
 
            
            gdx-ai
An artificial intelligence framework entirely written in Java for game development with libGDX. It is a high-performance framework providing common AI techniques used in the game industry, covering movement AI, pathfinding, decision making, and infrastructure. The framework is designed to be used with libGDX but can be used independently. Current features include steering behaviors, formation motion, A* pathfinding, hierarchical pathfinding, behavior trees, state machine, message handling, and scheduling.
 
            
            gpt_academic
GPT Academic is a powerful tool that leverages the capabilities of large language models (LLMs) to enhance academic research and writing. It provides a user-friendly interface that allows researchers, students, and professionals to interact with LLMs and utilize their abilities for various academic tasks. With GPT Academic, users can access a wide range of features and functionalities, including: * **Summarization and Paraphrasing:** GPT Academic can summarize complex texts, articles, and research papers into concise and informative summaries. It can also paraphrase text to improve clarity and readability. * **Question Answering:** Users can ask GPT Academic questions related to their research or studies, and the tool will provide comprehensive and well-informed answers based on its knowledge and understanding of the relevant literature. * **Code Generation and Explanation:** GPT Academic can generate code snippets and provide explanations for complex coding concepts. It can also help debug code and suggest improvements. * **Translation:** GPT Academic supports translation of text between multiple languages, making it a valuable tool for researchers working with international collaborations or accessing resources in different languages. * **Citation and Reference Management:** GPT Academic can help users manage their citations and references by automatically generating citations in various formats and providing suggestions for relevant references based on the user's research topic. * **Collaboration and Note-Taking:** GPT Academic allows users to collaborate on projects and take notes within the tool. They can share their work with others and access a shared workspace for real-time collaboration. * **Customizable Interface:** GPT Academic offers a customizable interface that allows users to tailor the tool to their specific needs and preferences. They can choose from a variety of themes, adjust the layout, and add or remove features to create a personalized workspace. Overall, GPT Academic is a versatile and powerful tool that can significantly enhance the productivity and efficiency of academic research and writing. It empowers users to leverage the capabilities of LLMs and unlock new possibilities for academic exploration and knowledge creation.
 
            
            Electronic-Component-Sorter
The Electronic Component Classifier is a project that uses machine learning and artificial intelligence to automate the identification and classification of electrical and electronic components. It features component classification into seven classes, user-friendly design, and integration with Flask for a user-friendly interface. The project aims to reduce human error in component identification, make the process safer and more reliable, and potentially help visually impaired individuals in identifying electronic components.
 
            
            public
This public repository contains API, tools, and packages for Datagrok, a web-based data analytics platform. It offers support for scientific domains, applications, connectors to web services, visualizations, file importing, scientific methods in R, Python, or Julia, file metadata extractors, custom predictive models, platform enhancements, and more. The open-source packages are free to use, with restrictions on server computational capacities for the public environment. Academic institutions can use Datagrok for research and education, benefiting from reproducible and scalable computations and data augmentation capabilities. Developers can contribute by creating visualizations, scientific methods, file editors, connectors to web services, and more.
 
            
            generative-ai-amazon-bedrock-langchain-agent-example
This repository provides a sample solution for building generative AI agents using Amazon Bedrock, Amazon DynamoDB, Amazon Kendra, Amazon Lex, and LangChain. The solution creates a generative AI financial services agent capable of assisting users with account information, loan applications, and answering natural language questions. It serves as a launchpad for developers to create personalized conversational agents for applications like chatbots and virtual assistants.
 
            
            PythonDataScienceFullThrottle
PythonDataScienceFullThrottle is a comprehensive repository containing various Python scripts, libraries, and tools for data science enthusiasts. It includes a wide range of functionalities such as data preprocessing, visualization, machine learning algorithms, and statistical analysis. The repository aims to provide a one-stop solution for individuals looking to dive deep into the world of data science using Python.
 
            
            foundationallm
FoundationaLLM is a platform designed for deploying, scaling, securing, and governing generative AI in enterprises. It allows users to create AI agents grounded in enterprise data, integrate REST APIs, experiment with various large language models, centrally manage AI agents and their assets, deploy scalable vectorization data pipelines, enable non-developer users to create their own AI agents, control access with role-based access controls, and harness capabilities from Azure AI and Azure OpenAI. The platform simplifies integration with enterprise data sources, provides fine-grain security controls, scalability, extensibility, and addresses the challenges of delivering enterprise copilots or AI agents.
 
            
            foundationallm
FoundationaLLM is a platform designed for deploying, scaling, securing, and governing generative AI in enterprises. It allows users to create AI agents grounded in enterprise data, integrate REST APIs, experiment with large language models, centrally manage AI agents and assets, deploy scalable vectorization data pipelines, enable non-developer users to create their own AI agents, control access with role-based access controls, and harness capabilities from Azure AI and Azure OpenAI. The platform simplifies integration with enterprise data sources, provides fine-grain security controls, load balances across multiple endpoints, and is extensible to new data sources and orchestrators. FoundationaLLM addresses the need for customized copilots or AI agents that are secure, licensed, flexible, and suitable for enterprise-scale production.
 
            
            pwnagotchi
Pwnagotchi is an AI tool leveraging bettercap to learn from WiFi environments and maximize crackable WPA key material. It uses LSTM with MLP feature extractor for A2C agent, learning over epochs to improve performance in various WiFi environments. Units can cooperate using a custom parasite protocol. Visit https://www.pwnagotchi.ai for documentation and community links.
For similar tasks
 
            
            MediaAI
MediaAI is a repository containing lectures and materials for Aalto University's AI for Media, Art & Design course. The course is a hands-on, project-based crash course focusing on deep learning and AI techniques for artists and designers. It covers common AI algorithms & tools, their applications in art, media, and design, and provides hands-on practice in designing, implementing, and using these tools. The course includes lectures, exercises, and a final project based on students' interests. Students can complete the course without programming by creatively utilizing existing tools like ChatGPT and DALL-E. The course emphasizes collaboration, peer-to-peer tutoring, and project-based learning. It covers topics such as text generation, image generation, optimization, and game AI.
For similar jobs
 
            
            facefusion
FaceFusion is a next-generation face swapper and enhancer that allows users to seamlessly swap faces in images and videos, as well as enhance facial features for a more polished and refined look. With its advanced deep learning models, FaceFusion provides users with a wide range of options for customizing their face swaps and enhancements, making it an ideal tool for content creators, artists, and anyone looking to explore their creativity with facial manipulation.
 
            
            forge
Forge is a free and open-source digital collectible card game (CCG) engine written in Java. It is designed to be easy to use and extend, and it comes with a variety of features that make it a great choice for developers who want to create their own CCGs. Forge is used by a number of popular CCGs, including Ascension, Dominion, and Thunderstone.
 
            
            latentbox
Latent Box is a curated collection of resources for AI, creativity, and art. It aims to bridge the information gap with high-quality content, promote diversity and interdisciplinary collaboration, and maintain updates through community co-creation. The website features a wide range of resources, including articles, tutorials, tools, and datasets, covering various topics such as machine learning, computer vision, natural language processing, generative art, and creative coding.
 
            
            fabric
Fabric is an open-source framework for augmenting humans using AI. It provides a structured approach to breaking down problems into individual components and applying AI to them one at a time. Fabric includes a collection of pre-defined Patterns (prompts) that can be used for a variety of tasks, such as extracting the most interesting parts of YouTube videos and podcasts, writing essays, summarizing academic papers, creating AI art prompts, and more. Users can also create their own custom Patterns. Fabric is designed to be easy to use, with a command-line interface and a variety of helper apps. It is also extensible, allowing users to integrate it with their own AI applications and infrastructure.
 
            
            ColorPicker
ColorPicker Max is a powerful and intuitive color selection and manipulation tool that is designed to make working with color easier and more efficient than ever before. With its wide range of features and tools, ColorPicker Max offers an unprecedented level of control and customization over every aspect of color selection and manipulation.
 
            
            ai-notes
Notes on AI state of the art, with a focus on generative and large language models. These are the "raw materials" for the https://lspace.swyx.io/ newsletter. This repo used to be called https://github.com/sw-yx/prompt-eng, but was renamed because Prompt Engineering is Overhyped. This is now an AI Engineering notes repo.
 
            
            Neurite
Neurite is an innovative project that combines chaos theory and graph theory to create a digital interface that explores hidden patterns and connections for creative thinking. It offers a unique workspace blending fractals with mind mapping techniques, allowing users to navigate the Mandelbrot set in real-time. Nodes in Neurite represent various content types like text, images, videos, code, and AI agents, enabling users to create personalized microcosms of thoughts and inspirations. The tool supports synchronized knowledge management through bi-directional synchronization between mind-mapping and text-based hyperlinking. Neurite also features FractalGPT for modular conversation with AI, local AI capabilities for multi-agent chat networks, and a Neural API for executing code and sequencing animations. The project is actively developed with plans for deeper fractal zoom, advanced control over node placement, and experimental features.
 
            
            ScribbleArchitect
ScribbleArchitect is a GUI tool designed for generating images from simple brush strokes or Bezier curves in real-time. It is primarily intended for use in architecture and sketching in the early stages of a project. The tool utilizes Stable Diffusion and ControlNet as AI backbone for the generative process, with IP Adapter support and a library of predefined styles. Users can transfer specific styles to their line work, upscale images for high resolution export, and utilize a ControlNet upscaler. The tool also features a screen capture function for working with external tools like Adobe Illustrator or Inkscape.
 
             
                