AI That Works is a weekly live coding event focused on advanced AI engineering techniques and tools for taking AI applications from demo to production. Sessions cover topics such as token efficiency, the downsides of JSON, writing drop-ins for MCP tools, and advanced tricks like .shims for coding tools. The event helps participants get the most out of today's AI models and tools through live coding, Q&A, and production-ready insights.


🦄 AI That Works

On Zoom, Tuesdays at 10 AM PST - an hour of live coding, Q&A, and production-ready AI engineering

Event Calendar Discord YouTube Playlist

🦄 Next Episode

Bash vs. MCP - token efficient coding agent tooling

Tuesday, September 16, 2025 at 10 AM PST

On this week's AI That Works, we'll explore the great Bash vs. MCP debate - what's better for helping coding agents do more?

We'll talk about:

  • Token efficiency and the downsides of JSON
  • Writing your own drop-ins for MCP tools
  • Other advanced tricks like .shims for forcing uv instead of python or bun instead of npm
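The .shims idea is roughly: put a directory of tiny wrapper scripts ahead of the real binaries on PATH, so an agent typing `python` or `npm` transparently gets `uv` or `bun` instead. A minimal runnable sketch (the shim bodies and directory name are illustrative, not the session's code):

```python
import os
import stat
import tempfile

# Each shim is a tiny script that forwards to the tool we actually want.
SHIMS = {
    "python": '#!/bin/sh\nexec uv run python "$@"\n',
    "npm": '#!/bin/sh\nexec bun "$@"\n',
}

def write_shims(directory: str) -> None:
    """Write one executable wrapper script per shimmed command."""
    for name, body in SHIMS.items():
        path = os.path.join(directory, name)
        with open(path, "w") as f:
            f.write(body)
        os.chmod(path, os.stat(path).st_mode | stat.S_IEXEC)

shim_dir = tempfile.mkdtemp(prefix="shims-")
write_shims(shim_dir)
# Prepend the shim directory so its scripts win over the real binaries:
os.environ["PATH"] = shim_dir + os.pathsep + os.environ["PATH"]
```

Any subprocess the agent spawns after this point resolves `python` to the shim first, no prompt engineering required.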

Register Now


What We're About

Weekly conversations with @hellovai & @dexhorthy about getting the most juice out of today's models

When: Every Tuesday at 10 AM PST on Zoom
Duration: 1 hour of live coding, Q&A, and production-ready insights
Goal: Take your AI app from demo → production

Let's code together.

Pre-Reading & Setup

Before joining, get familiar with our toolkit:

Core Tools

  • Zoom - Live sessions
  • Cursor - AI-powered IDE
  • Git - Version control
  • Claude Code - Agentic Coding
  • CodeLayer - Agentic Coding Tool

Languages & Package Managers

  • Python: UV
  • TypeScript: PNPM
  • Go: Go modules

Episodes & Workshops

From Demo to Production - One Episode at a Time

📅 Episode 📝 Description
UPCOMING
2025-09-16
#23: Bash vs. MCP - token efficient coding agent tooling
code • register
On this week's AI That Works, we'll explore the great Bash vs. MCP debate - what's better for helping coding agents do more?

We'll talk about:

  • Token efficiency and the downsides of JSON
  • Writing your own drop-ins for MCP tools
  • Other advanced tricks like .shims for forcing uv instead of python or bun instead of npm
PAST
2025-09-09
#22: Generative UIs and Structured Streaming
watch • code
We'll explore hard problems in building rich UIs that rely on streaming data from LLMs. Specifically, we'll talk through techniques for rendering **STRUCTURED** outputs from LLMs, with real-world examples of how to handle partially-streamed outputs over incomplete JSON data. We'll explore advanced needs like:

  • Fields that should be required for the stream to start
  • Rendering React components with partial data
  • Handling nullable fields vs. yet-to-be-streamed fields
  • Building high-quality user feedback
  • Handling errors mid-stream
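As a taste of the partial-JSON problem: one common approach (a sketch with hypothetical names, not the episode's actual code) is to append whatever closers a truncated buffer is missing on every chunk, so the UI can re-render from whatever has streamed so far. A chunk cut mid-token simply parses as "not yet":

```python
import json

def complete_partial_json(buf: str) -> str:
    """Append the closing quotes/brackets a truncated JSON stream is missing."""
    stack, in_string, escape = [], False, False
    for ch in buf:
        if in_string:
            if escape:
                escape = False
            elif ch == "\\":
                escape = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]":
            stack.pop()
    closers = '"' if in_string else ""
    return buf + closers + "".join(reversed(stack))

def parse_partial(buf: str):
    """Parse what has streamed so far; None means wait for more bytes."""
    try:
        return json.loads(complete_partial_json(buf))
    except json.JSONDecodeError:
        return None  # cut mid-token, e.g. inside `tru` of `true`
```

A real renderer would also distinguish a key that parsed as `null` from a key that has not appeared in the stream yet.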
PAST
2025-09-02
#21: Voice Agents and Supervisor Threading
watch • code
Exploring voice-based AI agents and supervisor threading patterns for managing complex conversational workflows.
PAST
2025-08-26
#20: Claude for Non-Code Tasks
watch • code
On #17 we talked about advanced context engineering workflows for using Claude Code to work in complex codebases. This week, we're gonna get a little weird with it, and show off a bunch of ways you can use Claude Code as a generic agent to handle non-coding tasks. We'll learn things like:

  • Skipping the MCP and having Claude write its own scripts to interact with external systems
  • Creating internal knowledge graphs with markdown files
  • How to blend agentic retrieval and search with deterministic context packing
PAST
2025-08-19
#19: Interruptible Agents
watch • code
Anyone can build a chatbot, but the user experience is what truly sets it apart. Can you cancel a message? Can you queue commands while it's busy? How finely can you steer the agent? We'll explore these questions and code a solution together.
PAST
2025-08-12
#18: Decoding Context Engineering Lessons from Manus
watch • code
A few weeks ago, the Manus team published an excellent article on context engineering. It covered KV cache, hot-swapping tools with custom samplers, and a ton of other cool techniques. On this week's episode, we'll dive deep on the Manus article and put some of the advice into practice, exploring how a deep understanding of models and inference can help you get the most out of today's LLMs.
PAST
2025-08-05
#17: Context Engineering for Coding Agents
watch • code
By popular demand, AI That Works #17 will dive deep on a new kind of context engineering: managing research, specs, and planning to get the most out of coding agents and coding CLIs. You've heard people bragging about spending thousands/mo on Claude Code, maxing out Amp limits, and much more. Now Dex and Vaibhav are gonna share some tips and tricks for pushing AI coding tools to their absolute limits, while still shipping well-tested, bug-free code. This isn't vibe-coding, this is something completely different.
PAST
2025-07-29
#16: Evaluating Prompts Across Models
watch • code
AI That Works #16 will be a super-practical deep dive into real-world examples and techniques for evaluating a single prompt against multiple models. While this is a commonly heralded use case for evals ('how do we know if the new model is better?', 'how do we know if the new model breaks anything?'), there aren't many practical examples out there for real-world use cases.
PAST
2025-07-22
#15: PDFs, Multimodality, Vision Models
watch • code
Dive deep into practical PDF processing techniques for AI applications. We'll explore how to extract, parse, and leverage PDF content effectively in your AI workflows, tackling common challenges like layout preservation, table extraction, and multi-modal content handling.
PAST
2025-07-15
#14: Implementing Decaying-Resolution Memory
watch • code
Last week on #13, we did a conceptual deep dive on context engineering and memory - this week, we're going to jump right into the weeds and implement a version of Decaying-Resolution Memory that you can pick up and apply to your AI agents today. For this episode, you'll probably want to check out episode #13 in the session listing to get caught up on DRM and why it's worth building from scratch.
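To make "decaying resolution" concrete, here is a toy sketch (class and field names are mine, and the one-line summarizer is a stand-in for the LLM call a real implementation would make): recent turns are kept verbatim, and anything older than a window gets rolled up into a coarser summary line.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    turn: int   # conversation turn the memory was recorded at
    text: str

@dataclass
class DecayingMemory:
    """Toy decaying-resolution store: recent turns verbatim, old turns summarized."""
    verbatim_window: int = 4
    items: list = field(default_factory=list)
    summaries: list = field(default_factory=list)

    def add(self, turn: int, text: str) -> None:
        self.items.append(Memory(turn, text))
        self._decay(turn)

    def _decay(self, now: int) -> None:
        stale = [m for m in self.items if now - m.turn >= self.verbatim_window]
        if stale:
            # Stand-in summarizer: keep each stale turn's first sentence.
            merged = " / ".join(m.text.split(".")[0] for m in stale)
            self.summaries.append(f"turns {stale[0].turn}-{stale[-1].turn}: {merged}")
            self.items = [m for m in self.items if now - m.turn < self.verbatim_window]

    def context(self) -> str:
        """Prompt context: coarse summaries first, then full-resolution recents."""
        return "\n".join(self.summaries + [m.text for m in self.items])
```

The key property is that total context size grows much slower than conversation length, because old turns lose resolution instead of accumulating verbatim.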
PAST
2025-07-08
#13: Building AI with Memory & Context
watch • code
How do we build agents that can remember past conversations and learn over time? We'll explore memory and context engineering techniques to create AI systems that maintain state across interactions.
PAST
2025-07-01
#12: Boosting AI Output Quality
watch • code
This week's session was a bit meta! We explored 'Boosting AI Output Quality' by building the very AI pipeline that generated this email from our Zoom recording. The real breakthrough: separating extraction from polishing for high-quality AI generation.
PAST
2025-06-24
#11: Building an AI Content Pipeline
watch • code
Content creation involves a lot of manual work - uploading videos, sending emails, and other follow-up tasks that are easy to drop. We'll build an agent that integrates YouTube, email, GitHub and human-in-the-loop to fully automate the AI that Works content pipeline, handling all the repetitive work while maintaining quality.
PAST
2025-06-17
#10: Entity Resolution: Extraction, Deduping, and Enriching
watch • code
Disambiguating many ways of naming the same thing (companies, skills, etc.) - from entity extraction to resolution to deduping. We'll explore breaking problems into extraction → resolution → enrichment stages, scaling with two-stage designs, and building async workflows with human-in-loop patterns for production entity resolution systems.
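The extraction → resolution → dedupe idea can be sketched in a few lines. The normalization rules below are deliberately naive stand-ins for the fuzzier matching (and human-in-the-loop review) a production system needs:

```python
import re
from collections import defaultdict

# Tokens to drop when building a resolution key (illustrative, not exhaustive).
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "llc", "ltd", "co"}

def normalize(name: str) -> str:
    """Resolution key: lowercase, strip punctuation and legal suffixes."""
    tokens = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)

def dedupe(mentions: list[str]) -> dict[str, list[str]]:
    """Group raw mentions under a shared canonical key."""
    groups = defaultdict(list)
    for m in mentions:
        groups[normalize(m)].append(m)
    return dict(groups)
```

Enrichment then runs once per canonical key rather than once per raw mention, which is where the two-stage design pays off at scale.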
PAST
2025-06-10
#9: Cracking the Prompting Interview
watch • code
Ready to level up your prompting skills? Join us for a deep dive into advanced prompting techniques that separate good prompt engineers from great ones. We'll cover systematic prompt design, testing tools / inner loops, and tackle real-world prompting challenges. Perfect prep for becoming a more effective AI engineer.
PAST
2025-06-03
#8: Humans as Tools: Async Agents and Durable Execution
watch • code
Agents are great, but for the most accuracy-sensitive scenarios, we sometimes want a human in the loop. Today we'll discuss techniques for making this possible. We'll dive deep into concepts from our 4/22 session on 12-factor agents and extend them to handle asynchronous operations where agents need to contact humans for help, feedback, or approvals across a variety of channels.
PAST
2025-05-27
#7: 12-factor agents: selecting from thousands of MCP tools
watch • code
MCP is only as great as your ability to pick the right tools. We'll dive into how to leverage MCP servers and accurately select the right tools when only a few of the thousands available are actually relevant.
PAST
2025-05-20
#6: Policy to Prompt: Evaluating w/ the Enron Emails Dataset
watch • code
One of the most common problems in AI engineering is looking at a set of policies/rules and evaluating evidence to determine if the rules were followed. In this session we'll explore turning policies into prompts and pipelines to evaluate which emails in the massive Enron email dataset violated SEC and Sarbanes-Oxley regulations.
PAST
2025-05-17
SF Workshop: Twelve Factor Agents
Live workshop in San Francisco on building 12 factor agents. Interactive instruction, code-along format, and hackathon to build production-ready AI agents.
PAST
2025-05-13
#5: Designing Evals
watch • code
Minimalist and high-performance testing/evals for LLM applications. Stay tuned for our season 2 kickoff topic on testing and evaluation strategies.
PAST
2025-05-10
NYC Workshop: Twelve Factor Agents
Live workshop in NYC on building 12 factor agents. Interactive instruction, code-along format, and hackathon to build production-ready AI agents.
PAST
2025-04-22
#4: Twelve Factor Agents
watch • code
Learn how to build production-ready AI agents using the twelve-factor methodology. We'll cover the core concepts and build a real agent from scratch.
PAST
2025-04-15
#3: Code Generation with Small Models
watch • code
Large models can do a lot, but so can small models. We'll discuss techniques for how to leverage extremely small models for generating diffs and making changes in complete codebases.
PAST
2025-04-08
#2: Reasoning Models vs Reasoning Prompts
watch • code
Models can reason but you can also reason within a prompt. Which technique wins out when and why? We'll find out by adding reasoning to an existing movie chat agent.
PAST
2025-03-31
#1: Large Scale Classification
watch • code
LLMs are great at classification from 5, 10, maybe even 50 categories. But how do we deal with situations where we have over 1,000? Or an ever-changing list of categories?
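A common answer, sketched here with a deliberately cheap lexical retriever (a real system would use embeddings), is two stages: shortlist a handful of plausible labels, then put only that shortlist in the classification prompt:

```python
def shortlist(text: str, categories: list[str], k: int = 5) -> list[str]:
    """Stage 1: cut 1000+ labels down to k candidates.

    Token overlap keeps this sketch runnable without a model; swap in
    embedding similarity for anything real.
    """
    words = set(text.lower().split())
    scored = sorted(
        categories,
        key=lambda c: len(words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]
```

Stage 2 then asks the model to pick from just those k labels, so the prompt stays small no matter how large or fast-changing the full taxonomy is.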
