podscript

Generate podcast transcripts using language and speech-to-text models

Stars: 149

Podscript is a tool designed to generate transcripts for podcasts and similar audio files using Large Language Models (LLMs) and speech-to-text (STT) APIs. It provides a command-line interface (CLI) for transcribing audio from various sources, including YouTube videos and audio files, using speech-to-text services such as Deepgram, Assembly AI, and Groq. Podscript also offers a web-based user interface for convenience. Users can configure API keys for the supported services, transcribe audio, and choose among the supported transcription models. The tool aims to simplify the process of creating accurate transcripts for audio content.

README:

podscript

podscript is a tool to generate transcripts for podcasts (and other similar audio files), using LLMs and Speech-to-Text (STT) APIs.

Install

> go install github.com/deepakjois/podscript@latest

> ~/go/bin/podscript --help
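
If the podscript binary is not found after installing, make sure Go's bin directory is on your PATH (this assumes Go's default install location of $HOME/go/bin):

> export PATH="$HOME/go/bin:$PATH"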

Web UI

Podscript has a web-based UI for convenience.

> podscript web
Starting server on port 8080

This runs a web server at http://localhost:8080

Demo

For more advanced usage, see the CLI section below.

CLI

Getting started

# Configure keys for supported services (OpenAI, Anthropic, Deepgram etc)
# and write them to $HOME/.podscript.toml
podscript configure

# Transcribe a YouTube Video by formatting and cleaning up autogenerated captions
podscript ytt https://www.youtube.com/watch?v=aO1-6X_f74M

# Transcribe audio from a URL using the Deepgram speech-to-text API
#
# Deepgram and AssemblyAI subcommands support `--from-url` for
# passing audio URLs, and `--from-file` to pass audio files.
podscript deepgram --from-url https://audio.listennotes.com/e/p/d6cc86364eb540c1a30a1cac2b77b82c/

# Transcribe audio from a file using Groq's Whisper model
# Groq only supports audio files.
podscript groq --file huberman.mp3
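
The Deepgram (and AssemblyAI) subcommands also accept local files via --from-file; for example (reusing the filename from above purely for illustration):

# Transcribe a local audio file using the Deepgram speech-to-text API
podscript deepgram --from-file huberman.mp3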

More Info

Models for ytt subcommand

The ytt subcommand uses the gpt-4o model by default. Use the --model flag to set a different model (see the example after the list). The following models are supported:

  • OpenAI
    • gpt-4o
    • gpt-4o-mini
  • Google Gemini
    • gemini-2.0-flash
  • Llama (via Groq)
    • llama-3.3-70b-versatile
    • llama-3.1-8b-instant
  • Anthropic
    • claude-3-5-sonnet-20241022
    • claude-3-5-haiku-20241022
  • Anthropic via Amazon Bedrock
    • anthropic.claude-3-5-sonnet-20241022-v2:0 (via AWS)
    • anthropic.claude-3-5-haiku-20241022-v1:0 (via AWS)
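
For example, something like the following should switch ytt to Gemini (the video URL is the one from the Getting started section):

# Transcribe YouTube captions using Gemini instead of the default gpt-4o
podscript ytt --model gemini-2.0-flash https://www.youtube.com/watch?v=aO1-6X_f74M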

Transcripts from audio URLs and files

[!TIP] You can find the audio download link for a podcast on ListenNotes under the More menu

podscript supports the following Speech-To-Text (STT) APIs:

  • Deepgram (which, as of Jan 2025, provides $200 in free credit on signup!)
  • Assembly AI (which, as of Oct 2024, provides $50 in free credit on signup and is free to use within your credit limits)
  • Groq (which, as of Jul 2024, is in beta and free to use within your rate limits)
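
For example, to transcribe a local file with AssemblyAI (a sketch, assuming the subcommand is named assemblyai; both the Deepgram and AssemblyAI subcommands support --from-url and --from-file):

# Transcribe a local audio file using AssemblyAI (subcommand name assumed)
podscript assemblyai --from-file huberman.mp3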

Development

Want to contribute? Here's how to build and run the project locally:

Prerequisites

You will need Go, Node.js (with npm), and Caddy installed.

Build and run the frontend:

cd web/frontend
npm install
npm run dev

Build the backend server and run it in dev mode:

go build -o podscript
./podscript web --dev

This will start the backend server and expose only the API endpoints, without bundling the frontend assets.

To connect the two:

cd web
caddy run

This sets everything up so that you can visit http://localhost:8080 and have the frontend connected to the backend via the Caddy reverse proxy.

Feedback

Feel free to drop me a note on X or email me.

License

MIT
