
GameSentenceMiner
An all-in-one immersion toolkit for learning languages through games and other visual media.
Stars: 133

GameSentenceMiner (GSM) is an immersion toolkit designed to assist with language learning through games. It enhances Anki cards with automated audio capture, manual trim options, screenshot capture, multi-line support, and AI translation. GSM also offers OCR with easier setup, exclusion zones, a two-pass OCR system, consistent audio timing, and support for multiple languages. A built-in game launcher simplifies game setup and launching. Basic requirements are an Anki card creation tool, a method of extracting text from games, and, of course, a game. GSM provides detailed documentation and FAQs to help users understand its functionality and troubleshoot issues. Users can seek support through the project's Discord channel or by creating issues on the repository.
README:
An application designed to assist with language learning through games.
Short Demo (Watch this first): https://www.youtube.com/watch?v=FeFBL7py6HY
Installation: https://www.youtube.com/watch?v=sVL9omRbGc4
Discord: https://discord.gg/yP8Qse6bb8
GSM significantly enhances your Anki cards with rich contextual information:
- Automated Audio Capture: Automatically records the voice line associated with the text.
  - Automatic Trim: Simple math on the time the text event arrived, combined with a Voice Activity Detection (VAD) library, produces neatly cut audio.
  - Manual Trim: If the automatic voice-line trim is not perfect, you can open the audio in an external program for trimming.
- Screenshot: Captures a screenshot of the game at the moment the voice line is spoken.
- Multi-Line: Capture multiple lines at once, with sentence audio, using GSM's very own Texthooker.
- AI Translation: Integrates AI to provide quick translations of the captured sentence. Custom prompts are also supported. (Optional; bring your own key.)
https://github.com/user-attachments/assets/df6bc38e-d74d-423e-b270-8a82eec2394c
https://github.com/user-attachments/assets/ee670fda-1a8b-4dec-b9e6-072264155c6e
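As a rough illustration of the "simple math" behind the automatic trim, the sketch below converts the arrival time of a text event into an offset within a running recording. It is not GSM's actual code: the function name `line_start_offset` and the `pre_roll` parameter are hypothetical, and it assumes the recording's start time is known.

```python
from datetime import datetime, timedelta


def line_start_offset(recording_started_at: datetime,
                      text_event_at: datetime,
                      pre_roll: float = 0.0) -> float:
    """Return the number of seconds into the recording where the voice line begins.

    pre_roll is a hypothetical setting that moves the cut earlier, for hooks
    that fire slightly after the audio has already started.
    """
    offset = (text_event_at - recording_started_at).total_seconds() - pre_roll
    # The cut point can never fall before the start of the recording.
    return max(0.0, offset)


started = datetime(2024, 1, 1, 12, 0, 0)
event = started + timedelta(seconds=42.5)
print(line_start_offset(started, event, pre_roll=0.5))  # 42.0
```

Everything before the returned offset would be discarded; the end of the clip is then found separately with VAD.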
GSM runs a fork of OwOCR to provide accurate text capture from games that do not have a hook. Here are some improvements GSM makes on stock OwOCR:
- Easier Setup: With GSM's managed Python install, setup is only a matter of clicking a few buttons.
- Exclusion Zones: Instead of choosing an area to OCR, you can choose an area to exclude from OCR. Useful if you have a static interface in your game and text appears randomly throughout.
- Two-Pass OCR: To cut down on API calls and keep output clean, GSM features a "Two-Pass" OCR system. A local OCR runs constantly, and when the text on screen stabilizes, GSM runs a second, more accurate scan whose result is sent to the clipboard/WebSocket.
- Consistent Audio Timing: With the two-pass system, we can still get accurate audio recorded and into Anki without the use of crazy offsets or hacks.
- More Language Support: Stock OwOCR is hard-coded to Japanese, while in GSM you can use a variety of languages.
https://github.com/user-attachments/assets/07240472-831a-40e6-be22-c64b880b0d66
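The two-pass idea can be sketched in a few lines of Python. This is an illustrative sketch only, not GSM's or OwOCR's implementation: `two_pass_ocr`, `fast_ocr`, `accurate_ocr`, and `stable_reads` are hypothetical names, and plain strings stand in for screenshots.

```python
from typing import Callable, Iterable, Iterator


def two_pass_ocr(frames: Iterable[str],
                 fast_ocr: Callable[[str], str],
                 accurate_ocr: Callable[[str], str],
                 stable_reads: int = 2) -> Iterator[str]:
    """Yield an accurate OCR result once the fast pass has settled on a text."""
    previous = None   # text returned by the fast pass on the previous frame
    streak = 0        # number of consecutive frames that produced `previous`
    emitted = False   # whether the current stable text was already sent out
    for frame in frames:
        text = fast_ocr(frame)
        if text != previous:
            # Text is still changing (or just changed): restart the count.
            previous, streak, emitted = text, 1, False
        else:
            streak += 1
        if text and streak >= stable_reads and not emitted:
            emitted = True
            # Only now pay for the slower, more accurate second pass.
            yield accurate_ocr(frame)


# Simulated frames: text scrolls in, stays on screen, clears, then a new line appears.
frames = ["H", "Hel", "Hello", "Hello", "Hello", "", "Yes", "Yes"]
results = list(two_pass_ocr(frames,
                            fast_ocr=lambda f: f,
                            accurate_ocr=lambda f: "accurate:" + f))
print(results)  # ['accurate:Hello', 'accurate:Yes']
```

In this toy run the fast pass is called on all eight frames, while the accurate pass runs only twice, which is the saving the two-pass design aims for.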
This is probably the feature I care least about, but if you are lazy like me, you may find this helpful.
- Launch: GSM can launch your games directly, simplifying the setup process.
- Hook: Streamlines the process of hooking your games (Agent).
This feature simplifies the process of launching games and (potentially) hooking them, making the entire workflow more efficient.
- An Anki card creation tool.
- A method of getting text from the game: Agent, Textractor, LunaTranslator, GSM's OCR, etc.
- A game :)
For help with installation, setup, and other information, please visit the project's Wiki.
This is a common question, and understanding this process will help clarify any issues you might encounter while using GSM.
- The beginning of the voice line is marked by a text event. This usually comes from Textractor, Agent, or another texthooker. GSM can listen for a clipboard copy and/or a WebSocket server (configurable in GSM).
- The end of the voice line is detected using a Voice Activity Detection (VAD) library running locally.
In essence, GSM relies on accurately timed text events to capture the corresponding audio.
GSM provides settings to accommodate less-than-ideal hooks. However, if you experience significant audio inconsistencies, they likely stem from a poorly timed hook, loud background music, or other external factors, rather than GSM itself. The core audio trimming logic has been stable and effective for many users across various games.
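To make the end-of-line detection concrete, here is a toy example that finds where speech stops using a simple energy threshold. Real VAD libraries use far more robust models, and this is not the library or logic GSM uses; `speech_end_frame`, the threshold, and the frame length are assumptions chosen for illustration.

```python
from typing import Sequence


def speech_end_frame(energies: Sequence[float],
                     threshold: float,
                     min_silence_frames: int) -> int:
    """Return the index one past the last speech frame of the first utterance.

    A frame counts as speech when its energy is at or above `threshold`.
    The utterance is considered over once `min_silence_frames` quiet frames
    in a row follow some speech. Returns 0 if no speech is found.
    """
    last_speech = -1
    silence_run = 0
    for i, energy in enumerate(energies):
        if energy >= threshold:
            last_speech = i
            silence_run = 0
        elif last_speech >= 0:
            # Quiet frame after speech: short pauses are tolerated.
            silence_run += 1
            if silence_run >= min_silence_frames:
                break
    return last_speech + 1


FRAME_SECONDS = 0.03  # assumed frame length
# Audio that begins at the text event: speech, a short pause, more speech, then silence.
energies = [0.0, 0.8, 0.9, 0.1, 0.7, 0.0, 0.0, 0.0, 0.9]
end = speech_end_frame(energies, threshold=0.5, min_silence_frames=3)
print(end)  # 5
cut_seconds = end * FRAME_SECONDS  # where the clip would be cut, in seconds
```

The single quiet frame inside the line is bridged, while the later burst of sound after the long silence is left out of the clip. Loud background music would keep the energy above the threshold, which is one way to picture why it makes trimming inconsistent.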
If you encounter issues, please ask for help in my Discord or create an issue on the repository.
- OwOCR for their outstanding OCR implementation, which I've integrated into GSM.
- chaiNNer for the idea of installing Python within an Electron app.
If you've found this or any of my other projects helpful, please consider supporting my work through GitHub Sponsors or Ko-fi.
Alternative AI tools for GameSentenceMiner
Similar Open Source Tools

persian-license-plate-recognition
The Persian License Plate Recognition (PLPR) system is a state-of-the-art solution designed for detecting and recognizing Persian license plates in images and video streams. Leveraging advanced deep learning models and a user-friendly interface, it ensures reliable performance across different scenarios. The system offers advanced detection using YOLOv5 models, precise recognition of Persian characters, real-time processing capabilities, and a user-friendly GUI. It is well-suited for applications in traffic monitoring, automated vehicle identification, and similar fields. The system's architecture includes modules for resident management, entrance management, and a detailed flowchart explaining the process from system initialization to displaying results in the GUI. Hardware requirements include an Intel Core i5 processor, 8 GB RAM, a dedicated GPU with at least 4 GB VRAM, and an SSD with 20 GB of free space. The system can be installed by cloning the repository and installing required Python packages. Users can customize the video source for processing and run the application to upload and process images or video streams. The system's GUI allows for parameter adjustments to optimize performance, and the Wiki provides in-depth information on the system's architecture and model training.

ShortGPT
ShortGPT is a powerful framework for automating content creation, simplifying video creation, footage sourcing, voiceover synthesis, and editing tasks. It offers features like an automated editing framework, scripts and prompts, voiceover support in multiple languages, caption generation, asset sourcing, and persistency of editing variables. The tool is designed for YouTube automation and TikTok Creativity Program automation, and offers customization options for efficient and creative content creation.

nextpy
Nextpy is a cutting-edge software development framework optimized for AI-based code generation. It provides guardrails for defining AI system boundaries, structured outputs for prompt engineering, a powerful prompt engine for efficient processing, better AI generations with precise output control, modularity for multiplatform and extensible usage, developer-first approach for transferable knowledge, and containerized & scalable deployment options. It offers 4-10x faster performance compared to Streamlit apps, with a focus on cooperation within the open-source community and integration of key components from various projects.

AgentPilot
Agent Pilot is an open source desktop app for creating, managing, and chatting with AI agents. It features multi-agent, branching chats with various providers through LiteLLM. Users can combine models from different providers, configure interactions, and run code using the built-in Open Interpreter. The tool allows users to create agents, manage chats, work with multi-agent workflows, branching workflows, context blocks, tools, and plugins. It also supports a code interpreter, scheduler, voice integration, and integration with various AI providers. Contributions to the project are welcome, and users can report known issues for improvement.

noScribe
noScribe is an AI-based software designed for automated audio transcription, specifically tailored for transcribing interviews for qualitative social research or journalistic purposes. It is a free and open-source tool that runs locally on the user's computer, ensuring data privacy. The software can differentiate between speakers and supports transcription in 99 languages. It includes a user-friendly editor for reviewing and correcting transcripts. Developed by Kai Dröge, a PhD in sociology with a background in computer science, noScribe aims to streamline the transcription process and enhance the efficiency of qualitative analysis.

OpenCAGE
OpenCAGE is an open-source modding toolkit for Alien: Isolation, enabling custom scripting, configuration, and content modification through graphical interfaces. It includes tools for editing assets, configurations, scripts, behaviour trees, launching the game, and managing backups. The project is constantly evolving with a roadmap that includes features like contextual script editing, content porter, new level creator, mod installers, 3D viewer improvements, navmesh generation, skinned meshes support, sound import/export, and more. OpenCAGE is supported financially by the community and welcomes code contributions.

doc2plan
doc2plan is a browser-based application that helps users create personalized learning plans by extracting content from documents. It features a Creator for manual or AI-assisted plan construction and a Viewer for interactive plan navigation. Users can extract chapters, key topics, generate quizzes, and track progress. The application includes AI-driven content extraction, quiz generation, progress tracking, plan import/export, assistant management, customizable settings, viewer chat with text-to-speech and speech-to-text support, and integration with various Retrieval-Augmented Generation (RAG) models. It aims to simplify the creation of comprehensive learning modules tailored to individual needs.

oreilly-retrieval-augmented-gen-ai
This repository focuses on Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs). It provides code and resources to augment LLMs with real-time data for dynamic, context-aware applications. The content covers topics such as semantic search, fine-tuning embeddings, building RAG chatbots, evaluating LLMs, and using knowledge graphs in RAG. Prerequisites include Python skills, knowledge of machine learning and LLMs, and introductory experience with NLP and AI models.

krita-ai-diffusion
Krita-AI-Diffusion is a plugin for Krita that allows users to generate images from within the program. It offers a variety of features, including inpainting, outpainting, generating images from scratch, refining existing content, live painting, and control over image creation. The plugin is designed to fit into an interactive workflow where AI generation is used as just another tool while painting. It is meant to synergize with traditional tools and the layer stack.

kitops
KitOps is a packaging and versioning system for AI/ML projects that uses open standards so it works with the AI/ML, development, and DevOps tools you are already using. KitOps simplifies the handoffs between data scientists, application developers, and SREs working with LLMs and other AI/ML models. KitOps' ModelKits are a standards-based package for models, their dependencies, configurations, and codebases. ModelKits are portable, reproducible, and work with the tools you already use.

languine
Languine is a CLI tool powered by AI that helps developers streamline the localization process by providing AI-powered translations, automation features, consistent localization, developer-centric design, and time-saving workflows. It automates the identification of translation keys, supports multiple file formats, delivers accurate translations in over 100 languages, aligns translations with the original text's tone and intent, extracts translation keys from codebase, and supports hooks for content formatting with Biome or Prettier. Languine is designed to simplify and enhance the localization experience for developers.

stride-gpt
STRIDE GPT is an AI-powered threat modelling tool that leverages Large Language Models (LLMs) to generate threat models and attack trees for a given application based on the STRIDE methodology. Users provide application details, such as the application type, authentication methods, and whether the application is internet-facing or processes sensitive data. The model then generates its output based on the provided information. It features a simple and user-friendly interface, supports multi-modal threat modelling, generates attack trees, suggests possible mitigations for identified threats, and does not store application details. STRIDE GPT can be accessed via OpenAI API, Azure OpenAI Service, Google AI API, or Mistral API. It is available as a Docker container image for easy deployment.

pyqt-openai
VividNode is a cross-platform AI desktop chatbot application for interacting with LLMs such as GPT, Claude, Gemini, and Llama, and for generating images. It offers customizable features, local chat history, and enhanced performance without requiring a browser. The application is powered by GPT4Free and allows users to interact with chatbots and generate images seamlessly. VividNode supports Windows, Mac, and Linux, securely stores chat history locally, and provides features like chat interface customization, image generation, focus and accessibility modes, and extensive customization options with keyboard shortcuts for efficient operations.

bmf
BMF (Babit Multimedia Framework) is a cross-platform, multi-language, customizable multimedia processing framework developed by ByteDance. It offers native compatibility with Linux, Windows, and macOS, Python, Go, and C++ APIs, and high performance with strong GPU acceleration. BMF allows developers to enhance its features independently and provides efficient data conversion across popular frameworks and hardware devices. BMFLite is a client-side lightweight framework used in apps like Douyin/Xigua, serving over one billion users daily. BMF is widely used in video streaming, live transcoding, cloud editing, and mobile pre/post processing scenarios.

MagicMirror
MagicMirror is an AI-powered tool that allows users to instantly try on new faces, hairstyles, and outfits with a simple drag and drop interface. It runs smoothly on standard computers without the need for dedicated GPU hardware, ensuring privacy with completely offline processing. The tool is ultra-lightweight with a small installer size and model files, providing a fun and easy way to experiment with different looks.
For similar tasks

Google-Shortcuts-Launcher
Google Shortcuts Launcher provides a seamless way to integrate powerful Google services into your daily workflow. With just a tap, you can quickly access a variety of shortcuts designed to enhance your daily device use and simplify your interactions with Google features. It offers shortcuts for games launcher, Google Lens, Google Music Search, Google Password Manager, Google Weather, and Voice Assistant. The tool requires Google, Google Play Services, and Google Play Games to be installed on the device for proper functionality, and some features may require root access.

Verbiverse
Verbiverse is a tool that uses a large language model to assist in reading PDFs and watching videos, aimed at improving language proficiency. It provides a more convenient and efficient way to use large models through predefined prompts, designed for those looking to enhance their language skills. The tool analyzes unfamiliar words and sentences in foreign language PDFs or video subtitles, providing better contextual understanding compared to traditional dictionary translations or ambiguous meanings. It offers features such as automatic loading of subtitles, word analysis by clicking or double-clicking, and a word database for collecting words. Users can run the tool on Windows x86_64 or ubuntu_22.04 x86_64 platforms by downloading the precompiled packages or by cloning the source code and setting up a virtual environment with Python. It is recommended to use a local model or smaller PDF files for testing due to potential token consumption issues with large files.
For similar jobs

AMchat
AMchat is a large language model that integrates advanced math concepts, exercises, and solutions. The model is based on the InternLM2-Math-7B model and is specifically designed to answer advanced math problems. It provides a comprehensive dataset that combines Math and advanced math exercises and solutions. Users can download the model from ModelScope or OpenXLab, deploy it locally or using Docker, and even retrain it using XTuner for fine-tuning. The tool also supports LMDeploy for quantization, OpenCompass for evaluation, and various other features for model deployment and evaluation. The project contributors have provided detailed documentation and guides for users to utilize the tool effectively.

duolingo-clone
Lingo is an interactive platform for language learning that provides a modern UI/UX experience. It offers features like courses, quests, and a shop for users to engage with. The tech stack includes React JS, Next JS, Typescript, Tailwind CSS, Vercel, and Postgresql. Users can contribute to the project by submitting changes via pull requests. The platform utilizes resources from CodeWithAntonio, Kenney Assets, Freesound, Elevenlabs AI, and Flagpack. Key dependencies include @clerk/nextjs, @neondatabase/serverless, @radix-ui/react-avatar, and more. Users can follow the project creator on GitHub and Twitter, as well as subscribe to their YouTube channel for updates. To learn more about Next.js, users can refer to the Next.js documentation and interactive tutorial.

Verbiverse
Verbiverse is a tool that uses a large language model to assist in reading PDFs and watching videos, aimed at improving language proficiency. It provides a more convenient and efficient way to use large models through predefined prompts, designed for those looking to enhance their language skills. The tool analyzes unfamiliar words and sentences in foreign language PDFs or video subtitles, providing better contextual understanding compared to traditional dictionary translations or ambiguous meanings. It offers features such as automatic loading of subtitles, word analysis by clicking or double-clicking, and a word database for collecting words. Users can run the tool on Windows x86_64 or ubuntu_22.04 x86_64 platforms by downloading the precompiled packages or by cloning the source code and setting up a virtual environment with Python. It is recommended to use a local model or smaller PDF files for testing due to potential token consumption issues with large files.

AnnA_Anki_neuronal_Appendix
AnnA is a Python script designed to create filtered decks in optimal review order for Anki flashcards. It uses Machine Learning / AI to ensure semantically linked cards are reviewed far apart. The script helps users manage their daily reviews by creating special filtered decks that prioritize reviewing cards that are most different from the rest. It also allows users to reduce the number of daily reviews while increasing retention and automatically identifies semantic neighbors for each note.

EngAce
EngAce is a cutting-edge, generative AI-powered application revolutionizing Vietnamese English learning. It offers personalized learning experiences combining AI with comprehensive features. The repository contains source code, documentation, and resources for the app.

TheoremExplainAgent
TheoremExplainAgent is an AI system that generates long-form Manim videos to visually explain theorems, proving its deep understanding while uncovering reasoning flaws that text alone often hides. The codebase for the paper 'TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding' is available in this repository. It provides a tool for creating multimodal explanations for theorem understanding using AI technology.

vocabulary-book-by-deepseek
Vocabulary Book by DeepSeek is a manual for CET-4, postgraduate entrance examination, and TOEFL vocabulary, providing word meanings, roots, example sentences, mnemonic aids, and mnemonic images. The project uses Cline + DeepSeek-R1-16b for over 80% of the code to automatically encode the vocabulary manual. The generated manual includes vocabulary from A to Z for CET-4, CET-6, postgraduate entrance examination, and TOEFL, along with features to generate Anki cards and PDFs. The tool also allows for the creation of mnemonic images for each word and articles.

awesome-ai-llm4education
The 'awesome-ai-llm4education' repository is a curated list of papers related to artificial intelligence (AI) and large language models (LLM) for education. It collects papers from top conferences, journals, and specialized domain-specific conferences, categorizing them based on specific tasks for better organization. The repository covers a wide range of topics including tutoring, personalized learning, assessment, material preparation, specific scenarios like computer science, language, math, and medicine, aided teaching, as well as datasets and benchmarks for educational research.