
InterPilot
内测用户反映它可以作为不被检测到的AI面试助手—InterPilot captures audio from Windows input and output devices, transcribes the audio into text, and then calls an LLM API to generate responses. Some beta users have reported that InterPilot can help/assist with interviews and even cheat.
Stars: 88

InterPilot is an AI-based assistant tool that captures audio from Windows input/output devices, transcribes it into text, and then calls the Large Language Model (LLM) API to provide answers. The project includes recording, transcription, and AI response modules, aiming to provide support for personal legitimate learning, work, and research. It may assist in scenarios like interviews, meetings, and learning, but it is strictly for learning and communication purposes only. The tool can hide its interface using third-party tools to prevent screen recording or screen sharing, but it does not have this feature built-in. Users bear the risk of using third-party tools independently.
README:
本项目是一个基于 AI 的助手工具,能够从windows的输入输出设备中捕获音频,将音频转为文字后,再调用 LLM(大语言模型) API 给出回答。项目主要包括录音、转写和 AI 回答三个模块,旨在为个人的正当学习、工作、科研提供辅助支持。
部分内测用户反映,本工具可能可以在面试、会议、学习等场景中提供一定的帮助,比如在在线会议软件中作为AI面试工具辅助面试:获取面试官的音频然后得到回答,但是请注意:本工具仅供学习交流使用,不得用于任何不正当用途。
经测试,本工具能够借助第三方工具隐藏界面以防止被录屏软件、屏幕共享等功能录制到,但工具本身不具备隐藏界面的功能。是否使用第三方工具与作者无关,风险由用户自行承担。
如果对你有所帮助,可以通过微信扫码打赏,感谢你的支持!
-
音频捕获
使用 LoopbackRecorder 从系统录制音频(支持 loopback 设备),并保存为 WAV 文件。 -
语音转写
基于 Whisper 模型在本地进行音频转写,支持多种模型规格(默认使用base
模型)。 -
AI 辅助回答
通过调用 LLM API(配置在config.ini
中)对转写后的文本进行分析,生成回答。支持流式返回并实时更新界面。 -
图形用户界面
基于 PyQt5 构建的简洁 GUI,支持录音、转写、发送文本至 LLM 等操作,并对 LLM 回复支持 Markdown 渲染。
C:.
│ config.ini
│ logo.png
│ main.py
| main_cmd.py
| README.md
│ requirements.txt
│
├── output
└── src
│ audio_capture.py
│ llm_client.py
│ transcriber.py
│ __init__.py
│
└── utils
│ config_loader.py
│ __init__.py
-
config.ini
配置文件,包含 API 接口地址、API key、使用的模型、设备索引、默认提示词等参数。 -
logo.png
应用程序图标(用于 GUI 窗口)。 -
main.py/main_cmd.py
程序入口,负责启动图形界面和整体工作流程。 -
output/
存放录音文件。 -
requirements.txt
列出项目依赖的 Python 包(例如 PyQt5、markdown2、whisper、openai 等)。 -
src/
存放核心模块:-
audio_capture.py
:音频录制模块。 -
transcriber.py
:语音转写模块。 -
llm_client.py
:调用 LLM API 的客户端。 -
utils/
:包含一些工具类和配置加载模块。
-
-
FFmpeg
本项目依赖 FFmpeg 进行部分音频处理,请确保已正确安装并配置环境变量。-
安装方法示例:
- Windows 用户:
- Mac 用户可使用 Homebrew 安装:
brew install ffmpeg
- whisper项目提到
You may need rust installed as well
,所以需要可能安装rust(但不安装好像没事儿,建议先不装,如果transcriber.py
不能正常运行再参考Whisper )
-
安装方法示例:
建议使用miniconda或者anaconda创建虚拟环境(建议安装 Python 3.10
版本):
conda create -n interview python=3.10
conda activate interview
然后使用以下命令安装项目所需依赖:
pip install -r requirements.txt
请根据实际情况修改根目录下的 config.ini
文件,其中包括:
- API_URL:LLM API 的地址。
- API_KEY:访问 API 的密钥。
-
MODEL:调用的模型名称(例如
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
,其他模型名称可以访问硅基流动(官网链接)-模型广场查看。 - SPEAKER_DEVICE_INDEX 与 MIC_DEVICE_INDEX:录音设备的索引,视具体系统配置而定。建议阅读录音设备索引和注意事项部分。
- OUTPUT_DIR:存储录音文件的目录。
-
WHISPER_MODEL_SIZW:whisper模型的大小,可选项为tiny
base
、small
、medium
、large
、turbo
。 - DEFAULT_PROMPT:是拼接在发送给 LLM 的文本最前端的默认提示词,可根据使用场景调整,例如“你是一个XX方面的专家,你马上获取到的文本来自于XX,请你据此给出合理简洁的回答:”
- 建议注册硅基流动(官网链接)获取
API_KEY
,新用户受邀可获取14元额度(邀请码TzKmtDJH
),足够用一段时间了 - 官网左侧菜单栏-API秘钥-新建API秘钥-获取一段形如
sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
的长字符串替换config.ini
里的API_KEY
即可 -
使用其他支持OpenAI API的服务也可以,只需替换
API_URL
和API_KEY
即可(还是建议使用siliconflow,工具默认使用的deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
模型完全免费,白嫖万岁!)
- 默认
SPEAKER_DEVICE_INDEX
置为了-1,这会自动寻找可用的默认wasapi_loopback设备,一般录制的就是你的目前的扬声器(耳机)听到的声音,但如果出现问题,建议手动运行audio_capture.py
查看全部可用设备后,手动指定正确的设备。你也可以通过修改这个参数使得录制的是麦克风输入的声音。
python src/audio_capture.py
项目各核心模块(录音、转写、LLM 客户端)均包含简单的测试代码。你可以分别运行下列文件,检查各功能模块是否正常运行:
-
src/audio_capture.py
—— 用于实现音频录制功能(能够打印出系统中的音频设备列表)。 -
src/transcriber.py
—— 用于实现音频转写功能(首次运行会自动下载模型)。 -
src/llm_client.py
—— 用于实现 LLM 客户端功能(调用 LLM API 并返回回答)。
运行 main.py
启动完整的面试助手 GUI:
python main.py
在 GUI 中你可以依次进行以下操作:
- 开始录音:点击“开始录音”按钮,程序将自动生成唯一的录音文件名并开始录制音频。
-
结束录音:点击“结束录音”按钮结束录音,录音文件保存在
output
目录中。 - 转写文字:录音结束后(或手动点击),调用转写模块,将录音转为文字并显示在界面上。
- 发送给 LLM:转写完成后,可以将文字发送至 LLM,生成 AI 回答,并在界面上显示支持 Markdown 格式的回复。
- 修改转写文字并发送给 LLM
如果你想在终端中运行,可以使用 main_cmd.py
:
python main_cmd.py
-
录音设备:根据设备不同,可能需要调整
config.ini
中的SPEAKER_DEVICE_INDEX
和MIC_DEVICE_INDEX
参数。 默认设置下,因为录制的是扬声器(你听到)的声音,所以在没有声音播放的时候,是不会录制的,所以必须播放一些音频或者视频,才能获取到音频。测试的时候可以放个视频。 - 环境变量:确保 FFmpeg 已安装并已添加到环境变量 PATH 中,否则可能会影响音频处理。
- 测试验证:建议先单独测试各模块,确认音频录制、转写和 LLM 回答均正常后再启动 GUI 整体运行。
使用shalzuth/WindowSharingHider隐藏UI界面————太棒的工具了!又方便又好用!
任务栏中图表的隐藏:
- 直接使用windows自带的任务栏隐藏功能,或者干脆把任务栏移到第二个显示器
- 使用一些隐藏工具(可以自己找一下)
使用turbotop可以使得窗口始终置顶————也是很好用的工具
-
注意一下使用顺序不然可能会出现问题:
- 先使用turbotop使得窗口置顶
- 再使用WindowSharingHider隐藏UI界面
- 如果不太就行就换一下顺序多试几下
- [ ] 在README中增加详细的使用案例或截图(GUI 操作示例、终端输出示例等)。
- [ ] 增加voice_generate功能(TTS)——已经测试好,待集成
- [ ] 增加麦克风扬声器音频共同识别功能
- [ ] 增加截图、上传LLM功能
- [ ] 任务栏中的图标隐藏功能
欢迎社区开发者提交 issue 或 pull request,一起完善这个 工具。如果有任何建议或改进意见,请随时联系。
本项目仅供技术学习与研究交流之用,严禁用于以下用途:
- 任何形式的求职面试作弊行为
- 侵犯他人隐私或商业秘密
- 违反当地法律法规的行为
使用者应对自身行为负全部法律责任,作者不承担任何因滥用本项目导致的直接或间接后果。使用即表示您已阅读并同意本声明。
本项目采用 Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) 许可证进行开源。
这意味着您可以自由地共享和修改本项目的内容,但仅限于非商业用途。
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for InterPilot
Similar Open Source Tools

InterPilot
InterPilot is an AI-based assistant tool that captures audio from Windows input/output devices, transcribes it into text, and then calls the Large Language Model (LLM) API to provide answers. The project includes recording, transcription, and AI response modules, aiming to provide support for personal legitimate learning, work, and research. It may assist in scenarios like interviews, meetings, and learning, but it is strictly for learning and communication purposes only. The tool can hide its interface using third-party tools to prevent screen recording or screen sharing, but it does not have this feature built-in. Users bear the risk of using third-party tools independently.

TrainPPTAgent
TrainPPTAgent is an AI-based intelligent presentation generation tool. Users can input a topic and the system will automatically generate a well-structured and content-rich PPT outline and page-by-page content. The project adopts a front-end and back-end separation architecture: the front-end is responsible for interaction, outline editing, and template selection, while the back-end leverages large language models (LLM) and reinforcement learning (GRPO) to complete content generation and optimization, making the generated PPT more tailored to user goals.

RTXZY-MD
RTXZY-MD is a bot tool that supports file hosting, QR code, pairing code, and RestApi features. Users must fill in the Apikey for the bot to function properly. It is not recommended to install the bot on platforms lacking ffmpeg, imagemagick, webp, or express.js support. The tool allows for 95% implementation of website api and supports free and premium ApiKeys. Users can join group bots and get support from Sociabuzz. The tool can be run on Heroku with specific buildpacks and is suitable for Windows/VPS/RDP users who need Git, NodeJS, FFmpeg, and ImageMagick installations.

LabelQuick
LabelQuick_V2.0 is a fast image annotation tool designed and developed by the AI Horizon team. This version has been optimized and improved based on the previous version. It provides an intuitive interface and powerful annotation and segmentation functions to efficiently complete dataset annotation work. The tool supports video object tracking annotation, quick annotation by clicking, and various video operations. It introduces the SAM2 model for accurate and efficient object detection in video frames, reducing manual intervention and improving annotation quality. The tool is designed for Windows systems and requires a minimum of 6GB of memory.

rime_wanxiang
Rime Wanxiang is a pinyin input method based on deep optimized lexicon and language model. It features a lexicon with tones, AI and large corpus filtering, and frequency addition to provide more accurate sentence output. The tool supports various input methods and customization options, aiming to enhance user experience through lexicon and transcription. Users can also refresh the lexicon with different types of auxiliary codes using the LMDG toolkit package. Wanxiang offers core features like tone-marked pinyin annotations, phrase composition, and word frequency, with customizable functionalities. The tool is designed to provide a seamless input experience based on lexicon and transcription.

MoneyPrinterTurbo
MoneyPrinterTurbo is a tool that can automatically generate video content based on a provided theme or keyword. It can create video scripts, materials, subtitles, and background music, and then compile them into a high-definition short video. The tool features a web interface and an API interface, supporting AI-generated video scripts, customizable scripts, multiple HD video sizes, batch video generation, customizable video segment duration, multilingual video scripts, multiple voice synthesis options, subtitle generation with font customization, background music selection, access to high-definition and copyright-free video materials, and integration with various AI models like OpenAI, moonshot, Azure, and more. The tool aims to simplify the video creation process and offers future plans to enhance voice synthesis, add video transition effects, provide more video material sources, offer video length options, include free network proxies, enable real-time voice and music previews, support additional voice synthesis services, and facilitate automatic uploads to YouTube platform.

AirPower4T
AirPower4T is a development base library based on Vue3 TypeScript Element Plus Vite, using decorators, object-oriented, Hook and other front-end development methods. It provides many common components and some feedback components commonly used in background management systems, and provides a lot of enums and decorators.

MouseTooltipTranslator
MouseTooltipTranslator is a Chrome extension that allows users to translate any text on a webpage by simply hovering over it. It supports both Google Translate and Bing Translate, and can also be used to listen to the pronunciation of words and phrases. Additionally, the extension can be used to translate text in input boxes and highlighted text, and to display translated tooltips for PDFs and YouTube videos. It also supports OCR, allowing users to translate text in images by holding down the left shift key and hovering over the image.

DeepBattler
DeepBattler is a tool designed for Hearthstone Battlegrounds players, providing real-time strategic advice and insights to improve gameplay experience. It integrates with the Hearthstone Deck Tracker plugin and offers voice-assisted guidance. The tool is powered by a large language model (LLM) and can match the strength of top players on EU servers. Users can set up the tool by adding dependencies, configuring the plugin path, and launching the LLM agent. DeepBattler is licensed for personal, educational, and non-commercial use, with guidelines on non-commercial distribution and acknowledgment of external contributions.

Fay
Fay is an open-source digital human framework that offers different versions for various purposes. The '带货完整版' is suitable for online and offline salespersons. The '助理完整版' serves as a human-machine interactive digital assistant that can also control devices upon command. The 'agent版' is designed to be an autonomous agent capable of making decisions and contacting its owner. The framework provides updates and improvements across its different versions, including features like emotion analysis integration, model optimizations, and compatibility enhancements. Users can access detailed documentation for each version through the provided links.

Daily-DeepLearning
Daily-DeepLearning is a repository that covers various computer science topics such as data structures, operating systems, computer networks, Python programming, data science packages like numpy, pandas, matplotlib, machine learning theories, deep learning theories, NLP concepts, machine learning practical applications, deep learning practical applications, and big data technologies like Hadoop and Hive. It also includes coding exercises related to '剑指offer'. The repository provides detailed explanations and examples for each topic, making it a comprehensive resource for learning and practicing different aspects of computer science and data-related fields.

xhs_ai_publisher
xhs_ai_publisher is an automation tool designed for publishing articles on the Xiaohongshu platform. It combines a graphical user interface with automation scripts to generate content using large model technology. The tool simplifies the content creation and publishing process by automatically logging in and publishing articles through a web browser.

chatwiki
ChatWiki is an open-source knowledge base AI question-answering system. It is built on large language models (LLM) and retrieval-augmented generation (RAG) technologies, providing out-of-the-box data processing, model invocation capabilities, and helping enterprises quickly build their own knowledge base AI question-answering systems. It offers exclusive AI question-answering system, easy integration of models, data preprocessing, simple user interface design, and adaptability to different business scenarios.

MarkMap-OpenAi-ChatGpt
MarkMap-OpenAi-ChatGpt is a Vue.js-based mind map generation tool that allows users to generate mind maps by entering titles or content. The application integrates the markmap-lib and markmap-view libraries, supports visualizing mind maps, and provides functions for zooming and adapting the map to the screen. Users can also export the generated mind map in PNG, SVG, JPEG, and other formats. This project is suitable for quickly organizing ideas, study notes, project planning, etc. By simply entering content, users can get an intuitive mind map that can be continuously expanded, downloaded, and shared.

DocTranslator
DocTranslator is a document translation tool that supports various file formats, compatible with OpenAI format API, and offers batch operations and multi-threading support. Whether for individual users or enterprise teams, DocTranslator helps efficiently complete document translation tasks. It supports formats like txt, markdown, word, csv, excel, pdf (non-scanned), and ppt for AI translation. The tool is deployed using Docker for easy setup and usage.

uDesktopMascot
uDesktopMascot is an open-source project for a desktop mascot application with a theme of 'freedom of creation'. It allows users to load and display VRM or GLB/FBX model files on the desktop, customize GUI colors and background images, and access various features through a menu screen. The application supports Windows 10/11 and macOS platforms.
For similar tasks

InterPilot
InterPilot is an AI-based assistant tool that captures audio from Windows input/output devices, transcribes it into text, and then calls the Large Language Model (LLM) API to provide answers. The project includes recording, transcription, and AI response modules, aiming to provide support for personal legitimate learning, work, and research. It may assist in scenarios like interviews, meetings, and learning, but it is strictly for learning and communication purposes only. The tool can hide its interface using third-party tools to prevent screen recording or screen sharing, but it does not have this feature built-in. Users bear the risk of using third-party tools independently.

onyx
Onyx is an open-source Gen-AI and Enterprise Search tool that serves as an AI Assistant connected to company documents, apps, and people. It provides a chat interface, can be deployed anywhere, and offers features like user authentication, role management, chat persistence, and UI for configuring AI Assistants. Onyx acts as an Enterprise Search tool across various workplace platforms, enabling users to access team-specific knowledge and perform tasks like document search, AI answers for natural language queries, and integration with common workplace tools like Slack, Google Drive, Confluence, etc.

Friend
Friend is an open-source AI wearable device that records everything you say, gives you proactive feedback and advice. It has real-time AI audio processing capabilities, low-powered Bluetooth, open-source software, and a wearable design. The device is designed to be affordable and easy to use, with a total cost of less than $20. To get started, you can clone the repo, choose the version of the app you want to install, and follow the instructions for installing the firmware and assembling the device. Friend is still a prototype project and is provided "as is", without warranty of any kind. Use of the device should comply with all local laws and regulations concerning privacy and data protection.

obsidian-systemsculpt-ai
SystemSculpt AI is a comprehensive AI-powered plugin for Obsidian, integrating advanced AI capabilities into note-taking, task management, knowledge organization, and content creation. It offers modules for brain integration, chat conversations, audio recording and transcription, note templates, and task generation and management. Users can customize settings, utilize AI services like OpenAI and Groq, and access documentation for detailed guidance. The plugin prioritizes data privacy by storing sensitive information locally and offering the option to use local AI models for enhanced privacy.

local-talking-llm
The 'local-talking-llm' repository provides a tutorial on building a voice assistant similar to Jarvis or Friday from Iron Man movies, capable of offline operation on a computer. The tutorial covers setting up a Python environment, installing necessary libraries like rich, openai-whisper, suno-bark, langchain, sounddevice, pyaudio, and speechrecognition. It utilizes Ollama for Large Language Model (LLM) serving and includes components for speech recognition, conversational chain, and speech synthesis. The implementation involves creating a TextToSpeechService class for Bark, defining functions for audio recording, transcription, LLM response generation, and audio playback. The main application loop guides users through interactive voice-based conversations with the assistant.

Scriberr
Scriberr is a self-hostable AI audio transcription app that utilizes open-source Whisper models from OpenAI for transcribing audio files locally on user's hardware. It offers fast transcription with customizable compute settings, local transcription on device, API endpoints for automation, and integration with other tools. Users can optionally summarize transcripts using ChatGPT or Ollama, with support for custom prompts. The app is mobile-ready, simple, and easy to use, with planned features including speaker diarization, audio recording, file actions, full text fuzzy search, tag-based organization, follow-along text with playback, edit summaries, export options, and support for other languages. Despite being in beta, Scriberr is functional and usable, albeit with some rough edges and minor bugs.

agents
The LiveKit Agent Framework is designed for building real-time, programmable participants that run on servers. Easily tap into LiveKit WebRTC sessions and process or generate audio, video, and data streams. The framework includes plugins for common workflows, such as voice activity detection and speech-to-text. Agents integrates seamlessly with LiveKit server, offloading job queuing and scheduling responsibilities to it. This eliminates the need for additional queuing infrastructure. Agent code developed on your local machine can scale to support thousands of concurrent sessions when deployed to a server in production.

openvino-plugins-ai-audacity
OpenVINO™ AI Plugins for Audacity* are a set of AI-enabled effects, generators, and analyzers for Audacity®. These AI features run 100% locally on your PC -- no internet connection necessary! OpenVINO™ is used to run AI models on supported accelerators found on the user's system such as CPU, GPU, and NPU. * **Music Separation**: Separate a mono or stereo track into individual stems -- Drums, Bass, Vocals, & Other Instruments. * **Noise Suppression**: Removes background noise from an audio sample. * **Music Generation & Continuation**: Uses MusicGen LLM to generate snippets of music, or to generate a continuation of an existing snippet of music. * **Whisper Transcription**: Uses whisper.cpp to generate a label track containing the transcription or translation for a given selection of spoken audio or vocals.
For similar jobs

InterPilot
InterPilot is an AI-based assistant tool that captures audio from Windows input/output devices, transcribes it into text, and then calls the Large Language Model (LLM) API to provide answers. The project includes recording, transcription, and AI response modules, aiming to provide support for personal legitimate learning, work, and research. It may assist in scenarios like interviews, meetings, and learning, but it is strictly for learning and communication purposes only. The tool can hide its interface using third-party tools to prevent screen recording or screen sharing, but it does not have this feature built-in. Users bear the risk of using third-party tools independently.

lollms-webui
LoLLMs WebUI (Lord of Large Language Multimodal Systems: One tool to rule them all) is a user-friendly interface to access and utilize various LLM (Large Language Models) and other AI models for a wide range of tasks. With over 500 AI expert conditionings across diverse domains and more than 2500 fine tuned models over multiple domains, LoLLMs WebUI provides an immediate resource for any problem, from car repair to coding assistance, legal matters, medical diagnosis, entertainment, and more. The easy-to-use UI with light and dark mode options, integration with GitHub repository, support for different personalities, and features like thumb up/down rating, copy, edit, and remove messages, local database storage, search, export, and delete multiple discussions, make LoLLMs WebUI a powerful and versatile tool.

Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (Containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customer’s subscription using the CAPE tool within a matter of few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.

minio
MinIO is a High Performance Object Storage released under GNU Affero General Public License v3.0. It is API compatible with Amazon S3 cloud storage service. Use MinIO to build high performance infrastructure for machine learning, analytics and application data workloads.

mage-ai
Mage is an open-source data pipeline tool for transforming and integrating data. It offers an easy developer experience, engineering best practices built-in, and data as a first-class citizen. Mage makes it easy to build, preview, and launch data pipelines, and provides observability and scaling capabilities. It supports data integrations, streaming pipelines, and dbt integration.

AiTreasureBox
AiTreasureBox is a versatile AI tool that provides a collection of pre-trained models and algorithms for various machine learning tasks. It simplifies the process of implementing AI solutions by offering ready-to-use components that can be easily integrated into projects. With AiTreasureBox, users can quickly prototype and deploy AI applications without the need for extensive knowledge in machine learning or deep learning. The tool covers a wide range of tasks such as image classification, text generation, sentiment analysis, object detection, and more. It is designed to be user-friendly and accessible to both beginners and experienced developers, making AI development more efficient and accessible to a wider audience.

tidb
TiDB is an open-source distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.

airbyte
Airbyte is an open-source data integration platform that makes it easy to move data from any source to any destination. With Airbyte, you can build and manage data pipelines without writing any code. Airbyte provides a library of pre-built connectors that make it easy to connect to popular data sources and destinations. You can also create your own connectors using Airbyte's no-code Connector Builder or low-code CDK. Airbyte is used by data engineers and analysts at companies of all sizes to build and manage their data pipelines.