Verbiverse

利用 LLM 大模型辅助阅读 PDF 与观看视频，用以提升语言能力。

Stars: 96

Visit

Verbiverse is a tool that uses a large language model to assist in reading PDFs and watching videos, aimed at improving language proficiency. It provides a more convenient and efficient way to use large models through predefined prompts, designed for those looking to enhance their language skills. The tool analyzes unfamiliar words and sentences in foreign language PDFs or video subtitles, providing better contextual understanding compared to traditional dictionary translations or ambiguous meanings. It offers features such as automatic loading of subtitles, word analysis by clicking or double-clicking, and a word database for collecting words. Users can run the tool on Windows x86_64 or ubuntu_22.04 x86_64 platforms by downloading the precompiled packages or by cloning the source code and setting up a virtual environment with Python. It is recommended to use a local model or smaller PDF files for testing due to potential token consumption issues with large files.

README:

Verbiverse

Verbiverse 利用 LLM 大模型辅助阅读 PDF 与观看视频，用以提升语言能力。

利用大模型辅助阅读

通过预定义的 Prompt 更加方便、快捷的使用大模型，专为想要提升语言能力的你所设计。
探索本项目的文档 »

上手指南 · 报告Bug/提出特性 · 项目源码解析 · discord 群组

更新

V1.1.1 增加了使用 Bing 进行翻译的结果，避免大模型解释名词不准确的问题：
V1.1 增加 TTS，可以划「词/句」朗读：
- 去除了目标语言解释，增加了 TTS 按钮；
- 修复了一些小问题，增加了一些提示；
- 优化了一些界面；

功能介绍

Verbiverse 可以针对阅读外语 PDF 或观看视频时的外语字幕，对不理解的单词、语句结合上下文进行解析，对比词典的生硬翻译、多义词模糊不清的情况有更好的体验：

主页界面如下，左侧为导航栏跳转至不同功能页面，主页面整体分为 PDF或视频文件入口与历史文件列表两个部分，点击对应的按钮选择对应文件即可：
主页点击阅读 PDF 打开想要阅读的文档后，工具会自动跳转如下界面，左侧为阅读区、右侧为 LLM 对话区，当选中陌生词汇后鼠标右键可以通过 LLM 进行解析：
主页点击观看视频选择视频文件后，工具会自动跳转视频播放界面，左侧分为视频播放区、字幕，右侧则是是字幕列表与同文件夹下的其他媒体文件：
- 工具会自动加载同目录下同名但以 srt 结尾的字幕文件，如果字幕文件无法自动加载可以鼠标右键手动选择添加；
- 双击字幕中的单词会对单个词进行解析，点击右侧的图标则是对当前字幕语句进行解析：
单词本当前没有什么特殊功能，仅仅用于收集作用，有什么好的建议欢迎提出：

上手指南

预编译包

提供如下平台预编译程序包，下载对应平台程序包执行即可：

源码运行

clone 源码到本地：git clone https://github.com/HATTER-LONG/Verbiverse.git
使用conda 或 python (>=3.9, <=3.12) venv 创建虚拟环境，推荐使用 conda：
- 使用 conda：conda create -n Verbiverse python=3.11
  - 激活虚拟环境：conda activate Verbiverse;
- 使用 venv，进入源码目录后：python3 -m venv ./.venv;source ./venv/bin/activate;
安装 poetry：
- 确认已正确启用虚拟环境；
- pip install -U pip setuptools;pip install poetry;
安装项目依赖环境：poetry install：
- 需要代理则取消 pyproject.toml 中 [[tool.poetry.source]] 相关注释，然后重新 poetry lock --no-update;
运行程序：python3 main.py

工具设置

⚠️强烈建议优先使用本地模型或较小的 PDF 文档进行试用，因为工具很多的 prompt 与向量嵌入并没有对 token 进行优化，过大的文件可能会造成大量 token 消耗！！！！

工具的核心功能依赖 LLM，因此在使用前需要配置相应 LLM 的服务信息，工具支持使用本地模型或云端商用模型，当前支持 OpenAI 协议本地工具或商用模型与通义千问商用模型：

使用商用模型，下图以通义千问为例，如需使用 OpenAI 同理填入对应信息即可：
使用本地模型，需要选择 OpenAI 协议，填入本地工具的 LLM 服务地址，使用 LM Studio 为例：
- LM Studio 安装对应模型：
- 工具配置：

鸣谢

感谢如下相关开源项目：

For Tasks:

Click tags to check more tools for each tasks

analyze words translate sentences read pdfs watch videos collect words

For Jobs:

language tutor translator language researcher language student educational content creator

Alternative AI tools for Verbiverse

Similar Open Source Tools

Verbiverse

github

: 96

Imagine_AI

IMAGINE - AI is a groundbreaking image generator tool that leverages the power of OpenAI's DALL-E 2 API library to create extraordinary visuals. Developed using Node.js and Express, this tool offers a transformative way to unleash artistic creativity and imagination by generating unique and captivating images through simple prompts or keywords.

github

: 51

ollama4j

Ollama4j is a Java library that serves as a wrapper or binding for the Ollama server. It facilitates communication with the Ollama server and provides models for deployment. The tool requires Java 11 or higher and can be installed locally or via Docker. Users can integrate Ollama4j into Maven projects by adding the specified dependency. The tool offers API specifications and supports various development tasks such as building, running unit tests, and integration tests. Releases are automated through GitHub Actions CI workflow. Areas of improvement include adhering to Java naming conventions, updating deprecated code, implementing logging, using lombok, and enhancing request body creation. Contributions to the project are encouraged, whether reporting bugs, suggesting enhancements, or contributing code.

github

: 162

cb-tumblebug

CB-Tumblebug (CB-TB) is a system for managing multi-cloud infrastructure consisting of resources from multiple cloud service providers. It provides an overview, features, and architecture. The tool supports various cloud providers and resource types, with ongoing development and localization efforts. Users can deploy a multi-cloud infra with GPUs, enjoy multiple LLMs in parallel, and utilize LLM-related scripts. The tool requires Linux, Docker, Docker Compose, and Golang for building the source. Users can run CB-TB with Docker Compose or from the Makefile, set up prerequisites, contribute to the project, and view a list of contributors. The tool is licensed under an open-source license.

github

: 52

llama-assistant

Llama Assistant is an AI-powered assistant that helps with daily tasks, such as voice recognition, natural language processing, summarizing text, rephrasing sentences, answering questions, and more. It runs offline on your local machine, ensuring privacy by not sending data to external servers. The project is a work in progress with regular feature additions.

github

: 300

omnihuman

github

: 92

duolingo-clone

Lingo is an interactive platform for language learning that provides a modern UI/UX experience. It offers features like courses, quests, and a shop for users to engage with. The tech stack includes React JS, Next JS, Typescript, Tailwind CSS, Vercel, and Postgresql. Users can contribute to the project by submitting changes via pull requests. The platform utilizes resources from CodeWithAntonio, Kenney Assets, Freesound, Elevenlabs AI, and Flagpack. Key dependencies include @clerk/nextjs, @neondatabase/serverless, @radix-ui/react-avatar, and more. Users can follow the project creator on GitHub and Twitter, as well as subscribe to their YouTube channel for updates. To learn more about Next.js, users can refer to the Next.js documentation and interactive tutorial.

github

: 104

MAVIS

MAVIS (Math Visual Intelligent System) is an AI-driven application that allows users to analyze visual data such as images and generate interactive answers based on them. It can perform complex mathematical calculations, solve programming tasks, and create professional graphics. MAVIS supports Python for coding and frameworks like Matplotlib, Plotly, Seaborn, Altair, NumPy, Math, SymPy, and Pandas. It is designed to make projects more efficient and professional.

github

: 85

llama-assistant

Llama Assistant is a local AI assistant that respects your privacy. It is an AI-powered assistant that can recognize your voice, process natural language, and perform various actions based on your commands. It can help with tasks like summarizing text, rephrasing sentences, answering questions, writing emails, and more. The assistant runs offline on your local machine, ensuring privacy by not sending data to external servers. It supports voice recognition, natural language processing, and customizable UI with adjustable transparency. The project is a work in progress with new features being added regularly.

github

: 485

aiotieba

Aiotieba is an asynchronous Python library for interacting with the Tieba API. It provides a comprehensive set of features for working with Tieba, including support for authentication, thread and post management, and image and file uploading. Aiotieba is well-documented and easy to use, making it a great choice for developers who want to build applications that interact with Tieba.

github

: 340

evalscope

Eval-Scope is a framework designed to support the evaluation of large language models (LLMs) by providing pre-configured benchmark datasets, common evaluation metrics, model integration, automatic evaluation for objective questions, complex task evaluation using expert models, reports generation, visualization tools, and model inference performance evaluation. It is lightweight, easy to customize, supports new dataset integration, model hosting on ModelScope, deployment of locally hosted models, and rich evaluation metrics. Eval-Scope also supports various evaluation modes like single mode, pairwise-baseline mode, and pairwise (all) mode, making it suitable for assessing and improving LLMs.

github

: 692

amica

Amica is an application that allows you to easily converse with 3D characters in your browser. You can import VRM files, adjust the voice to fit the character, and generate response text that includes emotional expressions.

github

: 879

fiftyone

FiftyOne is an open-source tool designed for building high-quality datasets and computer vision models. It supercharges machine learning workflows by enabling users to visualize datasets, interpret models faster, and improve efficiency. With FiftyOne, users can explore scenarios, identify failure modes, visualize complex labels, evaluate models, find annotation mistakes, and much more. The tool aims to streamline the process of improving machine learning models by providing a comprehensive set of features for data analysis and model interpretation.

github

: 9.3k

TempCompass

TempCompass is a benchmark designed to evaluate the temporal perception ability of Video LLMs. It encompasses a diverse set of temporal aspects and task formats to comprehensively assess the capability of Video LLMs in understanding videos. The benchmark includes conflicting videos to prevent models from relying on single-frame bias and language priors. Users can clone the repository, install required packages, prepare data, run inference using examples like Video-LLaVA and Gemini, and evaluate the performance of their models across different tasks such as Multi-Choice QA, Yes/No QA, Caption Matching, and Caption Generation.

github

: 71

fastserve-ai

FastServe-AI is a machine learning serving tool focused on GenAI & LLMs with simplicity as the top priority. It allows users to easily serve custom models by implementing the 'handle' method for 'FastServe'. The tool provides a FastAPI server for custom models and can be deployed using Lightning AI Studio. Users can install FastServe-AI via pip and run it to serve their own GPT-like LLM models in minutes.

github

: 56

rig

Rig is a Rust library designed for building scalable, modular, and user-friendly applications powered by large language models (LLMs). It provides full support for LLM completion and embedding workflows, offers simple yet powerful abstractions for LLM providers like OpenAI and Cohere, as well as vector stores such as MongoDB and in-memory storage. With Rig, users can easily integrate LLMs into their applications with minimal boilerplate code.

github

: 3.4k

For similar tasks

Verbiverse

github

: 96

Perplexica

Perplexica is an open-source AI-powered search engine that utilizes advanced machine learning algorithms to provide clear answers with sources cited. It offers various modes like Copilot Mode, Normal Mode, and Focus Modes for specific types of questions. Perplexica ensures up-to-date information by using SearxNG metasearch engine. It also features image and video search capabilities and upcoming features include finalizing Copilot Mode and adding Discover and History Saving features.

github

: 21.0k

WeeklySpatialAI

WeeklySpatialAI is a weekly online meetup for the Spatial AI KR community to share the latest news and resources in the Spatial AI field. It aims to facilitate information exchange among professionals, students, and professors, covering topics such as latest papers, industry updates, new technologies/products, development tips, job postings, and useful tech blogs.

github

: 67

duckduckgo_search

Duckduckgo_search is a Python library that enables AI chat and search functionalities for text, news, images, and videos using the DuckDuckGo.com search engine. It provides various methods for different search types such as text, images, videos, and news. The library also supports search operators, regions, proxy settings, and exception handling. Users can interact with the DuckDuckGo API to retrieve search results based on specific queries and parameters.

github

: 1.3k

For similar jobs

AMchat

AMchat is a large language model that integrates advanced math concepts, exercises, and solutions. The model is based on the InternLM2-Math-7B model and is specifically designed to answer advanced math problems. It provides a comprehensive dataset that combines Math and advanced math exercises and solutions. Users can download the model from ModelScope or OpenXLab, deploy it locally or using Docker, and even retrain it using XTuner for fine-tuning. The tool also supports LMDeploy for quantization, OpenCompass for evaluation, and various other features for model deployment and evaluation. The project contributors have provided detailed documentation and guides for users to utilize the tool effectively.

github

: 153

duolingo-clone

github

: 104

Verbiverse

github

: 96

AnnA_Anki_neuronal_Appendix

AnnA is a Python script designed to create filtered decks in optimal review order for Anki flashcards. It uses Machine Learning / AI to ensure semantically linked cards are reviewed far apart. The script helps users manage their daily reviews by creating special filtered decks that prioritize reviewing cards that are most different from the rest. It also allows users to reduce the number of daily reviews while increasing retention and automatically identifies semantic neighbors for each note.

github

: 59

EngAce

EngAce is a cutting-edge, generative AI-powered application revolutionizing Vietnamese English learning. It offers personalized learning experiences combining AI with comprehensive features. The repository contains source code, documentation, and resources for the app.

github

: 82

TheoremExplainAgent

TheoremExplainAgent is an AI system that generates long-form Manim videos to visually explain theorems, proving its deep understanding while uncovering reasoning flaws that text alone often hides. The codebase for the paper 'TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding' is available in this repository. It provides a tool for creating multimodal explanations for theorem understanding using AI technology.

github

: 966

vocabulary-book-by-deepseek

Vocabulary Book by DeepSeek is a manual for CET-4, postgraduate entrance examination, and TOEFL vocabulary, providing word meanings, roots, example sentences, mnemonic aids, and mnemonic images. The project uses Cline + DeepSeek-R1-16b for over 80% of the code to automatically encode the vocabulary manual. The generated manual includes vocabulary from A to Z for CET-4, CET-6, postgraduate entrance examination, and TOEFL, along with features to generate Anki cards and PDFs. The tool also allows for the creation of mnemonic images for each word and articles.

github

: 355

awesome-ai-llm4education

The 'awesome-ai-llm4education' repository is a curated list of papers related to artificial intelligence (AI) and large language models (LLM) for education. It collects papers from top conferences, journals, and specialized domain-specific conferences, categorizing them based on specific tasks for better organization. The repository covers a wide range of topics including tutoring, personalized learning, assessment, material preparation, specific scenarios like computer science, language, math, and medicine, aided teaching, as well as datasets and benchmarks for educational research.

github

: 56