paper_notes
Daily notes on AI papers
Stars: 80
This repository is a manually curated collection of AI-related papers and related resources, such as blog posts, repositories, slides, and SNS posts. It serves as a personal study log and memory aid for the maintainer, with paper notes written in Japanese. The repository focuses on research areas like Natural Language Processing, Computer Vision, Speech Processing, and Recommender Systems. It covers topics such as Large Language Models, Proprietary/Open-Source Models, AI Agents, Datasets, Reinforcement Learning, and Transformer architectures. The issues contain links, paper metadata, Japanese translations, notes, labels, and external references, with varying levels of detail based on the maintainer's time and interest.
README:
[English | 日本語 ]
This repository is a manually curated collection of AI-related papers and related resources, such as blog posts, repositories, slides, and SNS posts.
All items are managed as GitHub Issues.
The primary purpose of this repository is to serve as a personal study log and memory aid for the maintainer.
Since the maintainer is Japanese, paper notes are manually written in Japanese, reflecting the maintainer’s own understanding and learning process.
Each issue mainly contains the following information:
- 🔗 Links to papers, blog posts, or related resources
- 📄 Paper metadata and abstracts
- 🌐 Japanese translations and summaries of abstracts
- 📝 Notes based on the maintainer’s understanding
- 🏷️ Manually assigned labels reflecting the maintainer’s interpretation
- 📚 Useful external references
Paper metadata and abstracts are retrieved via the arXiv API1. Since the maintainer is a native Japanese speaker, Japanese translations and summaries of abstracts are automatically generated using GPT.
The repository currently focuses on the following four areas:
- Natural Language Processing (NLP)
- Computer Vision (CV)
- Speech Processing
- Recommender Systems (RecSys)
Recent areas of focus include (but are not limited to):
- Large Language Models (LLMs) / Vision-Language Models (VLMs) / Diffusion Models
- Proprietary / Open-Weight / Open-Source Models
- Multimodal Models / Unified Multimodal Models (UMMs)
- AI Agents / Context Engineering / Memory / Evaluation Harnesses / Scaffoldings
- (Synthetic) Datasets / Evaluation / Benchmarks
- Pre-training / Mid-training / Post-training / Continual Learning
- Reinforcement Learning
- Test-time Scaling / Test-time Learning
- Representation Learning / Embeddings
- Self-Improving AI
- Transformer architectures (Attention, Positional Encoding, Residual Streams, etc.)
Each Issue may contain:
- The maintainer’s understanding or interpretation of the work 2
- Initial impressions
- Links to explanatory SNS posts or discussions
The level of detail varies depending on available time and interest at the time of reading.
Notes are managed using the following labels (not all Issues have been fully labeled yet because the maintainer have manually labeled it):
-
: first impressions based on the abstract, related posts, or skimming
-
: a single key takeaway
-
: a broad summary of the paper
-
: notes on multiple important technical points
-
: details of the paper and multiple important technical points with the maintainer's own opinion
-
: a collection of related links (e.g., explanatory posts)
Labels are broadly categorized and color-coded as follows:
-
Method:
,
, ``
, ...
The above labels are just a few examples, and there are currently 600+ different labels in total. You can filter issues using these labels.
In addition to Issue-based notes, this repository provides two complementary ways to explore and consume the collected information.
Wiki: Personal Trend Notes
A Wiki is maintained to record personal impressions of trends observed while casually browsing papers and discussions on X.
- These notes reflect subjective observations and interpretations
- They may contain inaccuracies or biases
- Updates are irregular and informal
The Wiki is intended as a lightweight place to capture emerging ideas, trends, and intuitions that may not yet be fully formed.
DeepWiki: Conversational Interface
This repository can also be explored via DeepWiki, which provides a chat-based interface for interacting with the contents of this repository.
Using DeepWiki, it may be easier to:
- Interactively extract information from Issues and notes
- Ask questions about specific papers, topics, or trends
- Navigate the repository in a conversational manner rather than browsing Issues manually
2026.01.20
- Update README.md
2025.10.17
- I created a Wiki to jot down personal impressions of trends observed while casually browsing papers on X.
- These are purely subjective notes and may contain inaccuracies. Updates will be irregular and informal.
2025.08.05
- At present, the most convenient way to use this repository may be to extract knowledge interactively via its DeepWiki interface.
2025.07.19
- The following improvements were made:
- Fixed degraded blog response time
- Updated displayed blog content
- (A full redesign is still pending 🫠)
2024.12.28
- Several months have passed since I first planned to redesign the blog.
2024.09
- I reorganized Issue notes into a blog-style format.
- The blog is available here.
2017.12.28
- I started this repository and created the first issue.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for paper_notes
Similar Open Source Tools
paper_notes
This repository is a manually curated collection of AI-related papers and related resources, such as blog posts, repositories, slides, and SNS posts. It serves as a personal study log and memory aid for the maintainer, with paper notes written in Japanese. The repository focuses on research areas like Natural Language Processing, Computer Vision, Speech Processing, and Recommender Systems. It covers topics such as Large Language Models, Proprietary/Open-Source Models, AI Agents, Datasets, Reinforcement Learning, and Transformer architectures. The issues contain links, paper metadata, Japanese translations, notes, labels, and external references, with varying levels of detail based on the maintainer's time and interest.
aiida-core
AiiDA (www.aiida.net) is a workflow manager for computational science with a strong focus on provenance, performance and extensibility. **Features** * **Workflows:** Write complex, auto-documenting workflows in python, linked to arbitrary executables on local and remote computers. The event-based workflow engine supports tens of thousands of processes per hour with full checkpointing. * **Data provenance:** Automatically track inputs, outputs & metadata of all calculations in a provenance graph for full reproducibility. Perform fast queries on graphs containing millions of nodes. * **HPC interface:** Move your calculations to a different computer by changing one line of code. AiiDA is compatible with schedulers like SLURM, PBS Pro, torque, SGE or LSF out of the box. * **Plugin interface:** Extend AiiDA with plugins for new simulation codes (input generation & parsing), data types, schedulers, transport modes and more. * **Open Science:** Export subsets of your provenance graph and share them with peers or make them available online for everyone on the Materials Cloud. * **Open source:** AiiDA is released under the MIT open source license
deepchecks
Deepchecks is a holistic open-source solution for AI & ML validation needs, enabling thorough testing of data and models from research to production. It includes components for testing, CI & testing management, and monitoring. Users can install and use Deepchecks for testing and monitoring their AI models, with customizable checks and suites for tabular, NLP, and computer vision data. The tool provides visual reports, pythonic/json output for processing, and a dynamic UI for collaboration and monitoring. Deepchecks is open source, with premium features available under a commercial license for monitoring components.
LMCache
LMCache is a serving engine extension designed to reduce time to first token (TTFT) and increase throughput, particularly in long-context scenarios. It stores key-value caches of reusable texts across different locations like GPU, CPU DRAM, and Local Disk, allowing the reuse of any text in any serving engine instance. By combining LMCache with vLLM, significant delay savings and GPU cycle reduction are achieved in various large language model (LLM) use cases, such as multi-round question answering and retrieval-augmented generation (RAG). LMCache provides integration with the latest vLLM version, offering both online serving and offline inference capabilities. It supports sharing key-value caches across multiple vLLM instances and aims to provide stable support for non-prefix key-value caches along with user and developer documentation.
MONAI
MONAI is a PyTorch-based, open-source framework for deep learning in healthcare imaging. It provides a comprehensive set of tools for medical image analysis, including data preprocessing, model training, and evaluation. MONAI is designed to be flexible and easy to use, making it a valuable resource for researchers and developers in the field of medical imaging.
MinerU
MinerU is a tool that converts PDFs into machine-readable formats, allowing for easy extraction into any format. It focuses on solving symbol conversion issues in scientific literature and contributes to technological development. It removes headers, footers, footnotes, and page numbers, preserves document structure, extracts images, tables, and formulas, and supports OCR in 109 languages. MinerU supports various visualization results, runs on CPU/GPU/NPU, and is compatible with Windows, Linux, and Mac platforms.
biochatter
Generative AI models have shown tremendous usefulness in increasing accessibility and automation of a wide range of tasks. This repository contains the `biochatter` Python package, a generic backend library for the connection of biomedical applications to conversational AI. It aims to provide a common framework for deploying, testing, and evaluating diverse models and auxiliary technologies in the biomedical domain. BioChatter is part of the BioCypher ecosystem, connecting natively to BioCypher knowledge graphs.
rocketnotes
Rocketnotes is a web-based Markdown note taking app with LLM-powered text completion, chat and semantic search. It utilizes a 100% serverless RAG pipeline build with langchain, sentence-transformers, faiss and OpenAI or Anthropic API.
Kori
Kori is a unified note-taking app with AI capabilities, providing a consistent experience across Android, iOS, Windows, macOS, and Linux. It supports various formats like Drawing, Markdown, TXT, LaTeX, Mermaid diagrams, and Todo.txt lists. Users can benefit from AI co-writing features, note outline generation, find and replace, note templates, local media support, and export options. The app follows Material Design 3 guidelines, offers comprehensive mouse and keyboard support, and is optimized for different screen sizes and orientations.
DriveLM
DriveLM is a multimodal AI model that enables autonomous driving by combining computer vision and natural language processing. It is designed to understand and respond to complex driving scenarios using visual and textual information. DriveLM can perform various tasks related to driving, such as object detection, lane keeping, and decision-making. It is trained on a massive dataset of images and text, which allows it to learn the relationships between visual cues and driving actions. DriveLM is a powerful tool that can help to improve the safety and efficiency of autonomous vehicles.
Kiln
Kiln is an intuitive tool for fine-tuning LLM models, generating synthetic data, and collaborating on datasets. It offers desktop apps for Windows, MacOS, and Linux, zero-code fine-tuning for various models, interactive data generation, and Git-based version control. Users can easily collaborate with QA, PM, and subject matter experts, generate auto-prompts, and work with a wide range of models and providers. The tool is open-source, privacy-first, and supports structured data tasks in JSON format. Kiln is free to use and helps build high-quality AI products with datasets, facilitates collaboration between technical and non-technical teams, allows comparison of models and techniques without code, ensures structured data integrity, and prioritizes user privacy.
tidb
TiDB is an open-source distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.
verl
verl is a flexible and efficient RL training library for large language models (LLMs). It offers easy extension of diverse RL algorithms, seamless integration with existing LLM infra, flexible device mapping, and integration with popular Hugging Face models. The library provides state-of-the-art throughput, efficient actor model resharding, and supports various RL algorithms like PPO, GRPO, and more. It also supports model-based and function-based rewards for tasks like math and coding, vision-language models, and multi-modal RL. verl is used for tasks like training large language models, reasoning tasks, reinforcement learning with diverse algorithms, and multi-modal RL.
lancedb
LanceDB is an open-source database for vector-search built with persistent storage, which greatly simplifies retrieval, filtering, and management of embeddings. The key features of LanceDB include: Production-scale vector search with no servers to manage. Store, query, and filter vectors, metadata, and multi-modal data (text, images, videos, point clouds, and more). Support for vector similarity search, full-text search, and SQL. Native Python and Javascript/Typescript support. Zero-copy, automatic versioning, manage versions of your data without needing extra infrastructure. GPU support in building vector index(*). Ecosystem integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache-Arrow, Pandas, Polars, DuckDB, and more on the way. LanceDB's core is written in Rust 🦀 and is built using Lance, an open-source columnar format designed for performant ML workloads.
local-deep-research
Local Deep Research is a powerful AI-powered research assistant that performs deep, iterative analysis using multiple LLMs and web searches. It can be run locally for privacy or configured to use cloud-based LLMs for enhanced capabilities. The tool offers advanced research capabilities, flexible LLM support, rich output options, privacy-focused operation, enhanced search integration, and academic & scientific integration. It also provides a web interface, command line interface, and supports multiple LLM providers and search engines. Users can configure AI models, search engines, and research parameters for customized research experiences.
For similar tasks
awesome-mobile-robotics
The 'awesome-mobile-robotics' repository is a curated list of important content related to Mobile Robotics and AI. It includes resources such as courses, books, datasets, software and libraries, podcasts, conferences, journals, companies and jobs, laboratories and research groups, and miscellaneous resources. The repository covers a wide range of topics in the field of Mobile Robotics and AI, providing valuable information for enthusiasts, researchers, and professionals in the domain.
fiftyone-brain
FiftyOne Brain contains the open source AI/ML capabilities for the FiftyOne ecosystem, enabling users to automatically analyze and manipulate their datasets and models. Features include visual similarity search, query by text, finding unique and representative samples, finding media quality problems and annotation mistakes, and more.
mmf
MMF is a modular framework for vision and language multimodal research from Facebook AI Research. It contains reference implementations of state-of-the-art vision and language models, allowing distributed training. MMF serves as a starter codebase for challenges around vision and language datasets, such as The Hateful Memes, TextVQA, TextCaps, and VQA challenges. It is scalable, fast, and un-opinionated, providing a solid foundation for vision and language multimodal research projects.
doris
Doris is a lightweight and user-friendly data visualization tool designed for quick and easy exploration of datasets. It provides a simple interface for users to upload their data and generate interactive visualizations without the need for coding. With Doris, users can easily create charts, graphs, and dashboards to analyze and present their data in a visually appealing way. The tool supports various data formats and offers customization options to tailor visualizations to specific needs. Whether you are a data analyst, researcher, or student, Doris simplifies the process of data exploration and presentation.
EScAIP
EScAIP is an Efficiently Scaled Attention Interatomic Potential that leverages a novel multi-head self-attention formulation within graph neural networks to predict energy and forces between atoms in molecules and materials. It achieves substantial gains in efficiency, at least 10x speed up in inference time and 5x less memory usage compared to existing models. EScAIP represents a philosophy towards developing general-purpose Neural Network Interatomic Potentials that achieve better expressivity through scaling and continue to scale efficiently with increased computational resources and training data.
BioAgents
BioAgents AgentKit is an advanced AI agent framework tailored for biological and scientific research. It offers powerful conversational AI capabilities with specialized knowledge in biology, life sciences, and scientific research methodologies. The framework includes state-of-the-art analysis agents, configurable research agents, and a variety of specialized agents for tasks such as file parsing, research planning, literature search, data analysis, hypothesis generation, research reflection, and user-facing responses. BioAgents also provides support for LLM libraries, multiple search backends for literature agents, and two backends for data analysis. The project structure includes backend source code, services for chat, job queue system, real-time notifications, and JWT authentication, as well as a frontend UI built with Preact.
paper_notes
This repository is a manually curated collection of AI-related papers and related resources, such as blog posts, repositories, slides, and SNS posts. It serves as a personal study log and memory aid for the maintainer, with paper notes written in Japanese. The repository focuses on research areas like Natural Language Processing, Computer Vision, Speech Processing, and Recommender Systems. It covers topics such as Large Language Models, Proprietary/Open-Source Models, AI Agents, Datasets, Reinforcement Learning, and Transformer architectures. The issues contain links, paper metadata, Japanese translations, notes, labels, and external references, with varying levels of detail based on the maintainer's time and interest.
labelbox-python
Labelbox is a data-centric AI platform for enterprises to develop, optimize, and use AI to solve problems and power new products and services. Enterprises use Labelbox to curate data, generate high-quality human feedback data for computer vision and LLMs, evaluate model performance, and automate tasks by combining AI and human-centric workflows. The academic & research community uses Labelbox for cutting-edge AI research.
For similar jobs
Perplexica
Perplexica is an open-source AI-powered search engine that utilizes advanced machine learning algorithms to provide clear answers with sources cited. It offers various modes like Copilot Mode, Normal Mode, and Focus Modes for specific types of questions. Perplexica ensures up-to-date information by using SearxNG metasearch engine. It also features image and video search capabilities and upcoming features include finalizing Copilot Mode and adding Discover and History Saving features.
KULLM
KULLM (구름) is a Korean Large Language Model developed by Korea University NLP & AI Lab and HIAI Research Institute. It is based on the upstage/SOLAR-10.7B-v1.0 model and has been fine-tuned for instruction. The model has been trained on 8×A100 GPUs and is capable of generating responses in Korean language. KULLM exhibits hallucination and repetition phenomena due to its decoding strategy. Users should be cautious as the model may produce inaccurate or harmful results. Performance may vary in benchmarks without a fixed system prompt.
MMMU
MMMU is a benchmark designed to evaluate multimodal models on college-level subject knowledge tasks, covering 30 subjects and 183 subfields with 11.5K questions. It focuses on advanced perception and reasoning with domain-specific knowledge, challenging models to perform tasks akin to those faced by experts. The evaluation of various models highlights substantial challenges, with room for improvement to stimulate the community towards expert artificial general intelligence (AGI).
1filellm
1filellm is a command-line data aggregation tool designed for LLM ingestion. It aggregates and preprocesses data from various sources into a single text file, facilitating the creation of information-dense prompts for large language models. The tool supports automatic source type detection, handling of multiple file formats, web crawling functionality, integration with Sci-Hub for research paper downloads, text preprocessing, and token count reporting. Users can input local files, directories, GitHub repositories, pull requests, issues, ArXiv papers, YouTube transcripts, web pages, Sci-Hub papers via DOI or PMID. The tool provides uncompressed and compressed text outputs, with the uncompressed text automatically copied to the clipboard for easy pasting into LLMs.
gpt-researcher
GPT Researcher is an autonomous agent designed for comprehensive online research on a variety of tasks. It can produce detailed, factual, and unbiased research reports with customization options. The tool addresses issues of speed, determinism, and reliability by leveraging parallelized agent work. The main idea involves running 'planner' and 'execution' agents to generate research questions, seek related information, and create research reports. GPT Researcher optimizes costs and completes tasks in around 3 minutes. Features include generating long research reports, aggregating web sources, an easy-to-use web interface, scraping web sources, and exporting reports to various formats.
ChatTTS
ChatTTS is a generative speech model optimized for dialogue scenarios, providing natural and expressive speech synthesis with fine-grained control over prosodic features. It supports multiple speakers and surpasses most open-source TTS models in terms of prosody. The model is trained with 100,000+ hours of Chinese and English audio data, and the open-source version on HuggingFace is a 40,000-hour pre-trained model without SFT. The roadmap includes open-sourcing additional features like VQ encoder, multi-emotion control, and streaming audio generation. The tool is intended for academic and research use only, with precautions taken to limit potential misuse.
HebTTS
HebTTS is a language modeling approach to diacritic-free Hebrew text-to-speech (TTS) system. It addresses the challenge of accurately mapping text to speech in Hebrew by proposing a language model that operates on discrete speech representations and is conditioned on a word-piece tokenizer. The system is optimized using weakly supervised recordings and outperforms diacritic-based Hebrew TTS systems in terms of content preservation and naturalness of generated speech.
do-research-in-AI
This repository is a collection of research lectures and experience sharing posts from frontline researchers in the field of AI. It aims to help individuals upgrade their research skills and knowledge through insightful talks and experiences shared by experts. The content covers various topics such as evaluating research papers, choosing research directions, research methodologies, and tips for writing high-quality scientific papers. The repository also includes discussions on academic career paths, research ethics, and the emotional aspects of research work. Overall, it serves as a valuable resource for individuals interested in advancing their research capabilities in the field of AI.