llama.vim

Vim plugin for LLM-assisted code/text completion

llama.vim is a plugin that provides local LLM-assisted text completion for Vim users. Its features include auto-suggest on cursor movement, manual suggestion toggling, suggestion acceptance with Tab (whole suggestion) and Shift+Tab (first line only), control over the maximum text generation time, configurable context around the cursor, a ring context built from chunks of open and edited files, and a performance stats display. The plugin requires a running llama.cpp server instance and supports FIM-compatible (fill-in-the-middle) models. It aims to be simple and lightweight while providing high-quality, performant local FIM completions even on consumer-grade hardware.

README:

llama.vim

Local LLM-assisted text completion.

Features

  • Auto-suggest on cursor movement in Insert mode
  • Toggle the suggestion manually by pressing Ctrl+F
  • Accept a suggestion with Tab
  • Accept the first line of a suggestion with Shift+Tab
  • Control max text generation time
  • Configure scope of context around the cursor
  • Ring context with chunks from open and edited files and yanked text
  • Supports very large contexts even on low-end hardware via smart context reuse
  • Display performance stats

Installation

Plugin setup

vim-plug

Plug 'ggml-org/llama.vim'

Vundle

cd ~/.vim/bundle
git clone https://github.com/ggml-org/llama.vim

Then add Plugin 'llama.vim' to your .vimrc in the vundle#begin() section.

llama.cpp setup

The plugin requires a llama.cpp server instance to be running at the endpoint set in g:llama_config.endpoint.
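Before opening Vim, it can help to confirm from the shell that a server is actually answering. The sketch below is a minimal check, not part of the plugin: the function name check_llama_server is hypothetical, the default port 8012 is taken from the recommended settings in this README, and it assumes the server exposes a GET /health route.

```shell
# Hypothetical helper: report whether a llama.cpp server answers at a base URL.
# Assumptions: the server exposes GET /health, and 8012 is the port it was
# started on (adjust the URL to match your g:llama_config.endpoint).
check_llama_server() {
    base_url="${1:-http://127.0.0.1:8012}"
    # --fail turns HTTP errors into a non-zero exit; --max-time bounds the wait.
    if curl --silent --fail --max-time 2 "${base_url}/health" > /dev/null 2>&1; then
        echo "llama.cpp server is reachable at ${base_url}"
    else
        echo "no llama.cpp server responding at ${base_url}" >&2
        return 1
    fi
}

# Usage (not run here): check_llama_server http://127.0.0.1:8012
```

A non-zero return can mean the server is not started yet or is still loading the model, so retrying after a short wait is reasonable.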

Mac OS

brew install llama.cpp

Any other OS

Either build from source or use the latest binaries: https://github.com/ggerganov/llama.cpp/releases

llama.cpp settings

Here are recommended settings, depending on the amount of VRAM that you have:

  • More than 16GB VRAM:

    llama-server \
        --hf-repo ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF \
        --hf-file qwen2.5-coder-7b-q8_0.gguf \
        --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 \
        --ctx-size 0 --cache-reuse 256
  • Less than 16GB VRAM:

    llama-server \
        --hf-repo ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF \
        --hf-file qwen2.5-coder-1.5b-q8_0.gguf \
        --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 \
        --ctx-size 0 --cache-reuse 256

Use :help llama for more details.
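The two recommended configurations above differ only in the model they load, so the choice can be wrapped in a small script. This is a sketch under stated assumptions: pick_model and start_llama_server are hypothetical names, VRAM is passed as a whole number of gigabytes, and exactly 16GB is treated as the smaller tier because the recommendations do not cover that case.

```shell
# Hypothetical helper: print "<hf-repo> <hf-file>" for a VRAM size in whole GB.
# Assumption: exactly 16 GB falls into the smaller tier (not stated in the README).
pick_model() {
    if [ "$1" -gt 16 ]; then
        echo "ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF qwen2.5-coder-7b-q8_0.gguf"
    else
        echo "ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF qwen2.5-coder-1.5b-q8_0.gguf"
    fi
}

# Hypothetical wrapper: start llama-server with the flags recommended above.
start_llama_server() {
    # Split the "repo file" pair into the positional parameters $1 and $2.
    set -- $(pick_model "$1")
    llama-server \
        --hf-repo "$1" \
        --hf-file "$2" \
        --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 \
        --ctx-size 0 --cache-reuse 256
}

# Usage (not run here): start_llama_server 24
```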

Recommended LLMs

The plugin requires FIM-compatible (fill-in-the-middle) models: HF collection

Examples

Using llama.vim on M1 Pro (2021) with Qwen2.5-Coder 1.5B Q8_0:

image

The orange text is the generated suggestion. The green text contains performance stats for the FIM request: the currently used context is 15186 tokens and the maximum is 32768. There are 30 chunks in the ring buffer with extra context (out of 64). So far, 1 chunk has been evicted in the current session and there are 0 chunks in queue. The newly computed prompt tokens for this request were 260 and the generated tokens were 25. It took 1245 ms to generate this suggestion after entering the letter c on the current line.
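The figures quoted above are easier to compare as ratios. The following sketch only redoes the arithmetic on the numbers from this example; the helper name percent is hypothetical and nothing here queries the plugin or the server.

```shell
# Hypothetical helper: print a/b as a percentage with one decimal place.
percent() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * a / b }'
}

percent 15186 32768   # context in use vs. maximum      -> 46.3
percent 30 64         # ring-buffer chunks vs. capacity -> 46.9
percent 260 15186     # newly computed prompt tokens vs. context in use -> 1.7
```

The last ratio is the notable one: under 2% of the context in use had to be computed for this request, which is consistent with the context reuse listed under Features.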

Using llama.vim on M2 Ultra with Qwen2.5-Coder 7B Q8_0:

https://github.com/user-attachments/assets/1f1eb408-8ac2-4bd2-b2cf-6ab7d6816754

The video demonstrates that the global context is accumulated and maintained across different files, and shows the overall latency when working in a large codebase.

Implementation details

The plugin aims to be very simple and lightweight and at the same time to provide high-quality and performant local FIM completions, even on consumer-grade hardware. Read more on how this is achieved in the following links:
