VisionLLM


VisionLLM Series


VisionLLM is a series of large language models designed for vision-centric tasks. The latest version, VisionLLM v2, is a generalist multimodal model that supports hundreds of vision-language tasks, including visual understanding, perception, and generation.

README:


  • VisionLLM: Large Language Model as Open-Ended Decoder for Vision-Centric Tasks (NeurIPS 2023)
  • VisionLLM v2: A Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks (NeurIPS 2024)

🚀 News

  • 2024/06: We released VisionLLM v2, a generalist multimodal large language model that supports hundreds of vision-language tasks, covering visual understanding, perception, and generation.
