Awesome-AI

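A collection of commonly used resources on large language models (LLMs), AI-assisted programming, AI art, and related fields, exploring the application and development of generative AI.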

Stars: 157

Awesome AI is a repository that collects and shares resources in the fields of large language models (LLM), AI-assisted programming, AI drawing, and more. It explores the application and development of generative artificial intelligence. The repository provides information on various AI tools, models, and platforms, along with tutorials and web products related to AI technologies.

README:

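Contents

Large Language Models

AI Programming

AI Art / Audio and Video Creation

Common AI Websites / Tools

FAQ

🔍 Tip: Make good use of search. Press Ctrl+F or ⌘F to jump to the keyword you are looking for.
💡 This list is updated continuously; consider bookmarking it, since you are likely to need it regularly.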

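Large Language Models

OpenAI GPT / o1 / ChatGPT

Overview: OpenAI's GPT-4 is, at the time of writing, widely regarded as the most advanced large language model in the world. GPT stands for "Generative Pre-trained Transformer". ChatGPT, currently the most popular AI application in the world, is built on GPT models.
Website: https://openai.com/api/
Playground (not free; it consumes your API quota):
- Chat mode: https://platform.openai.com/playground?mode=chat
- Assistants mode: https://platform.openai.com/playground?mode=assistant
API:
- Pricing: https://openai.com/pricing
- Official documentation: https://platform.openai.com/docs/overview
- Other resources:
  - GPT-4o API hands-on analysis: a boon or a challenge for developers?
  - A closer look at the OpenAI o1 model series: Why is it so strong? | What does it mean for developers?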
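Web product (ChatGPT):
- Overview: ChatGPT is currently the most popular AI application in the world: an AI assistant that interacts through natural-language conversation. Beyond chat, it integrates image generation, a code interpreter, and other features. It launched on November 30, 2022, introduced GPTs (custom versions of ChatGPT) in November 2023, and opened the GPT Store in January 2024, continuing to lead the AI boom.
- Entry: https://chatgpt.com/
- Pricing:
  - ChatGPT Free (GPT-3.5 + voice conversations): free and unlimited
  - ChatGPT Plus (Free + GPT-4 + image generation + GPTs + ...): $20 / month
  - ChatGPT Team (Plus + collaborative workspace + data protection + ...): $25 / month / user
- Related resources:
Web product (GPTs):
- Overview: GPTs extend ChatGPT. They let users build a custom chatbot for a specific scenario, optionally attaching their own knowledge base or calling external APIs, so that conversations are more efficient and more precise than with standard ChatGPT. The GPT Store launched in January 2024, and a revenue-sharing program for developers is expected to follow.
- GPT Store: https://chat.openai.com/gpts
- Related resources:
  - The complete guide to GPTs: Getting started | How to build | How to publish | How to make money
  - The GPT Store is about to open: while waiting for revenue, don't forget to guard against theft
  - A discussion of resource-file leakage in arbitrary GPTs
  - SecurityGPT: prompt security protection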

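Claude

Overview: A family of multimodal AI models released by Anthropic. In increasing order of size, Claude models come in three tiers: Haiku, Sonnet, and Opus.
Website: https://www.anthropic.com/claude
Web product (Claude):
- Entry: https://claude.ai/ (requires verification with an overseas phone number)
- Pricing: A free tier is available. The Pro plan costs $20 / month.
Related resources:
- API documentation
- anthropic-cookbook: official example demos provided by Anthropic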

Google Gemini

Overview: A natively multimodal large model developed by Google. It comes in three sizes: Ultra, Pro, and Nano.
Website: https://ai.google.dev/
Playground:
- Google AI Studio: https://makersuite.google.com/
- Vertex AI Studio: https://console.cloud.google.com/vertex-ai/generative/multimodal/create/text
API documentation: https://ai.google.dev/tutorials/rest_quickstart
Web product (formerly Bard, now renamed Gemini): https://gemini.google.com/app

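Baidu ERNIE (文心大模型) / ERNIE Bot (文心一言)

Overview: ERNIE Bot (文心一言) is a chatbot developed by Baidu and released on March 16, 2023. It is powered by the ERNIE (文心) large model.
Website: https://wenxin.baidu.com/
API documentation: https://cloud.baidu.com/doc/WENXINWORKSHOP/s/clntwmv7t
Web product (ERNIE Bot): https://yiyan.baidu.com/
Web product (PaddlePaddle AI Application Center): https://aistudio.baidu.com/application/center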

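Zhipu GLM / ChatGLM (智谱清言)

Overview: A large model from the Tsinghua-affiliated Zhipu team. Open-source versions are available and can be deployed privately.
Website: https://models.aminer.cn/glm-130b/
API documentation: https://open.bigmodel.cn/dev/api
Web product (智谱清言): https://chatglm.cn/main/detail
Web product (GLMs): https://chatglm.cn/glms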

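Moonshot AI / Kimi

Overview: A large model from Moonshot AI (月之暗面). Its distinguishing feature is support for an ultra-long context of 200,000 Chinese characters. An API is available.
Website: https://www.moonshot.cn/
API documentation: https://platform.moonshot.cn/
- Pricing: https://platform.moonshot.cn/pricing
Web product (Kimi smart assistant, formerly Kimi Chat): https://kimi.moonshot.cn/
More info: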

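More large models from China:
- Tongyi Qianwen (通义千问) / Qwen: A large model from Alibaba Cloud, with open-source versions.
  - Web product: https://tongyi.aliyun.com/qianwen/
  - Mobile app: In addition to standard features such as a Q&A assistant and AI tools, the Tongyi Qianwen app offers distinctive features such as Tongyi Wuwang (通义舞王) and doodle-to-painting.
  - API documentation: https://help.aliyun.com/zh/dashscope/developer-reference/api-details
- DeepSeek (深度求索): An open-source multimodal large model from China, claimed to come close to GPT-4 in tests. An API is available, with very low pricing.
  - Web product: https://chat.deepseek.com/
  - API documentation: https://platform.deepseek.com/docs
  - More info: Another dark horse among Chinese large models! First impressions of DeepSeek, the price-cutter sweeping the field
- 01.AI (零一万物) / Yi: An open-source multimodal large model from China, with an ultra-long context of 300,000 Chinese characters. An API is available.
  - Web product (Wanzhi, 万知): https://www.wanzhi.com/ (summarizing key points of long texts, generating PPT slides from documents, and more)
  - API documentation: https://platform.lingyiwanwu.com/
- iFlytek Spark (讯飞星火) cognitive large model:
  - Web product (SparkDesk): https://xinghuo.xfyun.cn/desk
  - API documentation: https://www.xfyun.cn/doc/spark/Web.html
- MiniMax
  - Web product (Hailuo AI, 海螺 AI): https://hailuoai.com/
  - API documentation: https://api.minimax.chat/document/introduction?id=6433f37594878d408fc82959
- StepFun (阶跃星辰): Claims industry-leading performance in image understanding, multi-turn instruction following, mathematics, logical reasoning, and text creation.
  - Web product (Yuewen, 跃问): https://stepchat.cn/chats/new
  - API documentation: https://platform.stepfun.com/docs/overview/concept
- ModelBest (面壁智能) / MiniCPM-V: A series of on-device multimodal large models from ModelBest that accept image and text input and produce high-quality text output. They can run inference on phones, tablets, and other smart devices. MiniCPM-V 2.6 reportedly achieves GPT-4V-level performance with a very small parameter count.
  - Try it: https://huggingface.co/spaces/openbmb/MiniCPM-V-2_6
- Open-source Chinese LLMs: https://github.com/HqWu-HITCS/Awesome-Chinese-LLM
Image recognition APIs:
- GPT-4V: https://platform.openai.com/docs/guides/vision
- Gemini Pro Vision: https://ai.google.dev/tutorials/rest_quickstart#text-and-image_input
Human-like chatbots:
- Pi: A personable AI conversational assistant.
- Hume AI: An AI model that can recognize emotion in speech; an API is available. An online demo lets you hold a voice conversation with the AI bot.
Artificial Analysis: A multi-dimensional benchmark leaderboard for large models.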

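AI Programming

GitHub Copilot

Overview: The benchmark product in AI-assisted programming. Jointly developed by GitHub and OpenAI, it integrates into development environments as an editor plugin and supports mainstream tools such as VS Code and JetBrains IDEs. It provides code completion, chat, multi-file editing, and more. A free plan was introduced at the end of 2024.
Website: https://github.com/features/copilot
Pricing: https://github.com/features/copilot/plans
- Free plan: Almost no feature restrictions; only the number of requests is limited.
- Pro: $10 / month (first month is a free trial)
Recommended books:
- "Getting Started with AI-Assisted Programming: Building LLM Applications from Scratch with GitHub Copilot" (《AI 辅助编程入门：使用 GitHub Copilot 零基础开发 LLM 应用》)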

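Cursor

Overview: A rising star in AI-assisted programming. Cursor is an editor built on the VS Code core. Beyond basic code completion, it offers advanced features such as batch completion, next-action prediction, chat, and multi-file editing.
Website: https://cursor.com/
Pricing: https://cursor.com/pricing
- Free (two-week Pro trial + 2000 completions + a limited number of chat requests): free
- Pro (unlimited completions + 500 fast chat requests per month + unlimited slow chat requests): $20 / month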

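Others

AI editors / editor plugins:
- Windsurf: A code editor from Codeium, built on the VS Code core, with integrated agent capabilities and support for a range of advanced AI-assisted programming features. New users get a two-week trial of the Pro plan.
- Cline (formerly Claude Dev): A VS Code extension that takes an agent-style approach, generating (or modifying) project code through conversation. Requires your own LLM API access. Open source.
- CodeGeeX: Zhipu's intelligent coding assistant. Supports more than 20 programming languages and works with mainstream tools such as VS Code and JetBrains IDEs. Free for individual users. Enterprise users can opt for private deployment.
- MarsCode: An intelligent coding assistant from ByteDance, built on its Doubao large model. Provides smart completion, prediction, and Q&A, and works with mainstream tools such as VS Code and JetBrains IDEs. Free for individual users.
- Tongyi Lingma (通义灵码): An editor plugin from Alibaba Cloud, supporting VS Code, JetBrains IDEs, and more. Free for individual users.
- Tencent Cloud AI Code Assistant (腾讯云 AI 代码助手): An editor plugin from Tencent Cloud. Free for individual users.
- Baidu Comate (文心快码): An editor plugin from Baidu. Free for individual users.
- CodeFuse: An editor plugin from Ant Group. Free for individual users.
- Codeium: An editor plugin supporting VS Code, JetBrains IDEs, and more.
  - Pricing: Individual plan (code suggestions + chat): free
- Tabnine: An editor plugin supporting VS Code, JetBrains IDEs, and more.
  - Pricing: Basic plan (basic code completion): free
- Amazon CodeWhisperer: An editor plugin supporting VS Code, JetBrains IDEs, and more.
  - Pricing: Individual plan (code suggestions + reference tracking + security scans): free
Web design and generation tools:
- Bolt.new: An AI coding tool from StackBlitz that can generate, edit, run, and deploy full-stack websites online, covering web development needs in one place. Supports a variety of JS-based front-end and back-end stacks.
- v0.dev: Vercel's AI web design and development tool. It generates web pages through conversation, with one-click publishing. It primarily supports the shadcn/ui (React) + Tailwind stack, with more front-end stacks planned. Free quota available.
- Wegic: An AI-driven web UI design and development tool. It generates websites quickly through natural conversation, supports ongoing changes through conversation, and offers one-click publishing.
- OpenUI: An open-source project in which AI generates front-end code automatically. It creates a UI from a description, accepts follow-up descriptions for revisions, and can output HTML, React, Vue components, and other formats. Chinese-language descriptions are supported. Online demo.
Other tools:
- CopyCoder: Converts web design mockups and prototypes into prompts suited to AI coding tools; works well alongside Cursor, Windsurf, Bolt.new, v0.dev, and similar tools.
- Devin: An AI software engineer from Cognition Labs, with strong abilities to learn and work autonomously. Currently in closed beta with a waitlist; not yet generally available.
- Gru.ai: An online coding-assistant AI agent that generates code for the user's tasks. Supports Python and TypeScript.
Recommended books:
- The new book "AI-Assisted Programming in Practice with Python" (《AI 辅助编程 Python 实战》) is being translated; not to be missed!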

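AI Art / Audio and Video Creation

AI Art

Midjourney: A popular AI image generation tool. It originally ran on Discord and now also offers a standalone web app. It is easy to pick up and produces visually rich output, which makes it a good starting point for beginners exploring AI art.
- Pricing:
  - Basic plan (3.3h Fast Time): $10 / month
  - Standard plan (15h Fast Time + Unlimited Relax Time): $30 / month
  - Pro plan (30h Fast Time + Unlimited Relax Time): $60 / month
  - Mega plan (60h Fast Time + Unlimited Relax Time): $120 / month
OpenAI DALL·E: OpenAI's image generation tool. It understands prompts very well, and its integration with ChatGPT makes it very easy to use.
- Web products:
  - ChatGPT Plus (GPT-4 + DALL·E 3): https://chat.openai.com/#pricing
    - Pricing: $20 / month
  - DALL·E 2: https://labs.openai.com/
    - Pricing: $15 / 115 credits
Stable Diffusion: An advanced AI image generation model developed by Stability AI. It can be deployed locally. Thanks to its open-source nature, it has grown quickly into a large ecosystem that is widely used in art, design, and multimedia production.
- Extensions:
  - ComfyUI: https://github.com/comfyanonymous/ComfyUI
    - Chinese-language learning community: https://www.comflowy.com/zh-CN
FLUX.1: A new open-source image generation model developed by Black Forest Labs, whose team was also behind Stable Diffusion.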

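AI Video Generation

Runway Gen-2: A popular video generation product. It offers several AI video generation modes, including text-to-video, text-plus-image-to-video, image-to-video, stylized rendering, local overlay rendering, and 3D model rendering. Free credits available. Official website.
Pika: A popular video generation product whose signature feature is image-to-video. Newer versions focus on fun, playful video generation. Free credits available.
PixVerse: A video generation tool supporting text-to-video, image-to-video, character-to-video, and more. Free credits available.
Stable Video Diffusion: An open-source video generation model released by Stability AI. It can be deployed locally.
- Overview: https://mp.weixin.qq.com/s/il3YahMQyw55KdQ7acxIow
- Tutorial: https://huggingface.co/docs/diffusers/main/en/using-diffusers/svd
OpenAI Sora: A video generation model released by OpenAI. It supports text-to-video, image-to-video, and video extension and stitching, and generates videos up to one minute long. It is currently in closed testing, open only to experts in safety and creative fields, and is not yet generally available.
Kling (可灵大模型 / 可灵AI): Kuaishou's video generation model, supporting text-to-video, image-to-video, video continuation, and more. It supports HD videos up to 3 minutes long and has been called "China's Sora".
Zhipu Qingying (智谱清影): A video generation tool from Zhipu AI, supporting text-to-video, image-to-video, and applications such as "bringing old photos to life". It can generate 10-second, 4K, 60 fps videos. Currently free to use, with API access.
Vidu: An AI video generator developed by Professor Zhu Jun's team at Tsinghua University. It generates highly realistic 1080p HD videos of 4 or 8 seconds. Highlights include fast generation, character consistency, support for realistic and anime styles, and smooth, continuous results. Applicable to games, film and TV, education, and other fields.
Luma Dream Machine: A video generation model from Luma AI. It supports text-to-video and image-to-video and generates 5-second videos. Free credits available.
Jianying Pro (剪映专业版, mainland China version) / CapCut (international version): ByteDance's video editor for Windows/Mac. It integrates many AI-based audio and video features, such as subtitle generation, voice-over generation, noise reduction, voice changing, digital humans, and text-to-video.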

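3D Modeling

Zero-1-to-3 (zero123): Zero-shot generation of a 3D object from a single image. An open-source project from Columbia University.
One-2-3-45: "Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization". Open-source project.
Wonder3D: "Wonder3D produces consistent multi-view normal maps and corresponding color images, and thus reconstructs high-fidelity textured mesh from a single image in only 2~3 minutes". Open-source project.
Stable Zero123: Generates a high-quality 3D object from a single image. An open model from Stability AI that can be integrated into ComfyUI workflows.
DreamGaussian: "Generative Gaussian Splatting for Efficient 3D Content Creation". Open-source project.
Tripo AI: Generates high-quality, downloadable 3D models from text or images. Free quota available.
Genie: An AI tool from Luma AI that generates 3D models from text. Currently offered through Discord.
Luma AI: Reconstructs 3D scenes from video. All you need is a handheld camera: film a 360° rotation around the object as instructed. Available as a web app and an iOS app.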

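Digital Humans / Audio-Driven Video / TTS

TTS: Text to Speech, generating speech from text.

Fish Audio: A multi-purpose AI audio tool supporting Chinese, English, and other languages, with text-to-speech (TTS), speech-to-text (ASR), and more.
Wondershare Virbo (万兴播爆): Enter a script and generate a digital-human presenter video in one click.
Jianying (剪映): Can generate digital-human presenter videos.
HeyGen: Digital-human presenter videos, video translation, and voice cloning.
Eleven Labs: Text-to-speech (TTS), sound effects generated from text descriptions, voice cloning, and video translation and dubbing. API available.
EMO: A large model released by Alibaba (image + audio → video). The mouth shapes and facial expressions it generates look quite natural.
ChatTTS: An open-source text-to-speech (TTS) model supporting Chinese and English. It can control human-like traits such as pauses and laughter, and its output sounds natural and fluent.
ChatTTS webUI: A simple local web interface that uses ChatTTS to synthesize speech from text and can also expose an API. Open source.
Seed-TTS: A high-quality, versatile speech generation model released by ByteDance. It is not open source, and no usable product appears to have been released. It supports voice fine-tuning, emotional TTS, voice conversion, emotion conversion, generating new speech from existing speech, speech content editing, and speed adjustment. Applicable to audiobooks, video translation, and similar scenarios.
Jianying (剪映): An AI voice cloning feature is available; it can only clone your own voice.
Microsoft Azure AI Speech: Microsoft's cloud service, supporting speech-to-text, text-to-speech, speech translation, and speaker recognition.
LivePortrait: Kuaishou's open-source portrait control model. It can drive a still image with a video of facial movements to generate a video, or modify other videos. Online demo.
ReSyncer: A research project that generates more natural lip-synced video from speech and video template material. Applicable to digital-human scenarios. Only a research paper exists so far; no publicly usable product is available.
MaskGCT: A latest-generation voice cloning model built jointly by The Chinese University of Hong Kong, Shenzhen and 趣丸科技. It is open source and supports zero-shot TTS (a voice can be cloned from a sample as short as 1 second). Try it online.
Ultralight-Digital-Human: An ultra-lightweight digital-human model that can run in real time on mobile devices. Open source.
clone-voice: A voice cloning tool with a web interface that records audio in your own voice or any other voice. Supports Windows, Mac, and Linux. Open source.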

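Online SD (Stable Diffusion) Image Generation

Leonardo.Ai: An easy-to-use yet powerful AI image platform built on SD. It deeply integrates various SD plugins, provides pre-trained models, and supports training your own. Free quota available.
eSheep (电子羊): Try WebUI and ComfyUI online. New users receive 100 points; every 100 points are worth ¥1.
NetEase AI Design Workshop (网易 AI 设计工坊): Online WebUI with model training. Free quota of 10 uses per day.
LibLib AI: Online WebUI with model training. Free quota of 300 points per day.
Cephalon Cloud (端脑云): One-click cloud deployment of your own WebUI and ComfyUI. New users receive 2000 points; every 1000 points are worth ¥1.
Jimeng AI (即梦 AI): ByteDance's online image creation platform, focused on ease of use and free access. Formerly named "Dreamina".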

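Music and Song Creation

Suno: AI that generates songs to your specification (lyrics, composition, and vocals).
Tiangong AI Music (天工 AI 音乐): Creates music from a song title, lyrics, or reference audio; the AI can also help write lyrics.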

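Audio and Video Processing

vocal-separate: A minimal tool for separating vocals from background music. It runs as a local web app and needs no internet connection. Open source.
pyVideoTrans: Open-source video translation software. One-click subtitle generation + subtitle translation + dubbing + merging = a new video with subtitles and voice-over.
GVS hard-subtitle extraction: Detects hard-coded subtitles in video and extracts them quickly. Supports Chinese and English.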

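Other Creative Tools

PromeAI (神采): An image generation and editing tool covering sketch rendering, photo-to-line-art, inpainting, background removal, background replacement, outpainting, relighting, HD upscaling, text blending, AI portraits, image-to-video, and more. Free quota available.
DomoAI: Video style transfer, for example converting an existing video into anime, pixel art, oil painting, or 3D Pixar-style animation. It also offers text-to-image, image-to-image, and image-to-video. Free quota available.
Comic AI: An AI comic creation tool. Free quota available.
Logo Diffusion: An AI logo design tool. Main features: prompt-to-logo, image-to-logo, editing logos with prompts, hand-drawn sketch refinement, and 2D-to-3D. Free quota available.
AutoPod: A Premiere Pro plugin that automates multi-camera editing and pause removal.
Canva (可画): A long-established online design tool with a large library of templates and design elements. Supports AI design.
FaceSwap: A multi-purpose online face-swapping tool. Supports image, multi-person, and video face swaps. Free quota available.
Remaker - Face Swap Online Free: A free online face-swapping tool that supports image face swaps.
Huiwa (绘蛙): An AI tool for e-commerce that generates model images with outfit changes. From Alibaba.
OOTDiffusion: An open-source virtual "try-on" model. Online demo.
TryOffDiff: An open-source "try-off" model that extracts a standardized garment image from a photo. Online demo.

(Updated continuously...)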

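Common AI Websites / Tools

General Platforms

POE: An aggregation platform for well-known models. Users can customize chatbots through prompts to suit their needs, which makes it an alternative to GPTs. Paying users get unlimited access to premium models such as GPT-4 and Claude 2.
FlowGPT: A prompt-sharing platform. It offers chatbots for role-play, games, creativity, productivity, and more, and is a good place to learn prompt writing. It can also serve as an alternative to GPTs.
Character.AI: Chat with AI characters of all kinds, including world-famous figures, anime characters, and game characters.
Coze: ByteDance's AI chatbot and AI application development platform. With or without programming experience, users can quickly build chatbots with specific functions and publish them to major social platforms.
- Website (international): https://www.coze.com/
- Website (mainland China): https://www.coze.cn/
- Chinese documentation (international): https://www.coze.com/docs/zh_cn/welcome.html
- Chinese documentation (mainland China): https://www.coze.cn/docs/guides/welcome
GPTsCopilot: A third-party GPTs store that provides relayed access to GPTs. In a GPTs URL, replace openai.com with openai-now.com to switch to GPTsCopilot's relay service; no ChatGPT Plus membership is needed to use GPTs.
- Pricing: https://gptscopilot.ai/pricing
  - Basic (5 credits per day): free
  - Pro (1500 credits per month): $9.99 / month
  - Pay-as-you-go: $5.99 / 500 credits or $9.99 / 1000 credits
Toolify.ai: A categorized directory of AI tools.
There's An AI For That (TAAFT): Ask which AI tools can meet your need.
WayToAGI (通往 AGI 之路): A categorized, searchable directory of AI tools (including websites and GPTs).
ChandlerAi: An AI assistant accessible from mainland China that can call advanced models such as GPT-4, Claude 3 Opus, Gemini, and DALL·E. Paid.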

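Image Processing / Graphic Design / UI Design

Vectorizer.AI: An AI-based online tool that converts bitmaps to vector graphics, for example PNG → SVG. It is no longer free and requires a subscription ($10 / month).
Galileo: Generates UI designs from prompts, with export to Figma.
Magnific AI: Image upscaling and detail enhancement.
Photoroom: An online AI image editing tool. Free users can use background removal, object erasing, and photo enhancement.
Background removal:
- Four free AI background-removal tools; you won't believe the last one!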

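Writing / PPT Slides

Notion AI: The AI writing assistant in Notion. A paid add-on at $10 per month.
Wawa Writing (蛙蛙写作): A Chinese writing model for novels, video scripts, papers, and more. Free trial of 3,000 characters.
iFlytek Zhiwen (讯飞智文): One-click Word and PPT generation, an AI writing assistant, multilingual translation, automatic AI illustrations, PPT-to-speech-script conversion, and more.
Tencent Docs (腾讯文档): Its AI assistant generates PPT slides, documents, spreadsheets, mind maps, forms, and more.
Gamma: An AI design assistant that generates PPT slides, documents, and web pages, and improves existing slides and documents.
AiPPT.cn: One-click AI PPT generation. It generates outlines and copy automatically, turns documents into slides in seconds, offers many templates, and is compatible with the pptx format.
ProcessOn: A long-established online diagramming tool supporting flowcharts, swimlane diagrams, mind maps, architecture diagrams, floor plans, and more. Supports AI-generated diagrams.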

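Content Analysis, Recognition, and Summarization

Tongyi Tingwu (通义听悟): Transcribes audio files to text, splits them into chapters, extracts key information, and distinguishes multiple speakers. Suited to recorded interviews, podcasts, meeting records, and similar content.
MinerU: A one-stop, open-source, high-quality data extraction tool that extracts content from PDF documents, web pages, and e-books and converts it to formats such as Markdown. Try it online.
Elicit: Analyzes research papers at superhuman speed. It automates time-consuming research tasks such as summarizing papers, extracting data, and synthesizing findings.
Monica: A multi-purpose AI toolbox with a browser extension, desktop app, and mobile app. Its best-known feature is summarization and Q&A based on web page content.
Jianying (剪映): Recognizes speech to generate subtitles.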

Custom Knowledge Bases / RAG

RAG: Retrieval-Augmented Generation. It is currently the mainstream technique for attaching an external knowledge base to an LLM.
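
To make the idea concrete, here is a minimal sketch of the retrieve-then-augment step, using only the Python standard library. It is an illustration under stated assumptions, not how any tool listed here works: the function names, the sample knowledge base, and the prompt wording are all hypothetical, and a simple bag-of-words cosine similarity stands in for the embedding models and vector stores that real RAG systems typically use. The final call to an LLM is left out.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (the retrieval step)."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Prepend the retrieved context to the question (the augmentation step)."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


# Hypothetical knowledge base, for illustration only.
KNOWLEDGE_BASE = [
    "Refunds are accepted within 30 days of purchase.",
    "Support is available by email on weekdays.",
    "Shipping to Europe takes 5 to 7 business days.",
]

if __name__ == "__main__":
    # The resulting prompt would then be sent to an LLM of your choice.
    print(build_prompt("How long does shipping to Europe take?", KNOWLEDGE_BASE, k=1))
```

In a production setup the word-count vectors would usually be replaced by learned embeddings, but the flow stays the same: retrieve the most relevant passages, then hand them to the model together with the question.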

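SiteGPT: Builds a customer-service bot from your website content and uploaded documents to answer customer inquiries.
Dify: An LLM application development platform that supports a wide range of large models and provides prompt orchestration, RAG, an agent framework, workflow orchestration, and more.
RAGFlow: An open-source RAG engine built on deep document understanding.
MaxKB: A knowledge-base Q&A system based on LLMs. It works out of the box and can be embedded quickly into third-party business systems.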

Advertising / Marketing

AdIntelli: An ad network for the GPTs ecosystem (reliability yet to be verified).
GPT Wallets: Provides payment and data analytics solutions for GPTs.

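API Aggregation Platforms

GitHub Models: A free LLM API from GitHub, offering models such as GPT-4o, Meta Llama 3, and Cohere, with rate limits. Related guide.
API2D: Provides APIs for the GPT series, Claude, embeddings, image generation, and more. Fast and stable, with convenient payment.
OpenRouter: Provides API access to the GPT, Claude, Gemini, Llama, Qwen, and other model families.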

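Other Tools / Open-Source Projects

Perplexity: AI search.
Metaso AI Search (秘塔 AI 搜索): A friendlier search engine that organizes information and presents it in a more structured way. It offers "Web" and "Academic" search modes, with three depth levels: concise, in-depth, and research.
Tiangong AI (天工 AI): AI search, a chat assistant, and more.
Devv.AI: An AI search engine for programmers that gives fast, accurate answers to programming questions.
PromptPerfect: Helps you optimize prompts. For example, it can break a vague request into multi-step tasks to improve the accuracy of model output, or generate an outline from a given topic and write a long-form article.
AppAgent: An open-source project from a Tencent research team: an LLM-based multimodal agent for phones that carries out complex tasks automatically for the user. It works with Android phones and emulators and is roughly comparable to a macro automation tool (按键精灵) for phones.
ProctorAI: An AI "supervisor", billed as a "procrastination killer". This open-source local application takes periodic screenshots to check whether you are slacking off and issues warnings, with optional voice alerts. Detailed supervision rules can be configured. Under the hood it calls multimodal models such as GPT-4o.
llm_aided_ocr: LLM-aided OCR, using large models to improve OCR accuracy.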

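Tutorials

Regulations / Announcements

Artificial Intelligence Law of the People's Republic of China (scholars' draft proposal)
Algorithm filing:
- List of filed deep synthesis service algorithms in mainland China:
- Internet Information Service Algorithm Filing System

(Updated continuously...)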

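FAQ

Is there an easy way to use GPT-4 and GPTs?

A good way to start is a shared ("carpool") ChatGPT Plus account, which is usable as soon as you buy it and unlocks premium features such as GPT-4, DALL·E image generation, and GPTs. You need your own overseas network connection.

We recommend a long-established account-sharing platform that is stable and reliable. Click here to get started (the code ai2024 gives a 5% discount).

Who owns the copyright of AI-generated content?

The short answer:

- If the AI provider's user agreement claims copyright over AI-generated content, the copyright belongs to the provider.
- Otherwise, it belongs to the user who generated the content with the AI service.

Detailed explanation: Do you own the copyright to works you generate with AI?

WeChat Group

Join the group to get AI news early and chat with hundreds of like-minded members: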

License

Text and graphics: © Creative Commons BY-NC-ND 4.0
Code: GPLv3

For Tasks:

generate text, assist in programming, create ai-generated art, analyze content, develop ai applications

For Jobs:

ai researcher, data scientist, software engineer, machine learning engineer, artificial intelligence developer

Alternative AI tools for Awesome-AI

Similar Open Source Tools

Awesome-AI

GitHub stars: 157

douyin-chatgpt-bot

Douyin ChatGPT Bot is an AI-driven system for automatic replies on Douyin, including comment and private message replies. It offers features such as comment filtering, customizable robot responses, and automated account management. The system aims to enhance user engagement and brand image on the Douyin platform, providing a seamless experience for managing interactions with followers and potential customers.

GitHub stars: 166

uDesktopMascot

uDesktopMascot is an open-source project for a desktop mascot application with a theme of 'freedom of creation'. It allows users to load and display VRM or GLB/FBX model files on the desktop, customize GUI colors and background images, and access various features through a menu screen. The application supports Windows 10/11 and macOS platforms.

GitHub stars: 265

gez

Gez is a high-performance micro frontend framework based on ESM. It uses Rspack compilation and maps modules to URLs with strong caching and content-based hashing. Gez embraces modern micro frontend architecture by leveraging ESM and importmap for dependency management, providing reliable isolation with module scope, seamless integration with any modern frontend framework, intuitive development experience, and optimal performance with zero runtime overhead and reliable caching strategies.

GitHub stars: 584

aituber-kit

AITuber-Kit is a tool that enables users to interact with AI characters, conduct AITuber live streams, and engage in external integration modes. Users can easily converse with AI characters using various LLM APIs, stream on YouTube with AI character reactions, and send messages to server apps via WebSocket. The tool provides settings for API keys, character configurations, voice synthesis engines, and more. It supports multiple languages and allows customization of VRM models and background images. AITuber-Kit follows the MIT license and offers guidelines for adding new languages to the project.

GitHub stars: 421

llm-resource

llm-resource is a comprehensive collection of high-quality resources for Large Language Models (LLM). It covers various aspects of LLM including algorithms, training, fine-tuning, alignment, inference, data engineering, compression, evaluation, prompt engineering, AI frameworks, AI basics, AI infrastructure, AI compilers, LLM application development, LLM operations, AI systems, and practical implementations. The repository aims to gather and share valuable resources related to LLM for the community to benefit from.

GitHub stars: 309

Code-Review-GPT-Gitlab

A project that utilizes large models to help with Code Review on Gitlab, aimed at improving development efficiency. The project is customized for Gitlab and is developing a Multi-Agent plugin for collaborative review. It integrates various large models for code security issues and stays updated with the latest Code Review trends. The project architecture is designed to be powerful, flexible, and efficient, with easy integration of different models and high customization for developers.

GitHub stars: 452

vpnfast.github.io

VPNFast is a lightweight and fast VPN service provider that offers secure and private internet access. With VPNFast, users can protect their online privacy, bypass geo-restrictions, and secure their internet connection from hackers and snoopers. The service provides high-speed servers in multiple locations worldwide, ensuring a reliable and seamless VPN experience for users. VPNFast is easy to use, with a user-friendly interface and simple setup process. Whether you're browsing the web, streaming content, or accessing sensitive information, VPNFast helps you stay safe and anonymous online.

GitHub stars: 80

KubeDoor

KubeDoor is a microservice resource management platform developed using Python and Vue, based on K8S admission control mechanism. It supports unified remote storage, monitoring, alerting, notification, and display for multiple K8S clusters. The platform focuses on resource analysis and control during daily peak hours of microservices, ensuring consistency between resource request rate and actual usage rate.

GitHub stars: 272

DocTranslator

GitHub stars: 60

sanic-web

Sanic-Web is a lightweight, end-to-end, and easily customizable large model application project built on technologies such as Dify, Ollama & Vllm, Sanic, and Text2SQL. It provides a one-stop solution for developing large model applications, supporting graphical data-driven Q&A using ECharts, handling table-based Q&A with CSV files, and integrating with third-party RAG systems for general knowledge Q&A. As a lightweight framework, Sanic-Web enables rapid iteration and extension to facilitate the quick implementation of large model projects.

GitHub stars: 233

MaiMBot

MaiMBot is an intelligent QQ group chat bot based on a large language model. It is developed using the nonebot2 framework, utilizes LLM for conversation abilities, MongoDB for data persistence, and NapCat for QQ protocol support. The bot features keyword-triggered proactive responses, dynamic prompt construction, support for images and message forwarding, typo generation, multiple replies, emotion-based emoji responses, daily schedule generation, user relationship management, knowledge base, and group impressions. Work-in-progress features include personality, group atmosphere, image handling, humor, meme functions, and Minecraft interactions. The tool is in active development with plans for GIF compatibility, mini-program link parsing, bug fixes, documentation improvements, and logic enhancements for emoji sending.

GitHub stars: 1.1k

Saber-Translator

GitHub stars: 424

FisherAI

FisherAI is a Chrome extension designed to improve learning efficiency. It supports automatic summarization, web and video translation, multi-turn dialogue, and various large language models such as gpt/azure/gemini/deepseek/mistral/groq/yi/moonshot. Users can enjoy flexible and powerful AI tools with FisherAI.

GitHub stars: 120

AI-Catalog

AI-Catalog is a curated list of AI tools, platforms, and resources across various domains. It serves as a comprehensive repository for users to discover and explore a wide range of AI applications. The catalog includes tools for tasks such as text-to-image generation, summarization, prompt generation, writing assistance, code assistance, developer tools, low code/no code tools, audio editing, video generation, 3D modeling, search engines, chatbots, email assistants, fun tools, gaming, music generation, presentation tools, website builders, education assistants, autonomous AI agents, photo editing, AI extensions, deep face/deep fake detection, text-to-speech, startup tools, SQL-related AI tools, education tools, and text-to-video conversion.

GitHub stars: 361

GoMaxAI-ChatGPT-Midjourney-Pro

GoMaxAI Pro is an AI-powered application for personal, team, and enterprise private operations. It supports various models like ChatGPT, Claude, Gemini, Kimi, Wenxin Yiyan, Xunfei Xinghuo, Tsinghua Zhipu, Suno-v3.5, and Luma-video. The Pro version offers a new UI interface, member points system, management backend, homepage features, support for various content formats, AI video capabilities, SAAS multi-opening function, bug fixes, and more. It is built using web frontend with Vue3, mobile frontend with Uniapp, management frontend with Vue3, backend with Nodejs, and uses MySQL5.7(+) + Redis for data support. It can be deployed on Linux, Windows, or MacOS, with data storage options including local storage, Aliyun OSS, Tencent Cloud COS, and Chevereto image bed.

GitHub stars: 233

For similar tasks

Open-DocLLM

Open-DocLLM is an open-source project that addresses data extraction and processing challenges using OCR and LLM technologies. It consists of two main layers: OCR for reading document content and LLM for extracting specific content in a structured manner. The project offers a larger context window size compared to JP Morgan's DocLLM and integrates tools like Tesseract OCR and Mistral for efficient data analysis. Users can run the models on-premises using LLM studio or Ollama, and the project includes a FastAPI app for testing purposes.

GitHub stars: 124

Awesome-AI

GitHub stars: 157

Qmedia

QMedia is an open-source multimedia AI content search engine designed specifically for content creators. It provides rich information extraction methods for text, image, and short video content. The tool integrates unstructured text, image, and short video information to build a multimodal RAG content Q&A system. Users can efficiently search for image/text and short video materials, analyze content, provide content sources, and generate customized search results based on user interests and needs. QMedia supports local deployment for offline content search and Q&A for private data. The tool offers features like content cards display, multimodal content RAG search, and pure local multimodal models deployment. Users can deploy different types of models locally, manage language models, feature embedding models, image models, and video models. QMedia aims to spark new ideas for content creation and share AI content creation concepts in an open-source manner.

GitHub stars: 537

aws-ai-intelligent-document-processing

This repository is part of Intelligent Document Processing with AWS AI Services workshop. It aims to automate the extraction of information from complex content in various document formats such as insurance claims, mortgages, healthcare claims, contracts, and legal contracts using AWS Machine Learning services like Amazon Textract and Amazon Comprehend. The repository provides hands-on labs to familiarize users with these AI services and build solutions to automate business processes that rely on manual inputs and intervention across different file types and formats.

GitHub stars: 124

Scrapegraph-LabLabAI-Hackathon

ScrapeGraphAI is a web scraping Python library that utilizes LangChain, LLM, and direct graph logic to create scraping pipelines. Users can specify the information they want to extract, and the library will handle the extraction process. The tool is designed to simplify web scraping tasks by providing a streamlined and efficient approach to data extraction.

GitHub stars: 75

parsera

Parsera is a lightweight Python library designed for scraping websites using LLMs. It offers simplicity and efficiency by minimizing token usage, enhancing speed, and reducing costs. Users can easily set up and run the tool to extract specific elements from web pages, generating JSON output with relevant data. Additionally, Parsera supports integration with various chat models, such as Azure, expanding its functionality and customization options for web scraping tasks.

GitHub stars: 1.1k

Scrapegraph-demo

ScrapeGraphAI is a web scraping Python library that utilizes LangChain, LLM, and direct graph logic to create scraping pipelines. Users can specify the information they want to extract, and the library will handle the extraction process. This repository contains an official demo/trial for the ScrapeGraphAI library, showcasing its capabilities in web scraping tasks. The tool is designed to simplify the process of extracting data from websites by providing a user-friendly interface and powerful scraping functionalities.

GitHub stars: 76

you2txt

You2Txt is a tool developed for the Vercel + Nvidia 2-hour hackathon that converts any YouTube video into a transcribed .txt file. The project won first place in the hackathon and is hosted at you2txt.com. Due to rate limiting issues with YouTube requests, it is recommended to run the tool locally. The project was created using Next.js, Tailwind, v0, and Claude, and can be built and accessed locally for development purposes.

GitHub stars: 71

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

GitHub stars: 855

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

GitHub stars: 1.5k

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

GitHub stars: 94

kaito

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

GitHub stars: 405

PyRIT

PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

GitHub stars: 2.3k

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:
- Self-contained, with no need for a DBMS or cloud service.
- OpenAPI interface, easy to integrate with existing infrastructure (e.g., Cloud IDE).
- Supports consumer-grade GPUs.

GitHub stars: 30.6k

spear

SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

GitHub stars: 224

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

GitHub stars: 675
