Tianji
从零学习,制作懂人情世故的大语言模型
Stars: 346
Tianji is a free, non-commercial artificial intelligence system developed by SocialAI for tasks involving worldly wisdom, such as etiquette, hospitality, gifting, wishes, communication, awkwardness resolution, and conflict handling. It includes four main technical routes: pure prompt, Agent architecture, knowledge base, and model training. Users can find corresponding source code for these routes in the tianji directory to replicate their own vertical domain AI applications. The project aims to accelerate the penetration of AI into various fields and enhance AI's core competencies.
README:
🍵 在线体验懂人情世故的天机prompt应用、知识库应用、Agent应用
在上海 AILab OpenXLab 在线体验人情世故微调模型:送祝福模块、敬酒礼仪文化
📚 查看 使用文档
🍓 在 🤗 huggingface 获取天机的 所有数据
💡 有疑问或功能请求,欢迎 创建一个 issue ,或者加入我们的 微信社区群
对标 OpenAI 的 SocialAI 首页
🔥🔥News: 2024.09.02
: 我们更新了第一款专注敬酒场景的知识库对话模型
🔥 News: 2024.08.31
: -我们重构了仓库组织结构,更新了相关工具代码以及README。彻底更新了 langchain 知识库问答 相关内容以及对应 demo,让项目更适合一键学习使用。
🔥News: 2024.07.16
: 我们发布了第一款们发布了第一款专注敬酒场景的天机模型, 对应敬酒语料专注敬酒场景的天机模型, 对应敬酒语料
🔥News: 2024.07.14
: 更新了新版的送祝福模块 支持更多风格切换,数据已开源至 huggingface
🔥News: 2024.05.04
: 我们发布了以《化解"尴尬"场合》为例的微调数据获取、制造教程,对应数据开源至 huggingface
🔥News: 2024.05.02
: 我们发布了有关人情世故大模型-送祝福的数据收集到微调过程的全流程可复现文档及其对应数据、配置、辅助脚本,(鸽了一段时间但又回来了,主要维护者持续加班忙了,放假才能快乐开源)
🍵 News: 2024.02.01
: 🧑🚀 我们发布了有关 prompt、Agent应用、知识库(TODO)、模型微调(基于InternLM2)的初版体验地址,将仓库转为开放。
学完全部内容,你将获得大语言模型入门级全栈应用开发能力。
-
制作 AI 游戏(类似哄哄模拟器)应用
-
从零构建一个知识库对话应用
-
从零制作语言模型微调数据、微调属于自己的大语言模型
在本项目中,执行下列指令即可完成项目的安装
pip install -e .
为确保项目正常运行,请在项目内新建.env
文件,并在其中设置你的API密钥,你可以根据下列例子写入对应的 key,即可成功运行调用,目前默认使用 zhipuai,你可以仅写入ZHIPUAI_API_KEY
即可使用。
ZHIPUAI_API_KEY=
如果在从Hugging Face下载模型时遇到速度极慢或无法下载的问题,请在.env文件中设置HF_ENDPOINT
的值为https://hf-mirror.com
。请注意,某些Hugging Face仓库可能需要访问权限(例如Jina Ai)。为此,请注册一个Hugging Face账号,并在.env文件中添加HF_TOKEN
。你可以在这里找到并获取你的token。
HF_HOME='./cache/'
HF_ENDPOINT = 'https://hf-mirror.com'
OPENAI_API_KEY=
OPENAI_API_BASE=
ZHIPUAI_API_KEY=
BAIDU_API_KEY=
OPENAI_API_MODEL=
HF_TOKEN=
以下给出 prompt 以及 agent 的相关应用方式,在运行前请确保你已经新建.env
文件:
# 运行prompt webui前端
python3 run/tianji_prompt_webui.py
# 运行agent前端
streamlit run run/metagpt_webui.py
# 运行langchain前端
python run/demo_rag_langchain_onlinellm.py
在进行项目开发与贡献之前,在保证key的正确设定后,你还需要在提交 pull request 前进行格式检查。你可以参考下列方式进行 pre-commit 的安装,在 commit 环节将会看到变更文件格式会被自动修改。
pip install pre-commit
pre-commit install
git add .
git commit -m "提交信息"
git push
这一步,你需要反复执行下列两步,直到 commit 成功 (该过程会帮助你自动修复绝大部分格式错误,但对于某些复杂格式需要自己手动根据提示修改。)
git add .
git commit -m "提交信息"
若全部成功,你将会看到类似如下信息显示:
[main 2333] rebuild code standard
5 files changed, 4 insertions(+), 3 deletions(-)
- [x] 释放最简初版(涉及prompt、aigame、agent、知识库、模型微调)
- [x] 完成人情世故大模型-送祝福的模型微调数据收集到微调过程的可复现文档
- [x] 开源人情世故语料-送祝福至huggingface
- [x] 迭代更好的数据制造工具与清洗方案,开源数据清洗脚本
- [ ] 完成知识库部分迭代
- [ ] 开放知识库语料获取细节至huggingface
- [ ] 完成 Agent 部分文档
- [ ] 补充文档(如何参考本项目构建自己的应用prompt、agent、知识库、微调应用)
- [ ] 整理多维度数据,开源较完整人情世故语料
- [ ] 收集 & 开源人情世故语料-2w并训练给出结果和全过程
基于整理后的人情世故数据,人情世故大模型系统-天机包括了常见人际交往中的七大领域(具体可以参考 场景分类 中的场景细化细节),其中大体可分为:
1.敬酒礼仪文化 Etiquette
不惧碰杯,酒席桌上一条龙
2.请客礼仪文化 Hospitality
友好地展示你的友好
3.送礼礼仪文化 Gifting
此礼非礼,直击人心
4.送祝福 Wishes
承包你的所有祝福语
5.如何说对话 Communication
据说是低情商救星
6.化解"尴尬"场合 Awkwardness
没心没肺,找回自我
7.矛盾&冲突应对 Conflict
《能屈能伸》
结合这些领域,Tianji涉及到的技术路线共有四种:
- 纯prompt(包括AI游戏):内置 system prompt 基于大模型自身能力对话。
- Agent(MetaGPT等):利用 Agent 架构的得到更丰富、更定制化详细的回答。
- 知识库:直接检索人情世故法则(比如餐桌上一般怎么喝酒)。
- 模型训练:基于不同优秀的模型基座,在积累大量数据的情况下进行Lora微调或全量微调。
您可以在 tianji 目录下找到四种路线的对应源码,如果您想参考 Tianji
的项目架构、数据管理、技术路线复刻出属于自己的垂直领域 AI 应用,欢迎 fork 或者直接参考,我们将会开源所有包括从项目的起步、数据的方向探索、数据构建与管理、AI应用从0制作、领域(比如人情世故)与技术路线的深入结合
的全过程;我们希望看到 AI 原生应用在生活中进一步的加速推进。
assets/:静态图片文件
run/: 包括了各类演示用前端
test/:这里存放了各类功能的测试文件,包括核心模块以及llm运行的单元测试
tianji/:源代码目录,包含主要逻辑与算法实现
tools/:帮助收集数据、整理数据清洗语料的工具
TODO,待覆盖全文档
当前祝福模块支持三种模式触发:
- 普通风格,直接写祝福对象,符合常见对话规律。
- 小红书风格,在祝福对象指定后加上 小红书风格,可触发;自带emoji以及可爱对话风格。
- 严肃风格,在祝福对象指定后加上 严肃风格,可触发;文书风格。
以下给出 特斯拉创始人 Elon Musk 向 OpenAI 创始人 Sam Altman 敬酒的对话示例(基于内置提示词生成):
elon musk:sam,你的Openai不如叫做CloseAI
sam altman:你先管好你的grok,罚你三杯!
elon musk: 瞧你说的,是要我喝完这瓶白酒?
sam altman:哈哈,埃隆,你可是以星辰大海为目标的男人,一瓶白酒算什么?我可是听说你在SpaceX的派对上连火箭燃料都敢喝呢!来,这杯我干了,你随意,但别失了世界首富的风度啊!
elon musk:你这是在激我啊,Sam老兄!那我就不客气了,我喝掉,你半杯。不过说回来,OpenAI的成果可不只是靠吹牛,你可得好好看着,别让它真的变成了“CloseAI”。
sam altman:嘿,我这CEO的帽子可是随时准备摘下来给你戴戴的,马斯克老兄。不过今晚,让我们先忘了这些,来,为我们的友谊干杯!我喝掉,你随意,但记得,明天还得继续改变世界呢!
elon musk:行,就为你这句“改变世界”,我干了这杯!不过别以为我会轻易放过你,等我下次带上我的特斯拉充电桩,看你怎么应对!
sam altman:哈哈,我等着那一天!不过现在,让我们先享受今晚。来,最后一杯,我干了,你随意,喝得开心,别忘了,OpenAI的AI们可还在等着我们回去继续训练呢!
TODO
这里记录了该项目存在的一些问题(欢迎领取解决),以及未来的一些路线图需要完成,当前还没时间写,如果您有时间想一起来玩儿,请issue或者直接邮箱联系我: [email protected]
得益于良好的ci设施,你只需要参考示例PR,就可以很快提出自己的第一个prompt PR!
提交PR后,新的prompt将自动合并于 tianji/prompt
下的json文件中,方便一键调用。如果你不知道写什么,可以参考 场景分类 中的各类场景细化细节,写出不同人情世故领域的prompt。
该项目的初衷,第一是为了让AI学会核心技术
,第二是让更多人(领域/行业)可以构建属于自己的AI系统,加速AI对每一个领域的渗透。你可以通过以下方式来学习该项目:
你可以参考本项目创造出新的垂直领域应用:
- 租房助手(agent)
- 带娃助手(数据收集与知识库)
- 生活指南(数据收集与知识库) ......
有些贡献者没有github,我们感谢其中的每一位贡献者!,也欢迎你一起加入!
感谢下列所有人对本项目的帮助:
- 所有贡献者
- 项目最开始时刻 智谱AI 的token支持!
- 上海人工智能实验室 InternLM(书生·浦语) 模型,以及提供的A100显卡资源!
- InternLM(书生·浦语) 系列开源教程(目前最好的LLM实战全栈教程之一)
- Datawhale 开源学习社区
- 奇想星球
- zRzRzR的帮助
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for Tianji
Similar Open Source Tools
Tianji
Tianji is a free, non-commercial artificial intelligence system developed by SocialAI for tasks involving worldly wisdom, such as etiquette, hospitality, gifting, wishes, communication, awkwardness resolution, and conflict handling. It includes four main technical routes: pure prompt, Agent architecture, knowledge base, and model training. Users can find corresponding source code for these routes in the tianji directory to replicate their own vertical domain AI applications. The project aims to accelerate the penetration of AI into various fields and enhance AI's core competencies.
MoneyPrinterTurbo
MoneyPrinterTurbo is a tool that can automatically generate video content based on a provided theme or keyword. It can create video scripts, materials, subtitles, and background music, and then compile them into a high-definition short video. The tool features a web interface and an API interface, supporting AI-generated video scripts, customizable scripts, multiple HD video sizes, batch video generation, customizable video segment duration, multilingual video scripts, multiple voice synthesis options, subtitle generation with font customization, background music selection, access to high-definition and copyright-free video materials, and integration with various AI models like OpenAI, moonshot, Azure, and more. The tool aims to simplify the video creation process and offers future plans to enhance voice synthesis, add video transition effects, provide more video material sources, offer video length options, include free network proxies, enable real-time voice and music previews, support additional voice synthesis services, and facilitate automatic uploads to YouTube platform.
Code-Interpreter-Api
Code Interpreter API is a project that combines a scheduling center with a sandbox environment, dedicated to creating the world's best code interpreter. It aims to provide a secure, reliable API interface for remotely running code and obtaining execution results, accelerating the development of various AI agents, and being a boon to many AI enthusiasts. The project innovatively combines Docker container technology to achieve secure isolation and execution of Python code. Additionally, the project supports storing generated image data in a PostgreSQL database and accessing it through API endpoints, providing rich data processing and storage capabilities.
chatgpt-web-sea
ChatGPT Web Sea is an open-source project based on ChatGPT-web for secondary development. It supports all models that comply with the OpenAI interface standard, allows for model selection, configuration, and extension, and is compatible with OneAPI. The tool includes a Chinese ChatGPT tuning guide, supports file uploads, and provides model configuration options. Users can interact with the tool through a web interface, configure models, and perform tasks such as model selection, API key management, and chat interface setup. The project also offers Docker deployment options and instructions for manual packaging.
MINI_LLM
This project is a personal implementation and reproduction of a small-parameter Chinese LLM. It mainly refers to these two open source projects: https://github.com/charent/Phi2-mini-Chinese and https://github.com/DLLXW/baby-llama2-chinese. It includes the complete process of pre-training, SFT instruction fine-tuning, DPO, and PPO (to be done). I hope to share it with everyone and hope that everyone can work together to improve it!
hugging-llm
HuggingLLM is a project that aims to introduce ChatGPT to a wider audience, particularly those interested in using the technology to create new products or applications. The project focuses on providing practical guidance on how to use ChatGPT-related APIs to create new features and applications. It also includes detailed background information and system design introductions for relevant tasks, as well as example code and implementation processes. The project is designed for individuals with some programming experience who are interested in using ChatGPT for practical applications, and it encourages users to experiment and create their own applications and demos.
chatgpt-webui
ChatGPT WebUI is a user-friendly web graphical interface for various LLMs like ChatGPT, providing simplified features such as core ChatGPT conversation and document retrieval dialogues. It has been optimized for better RAG retrieval accuracy and supports various search engines. Users can deploy local language models easily and interact with different LLMs like GPT-4, Azure OpenAI, and more. The tool offers powerful functionalities like GPT4 API configuration, system prompt setup for role-playing, and basic conversation features. It also provides a history of conversations, customization options, and a seamless user experience with themes, dark mode, and PWA installation support.
GitHubSentinel
GitHub Sentinel is an intelligent information retrieval and high-value content mining AI Agent designed for the era of large models (LLMs). It is aimed at users who need frequent and large-scale information retrieval, especially open source enthusiasts, individual developers, and investors. The main features include subscription management, update retrieval, notification system, report generation, multi-model support, scheduled tasks, graphical interface, containerization, continuous integration, and the ability to track and analyze the latest dynamics of GitHub open source projects and expand to other information channels like Hacker News for comprehensive information mining and analysis capabilities.
gzm-design
Gzm Design is a free and open-source poster designer developed using the latest mainstream technologies such as Vue3, Vite4, TypeScript, etc. It provides features like PSD import, JSON import, multiple pages support, shortcut key support, template import, layer management, ruler tool, pen tool, element editing, preview, file download, canvas zooming and dragging, border stroke, filling, blending modes, text formatting, group handling, canvas size modification, rich text support, masking, shadow effects, undo/redo functionality, QR code tool, barcode tool, and ruler line npm package encapsulation.
airda
airda(Air Data Agent) is a multi-agent system for data analysis, which can understand data development and data analysis requirements, understand data, and generate SQL and Python code for data query, data visualization, machine learning and other tasks.
paper-ai
Paper-ai is a tool that helps you write papers using artificial intelligence. It provides features such as AI writing assistance, reference searching, and editing and formatting tools. With Paper-ai, you can quickly and easily create high-quality papers.
wechat-bot
WeChat Bot is a simple and easy-to-use WeChat robot based on chatgpt and wechaty. It can help you automatically reply to WeChat messages or manage WeChat groups/friends. The tool requires configuration of AI services such as Xunfei, Kimi, or ChatGPT. Users can customize the tool to automatically reply to group or private chat messages based on predefined conditions. The tool supports running in Docker for easy deployment and provides a convenient way to interact with various AI services for WeChat automation.
SQLAgent
DataAgent is a multi-agent system for data analysis, capable of understanding data development and data analysis requirements, understanding data, and generating SQL and Python code for tasks such as data query, data visualization, and machine learning.
MoneyPrinterPlus
MoneyPrinterPlus is a project designed to help users easily make money in the era of short videos. It leverages AI big model technology to batch generate various short videos, perform video editing, and automatically publish videos to popular platforms like Douyin, Kuaishou, Xiaohongshu, and Video Number. The tool covers a wide range of functionalities including integrating with major AI big model tools, supporting various voice types, offering video transition effects, enabling customization of subtitles, and more. It aims to simplify the process of creating and sharing videos to monetize traffic.
meet-libai
The 'meet-libai' project aims to promote and popularize the cultural heritage of the Chinese poet Li Bai by constructing a knowledge graph of Li Bai and training a professional AI intelligent body using large models. The project includes features such as data preprocessing, knowledge graph construction, question-answering system development, and visualization exploration of the graph structure. It also provides code implementations for large models and RAG retrieval enhancement.
LangChain-SearXNG
LangChain-SearXNG is an open-source AI search engine built on LangChain and SearXNG. It supports faster and more accurate search and question-answering functionalities. Users can deploy SearXNG and set up Python environment to run LangChain-SearXNG. The tool integrates AI models like OpenAI and ZhipuAI for search queries. It offers two search modes: Searxng and ZhipuWebSearch, allowing users to control the search workflow based on input parameters. LangChain-SearXNG v2 version enhances response speed and content quality compared to the previous version, providing a detailed configuration guide and showcasing the effectiveness of different search modes through comparisons.
For similar tasks
Tianji
Tianji is a free, non-commercial artificial intelligence system developed by SocialAI for tasks involving worldly wisdom, such as etiquette, hospitality, gifting, wishes, communication, awkwardness resolution, and conflict handling. It includes four main technical routes: pure prompt, Agent architecture, knowledge base, and model training. Users can find corresponding source code for these routes in the tianji directory to replicate their own vertical domain AI applications. The project aims to accelerate the penetration of AI into various fields and enhance AI's core competencies.
For similar jobs
ChatFAQ
ChatFAQ is an open-source comprehensive platform for creating a wide variety of chatbots: generic ones, business-trained, or even capable of redirecting requests to human operators. It includes a specialized NLP/NLG engine based on a RAG architecture and customized chat widgets, ensuring a tailored experience for users and avoiding vendor lock-in.
anything-llm
AnythingLLM is a full-stack application that enables you to turn any document, resource, or piece of content into context that any LLM can use as references during chatting. This application allows you to pick and choose which LLM or Vector Database you want to use as well as supporting multi-user management and permissions.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
deep-chat
Deep Chat is a fully customizable AI chat component that can be injected into your website with minimal to no effort. Whether you want to create a chatbot that leverages popular APIs such as ChatGPT or connect to your own custom service, this component can do it all! Explore deepchat.dev to view all of the available features, how to use them, examples and more!
Avalonia-Assistant
Avalonia-Assistant is an open-source desktop intelligent assistant that aims to provide a user-friendly interactive experience based on the Avalonia UI framework and the integration of Semantic Kernel with OpenAI or other large LLM models. By utilizing Avalonia-Assistant, you can perform various desktop operations through text or voice commands, enhancing your productivity and daily office experience.
chatgpt-web
ChatGPT Web is a web application that provides access to the ChatGPT API. It offers two non-official methods to interact with ChatGPT: through the ChatGPTAPI (using the `gpt-3.5-turbo-0301` model) or through the ChatGPTUnofficialProxyAPI (using a web access token). The ChatGPTAPI method is more reliable but requires an OpenAI API key, while the ChatGPTUnofficialProxyAPI method is free but less reliable. The application includes features such as user registration and login, synchronization of conversation history, customization of API keys and sensitive words, and management of users and keys. It also provides a user interface for interacting with ChatGPT and supports multiple languages and themes.
tiledesk-dashboard
Tiledesk is an open-source live chat platform with integrated chatbots written in Node.js and Express. It is designed to be a multi-channel platform for web, Android, and iOS, and it can be used to increase sales or provide post-sales customer service. Tiledesk's chatbot technology allows for automation of conversations, and it also provides APIs and webhooks for connecting external applications. Additionally, it offers a marketplace for apps and features such as CRM, ticketing, and data export.
UFO
UFO is a UI-focused dual-agent framework to fulfill user requests on Windows OS by seamlessly navigating and operating within individual or spanning multiple applications.