meet-libai
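Li Bai 👤 was an outstanding poet of the Tang dynasty whose work holds an important place in the history of Chinese literature. In recent years, the rapid development of digital technology and artificial intelligence has pushed the ways traditional culture is popularized toward innovation and change. Research on Li Bai's poetry is already extensive in China and abroad, but digital and intelligent approaches to popularizing it still fall short. This project therefore builds a Li Bai knowledge graph and combines it with large models to train a specialized AI agent, delivered as a generative dialogue application, to promote and popularize Li Bai's cultural legacy.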
Stars: 1100
The 'meet-libai' project aims to promote and popularize the cultural heritage of the Chinese poet Li Bai by building a Li Bai knowledge graph and training a specialized AI agent with large models. The project covers data preprocessing, knowledge graph construction, question-answering system development, and visual exploration of the graph structure. It also provides code implementations for large models and retrieval-augmented generation (RAG).
README:
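As artificial intelligence has developed, knowledge graph technology has come into wide use. A knowledge graph is a semantic representation model built on a linguistic knowledge base; it represents structured knowledge as a graph so that machines can better understand and process natural language. On top of this technology, a question-answering system can draw on the knowledge in the graph to answer user questions. This project builds a knowledge graph 🌐 of classical Chinese poetry culture centered on the poet Li Bai and implements question answering over it. It also supports visual exploration of the graph, to make its structure and content easier to understand, and provides code implementations of large models and retrieval-augmented generation (RAG).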
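To make the idea of answering questions from a graph concrete, here is a minimal in-memory sketch. It is not the project's implementation (the real graph is stored in Neo4j); the triples, the predicate names, and the helper functions are all illustrative assumptions.

```python
from collections import defaultdict

# Illustrative facts only; the project's real graph lives in Neo4j.
TRIPLES = [
    ("Li Bai", "dynasty", "Tang"),
    ("Li Bai", "birth_year", "701"),
    ("Li Bai", "friend_of", "Du Fu"),
    ("Li Bai", "wrote", "Quiet Night Thought"),
]


def build_index(triples):
    """Index (subject, predicate, object) triples by subject."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def get_attribute(index, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [obj for pred, obj in index.get(subject, []) if pred == predicate]


def get_relations(index, subject, target):
    """Return the predicates that connect `subject` directly to `target`."""
    return [pred for pred, obj in index.get(subject, []) if obj == target]


index = build_index(TRIPLES)
print(get_attribute(index, "Li Bai", "birth_year"))  # ['701']
print(get_relations(index, "Li Bai", "Du Fu"))       # ['friend_of']
```

An attribute question maps to `get_attribute` and a relation question maps to `get_relations`; a graph database performs the same two kinds of lookup at scale.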
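2.1 🥇 Collect and organize Li Bai's poems and related cultural materials: through literature research, data mining, and similar methods, comprehensively collect Li Bai's poems, biography, historical background, and related materials to provide the base data for the Li Bai knowledge graph.
2.2 🥈 Build the Li Bai knowledge graph: use natural language processing, information extraction, and related techniques to organize and analyze the collected materials and build a complete Li Bai knowledge graph. The graph covers Li Bai's life, poetic style, artistic achievements, and more, and serves as a rich knowledge base for the subsequent AI agent training.
2.3 🥉 Train a specialized AI agent: starting from the completed Li Bai knowledge graph, use large-model techniques to train an AI agent with expert-level knowledge. The agent is meant to understand and appreciate Li Bai's poetry in depth and to interact with users at a high level of quality.
2.4 4️⃣ Develop a generative dialogue application: on top of the trained AI agent, develop a generative dialogue application that interacts with users in real time and offers a personalized experience of appreciating Li Bai's poetry.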
- Python
- PyTorch
- Transformers
- FastAPI
- DGL
- DGL-KE
- Neo4j
- Aho-Corasick automaton (AC自动机)
- RAG
- langchain
- edge-tts
- modelscope
- gradio
- zhipuai
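- Data preprocessing: clean and word-segment the classical poetry data in preparation for building the knowledge graph
- Knowledge graph construction: use knowledge graph technology to build a knowledge graph of classical Chinese poetry culture centered on Li Bai
- Question-answering system: use the knowledge in the graph to answer user questions
- Graph visualization: explore the knowledge graph visually to better understand its structure and content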
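- ♨️ Graph question-answering approach:
- 😸 General streaming Q&A
- ♻️ Relation Q&A:
  - What is the relationship between Li Bai and Du Fu? (李白和杜甫的关系是什么)
- 📦 Attribute Q&A:
  - In which year was Li Bai born? (李白生于哪一年)
- 🎁 Speech and image generation:
  - Please generate a picture of Li Bai drinking by the river (请生成李白在江边喝酒的图片)
  - Please generate an audio reading of the poem "Spring View" (请生成春望这首诗的语音)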
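The question types listed above (general, relation, attribute, speech and image generation) imply a routing step before any graph lookup. The sketch below shows one way such routing could work with keyword rules on English text. The rule set, the labels, and `classify_question` are hypothetical; the project may route questions quite differently, and its real inputs are Chinese.

```python
import re

# Hypothetical keyword rules, checked in order; not the project's actual logic.
RULES = [
    ("image_generation", re.compile(r"generate.*(image|picture)", re.I)),
    ("speech_generation", re.compile(r"generate.*(audio|speech|reading)", re.I)),
    ("relation_qa", re.compile(r"relationship between", re.I)),
    ("attribute_qa", re.compile(r"\b(which year|when|where)\b", re.I)),
]


def classify_question(question: str) -> str:
    """Return the label of the first matching rule, or 'general_chat'."""
    for label, pattern in RULES:
        if pattern.search(question):
            return label
    return "general_chat"


print(classify_question("What is the relationship between Li Bai and Du Fu?"))
print(classify_question("In which year was Li Bai born?"))
```

Relation and attribute questions would then be answered from the graph, generation requests would go to the speech or image model, and everything else would fall through to ordinary streaming chat.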
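- Built a knowledge-graph-based question-answering system that uses the knowledge in the graph to answer user questions.
- Supports visual exploration of the graph to better understand its structure and content.
- Provides code implementations of large models and retrieval-augmented generation (RAG).
- Building and maintaining the knowledge graph
- Implementing and optimizing the question-answering system
- Visual exploration of the graph
- Code implementations of large models and RAG
- Further improve the answer quality and efficiency of the question-answering system
- Explore other types of question-answering tasks, such as commonsense questions and knowledge reasoning
- Continuously update and maintain the knowledge graph to keep it accurate, complete, and valid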
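Through this project we not only implemented a knowledge-graph-based question-answering system but also gained substantial practical experience in applying knowledge graph technology. In future work we will keep improving the system's answer quality and efficiency and explore other types of question-answering tasks to meet the needs of more users. We will also continue to update and maintain the knowledge graph to keep it accurate, complete, and valid, and so contribute to the development and application of knowledge graph technology. The project's technical architecture diagram follows: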
Click here to jump to the code structure 🏙️
🔑 This project uses the Zhipu AI open platform. Go to that platform and apply for an API key, then put the API key in the .env file.
- Use conda to manage the Python environment, so install conda first (Install Conda) 😸
- Create the Python environment with conda:
# Create a new environment with a specific Python version
# (users in mainland China may need to configure conda and pip mirrors)
conda create --name myenv python=3.10
# This creates a new environment named myenv with the specified Python version.
# Once the environment is created, activate it
conda activate myenv
- Install the dependencies:
pip install -r requirements.txt
You can start a Neo4j container like this:
docker run \
    --publish=7474:7474 --publish=7687:7687 \
    --volume=$HOME/neo4j/data:/data \
    neo4j:5.12.0
which allows you to access Neo4j through your browser at http://localhost:7474.
This binds two ports (7474 and 7687) for HTTP and Bolt access to the Neo4j API. A volume is bound to /data to allow the database to be persisted outside the container.
By default, this requires you to log in with neo4j/neo4j and change the password. You can, for development purposes, disable authentication by passing --env=NEO4J_AUTH=none to docker run.
- You can also skip Docker and install Neo4j directly on your operating system, then start the service.
The Cypher statements are as follows:
// Create the `李白` (Li Bai) node
CREATE (p:`人物`:`唐`{name: '李白', PersonId:32540});
// Create the `高力士` (Gao Lishi) node
CREATE (p:`人物`:`唐`{name: '高力士', PersonId:32541});
// Create the relationship between Li Bai and Gao Lishi
MATCH (a:`人物`:`唐` {PersonId: 32540}), (b:`人物`:`唐` {PersonId: 32541})
CREATE (a)-[r:`李白得罪高力士` {since: 2022, strength: 'strong', Notes: '《李太白全集》卷三五《李太白年譜》:天寶三載,甲申。(五月改"年"爲"載"。四十四歲)太白在翰林,代草王言。然性嗜酒,多沉飮,有時召令撰述,方在醉中,不可待,左右以水沃面,稍解,卽令秉筆,頃之而成。帝甚才之,數侍宴飮。因沉醉引足令高力士脫靴,力士恥之,因摘其詩句以激太眞妃。帝三欲官白,妃輒沮之。又爲張垍讒譖,公自知不爲親近所容,懇求還山,帝乃賜金放歸。又引《松窗錄》:會高力士終以脫靴爲深恥,異日,太眞妃重吟前詞,力士戲曰:"比以妃子怨李白深入骨髓,何反拳拳如是?"太眞妃驚曰:"何翰林學士能辱人如斯!"力士曰:"以飛燕指妃子,是賤之甚矣!"太眞妃深然之。上嘗三欲命李白官,卒爲宮中所捍而止。'}]->(b)
RETURN r
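Once nodes and a relationship like these exist, a relation question can be answered with a lookup query. The sketch below is an assumption about how that might be done from Python, not code from the project: `build_relation_query` and `fetch_relations` are hypothetical helpers, while the `人物` label and the `PersonId` and `Notes` properties come from the statements above. It uses the official `neo4j` Python driver, imported lazily so that the query builder works without it.

```python
def build_relation_query(person_a_id: int, person_b_id: int):
    """Return a parameterized Cypher query and its parameters."""
    query = (
        "MATCH (a:`人物` {PersonId: $a_id})-[r]-(b:`人物` {PersonId: $b_id}) "
        "RETURN type(r) AS relation, r.Notes AS notes"
    )
    return query, {"a_id": person_a_id, "b_id": person_b_id}


def fetch_relations(uri, user, password, person_a_id, person_b_id, database="neo4j"):
    """Run the query against a live Neo4j instance and return plain dicts."""
    # Imported lazily so build_relation_query works without the driver installed.
    from neo4j import GraphDatabase

    query, params = build_relation_query(person_a_id, person_b_id)
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session(database=database) as session:
            return [record.data() for record in session.run(query, params)]


query, params = build_relation_query(32540, 32541)
print(query)
print(params)
```

Passing the IDs as parameters rather than formatting them into the query string avoids quoting problems and lets Neo4j reuse the query plan.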
After the data above has been imported, import the metadata node (this node records basic information such as the data version number):
CREATE (meta_node:Meta{
id: 'meta-001',
title: 'libai-graph meta node',
text: 'store some meta info',
timestamp: datetime(),
version: 1,
status: 'active'
})
neo4j:
  url: bolt://localhost:7687
  database: neo4j
  username: neo4j
  password: *****
# Note: adjust the parameters above to match your actual database connection
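There are three configuration files. Choose the one that fits your needs; if the corresponding file does not exist, copy ./config/config-local.yaml and edit the copy:
Deployment environment: ./config/config-deploy.yaml
Test environment: ./config/config-dev.yaml
Local development: ./config/config-local.yaml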
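**Create** a .env file in the project root to hold the environment variables, and specify in it which environment configuration to enable. A complete example .env follows: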
#PY_ENVIRONMENT=dev
PY_ENVIRONMENT=local # enable the local development environment
#PY_ENVIRONMENT=deploy
PY_DEBUG=true
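# ---------Note-----------------------------------
# Only one of the models below can be used at a time; do not configure several at once
# Apply for an api-key on the provider's official site and replace YOUR API-KEY
# A model running locally through ollama also works; set the api-key to ollama
# ⚠️ Text-to-image currently uses zhipuai, so the zhipuai api-key must be configured
# -----------------------------------------------
# Zhipu AI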
LLM_BASE_URL=https://open.bigmodel.cn/api/paas/v4/
LLM_API_KEY=YOUR API-KEY
MODEL_NAME=glm-4
# kimi
#LLM_BASE_URL=https://api.moonshot.cn/v1
#LLM_API_KEY=YOUR API-KEY
#MODEL_NAME=moonshot-v1-8k
# Baichuan
#LLM_BASE_URL=https://api.baichuan-ai.com/v1/
#LLM_API_KEY=YOUR API-KEY
#MODEL_NAME=Baichuan4
# Tongyi Qianwen (Qwen)
#LLM_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
#LLM_API_KEY=YOUR API-KEY
#MODEL_NAME=qwen-long
# 01.AI (Lingyi Wanwu)
#LLM_BASE_URL=https://api.lingyiwanwu.com/v1
#LLM_API_KEY=YOUR API-KEY
#MODEL_NAME=yi-large
# deepseek
# LLM_BASE_URL=https://api.deepseek.com
# LLM_API_KEY=ollama
# MODEL_NAME=deepseek-chat
# Doubao
#LLM_BASE_URL=https://ark.cn-beijing.volces.com/api/v3/
#LLM_API_KEY=YOUR API-KEY
# Note: for the Doubao API, set model_name to the ENDPOINT_ID; the application steps are described on the official Doubao API site.
#MODEL_NAME=
# ollama
#LLM_BASE_URL=http://localhost:11434/v1/
#LLM_API_KEY=ollama
#MODEL_NAME=qwen2:0.5b
# Text-to-image model; currently uses zhipuai
#OPENAI_API_KEY=YOUR API-KEY
ZHIPUAI_API_KEY=YOUR API-KEY
# Enter your organization name here
ORGANIZATION_NAME= xxx team
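The relationship between PY_ENVIRONMENT and the three configuration files can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's loader: `parse_env_text` and `resolve_config_path` are hypothetical helpers, and the inline-comment handling is a simplification that may differ from whatever .env library the project actually uses.

```python
VALID_ENVIRONMENTS = ("local", "dev", "deploy")


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comment lines."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "local # enable ...".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def resolve_config_path(environment: str, config_dir: str = "./config") -> str:
    """Map an environment name to its YAML configuration file."""
    if environment not in VALID_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    return f"{config_dir}/config-{environment}.yaml"


sample = """
#PY_ENVIRONMENT=dev
PY_ENVIRONMENT=local # enable the local development environment
PY_DEBUG=true
LLM_API_KEY=YOUR API-KEY
"""
env = parse_env_text(sample)
print(resolve_config_path(env["PY_ENVIRONMENT"]))  # ./config/config-local.yaml
```

Rejecting unknown environment names early makes a typo in .env fail with a clear message instead of a missing-file error later.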
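😃 Because of memory constraints, these two services are deployed independently. They are not open source for now; interested readers can develop them on their own according to the interface rules below. The program still runs without these service interfaces.
Searching classical texts with a classical text; example request: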
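import json

import requests

data = {
    "text": "床前明月光",  # classical Chinese text to search for
    "conf_key": "chinese-classical",  # reserved parameter
    "group": "default",  # reserved parameter
    "size": 5,  # number of results to return
    "searcher": 3,  # reserved parameter
}
resp = requests.post("http://172.16.67.150:18880/api/search/nl", data=json.dumps(data))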
Example response:
{
"retCode": 0,
"errMsg": null,
"values": [
{
"value": "Ming##@##申佳允##@##天际秋云薄|床前明月光|无由一化羽|回立白苍苍##@##秋兴集古 其八##@##苍苍 天际 秋云 明月",
"score": 1.0000004768371582
},
{
"value": "Tang##@##李白##@##床前明月光|疑是地上霜|举头望山月|低头思故乡##@##静夜思##@##山月 霜 明月 低头",
"score": 1.0000004768371582
},
{
"value": "tang##@##李白##@##床前明月光|疑是地上霜|举头望明月|低头思故乡##@##静夜思##@##霜 光 明月 低头",
"score": 1.0000004768371582
},
{
"value": "Ming##@##高启##@##堂上织流黄|堂前看月光|羞见天孙度|低头入洞房##@##子夜四时歌 其三##@##天孙 月光 洞房 低头",
"score": 0.7958479523658752
},
{
"value": "Ming##@##黄渊耀##@##凉风落柳梢|微云淡河面|怀中明月光|多赊不为贱##@##夜坐##@##凉风 柳梢 明月 微云",
"score": 0.7571470737457275
}
]
}
score is the match score and value is one record. The fields within value are separated by ##@##, in the order ["dynasty", "author", "full poem", "title", "keywords"].
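A response entry in this format can be unpacked as sketched below. The `PoemRecord` type and `parse_search_value` are hypothetical names; splitting poem lines on `|` and keywords on spaces is inferred from the sample response, not from a documented specification.

```python
from typing import List, NamedTuple

FIELD_SEPARATOR = "##@##"


class PoemRecord(NamedTuple):
    dynasty: str
    author: str
    lines: List[str]
    title: str
    keywords: List[str]
    score: float


def parse_search_value(item: dict) -> PoemRecord:
    """Split one entry of the `values` array into named fields."""
    dynasty, author, poem, title, keywords = item["value"].split(FIELD_SEPARATOR)
    return PoemRecord(
        dynasty=dynasty,
        author=author,
        lines=poem.split("|"),      # assumed: "|" separates poem lines
        title=title,
        keywords=keywords.split(),  # assumed: spaces separate keywords
        score=item["score"],
    )


sample_item = {
    "value": "Tang##@##李白##@##床前明月光|疑是地上霜|举头望山月|低头思故乡##@##静夜思##@##山月 霜 明月 低头",
    "score": 1.0000004768371582,
}
record = parse_search_value(sample_item)
print(record.author, record.title, len(record.lines))
```

The sample response spells the dynasty both as `Tang` and `tang`, so callers that group by dynasty may want to normalize the case first.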
- [x] Start in the background 🍼
The startup shell script is restart.sh:
chmod +x ./restart.sh
./restart.sh
After a successful start you can visit:
- webui http://localhost:7860
- api doc http://localhost:18881/redoc
- [x] Start all tasks, including the API and the web UI, with Python 📻
python app.py
After a successful start you can visit 🔍:
- webui http://localhost:7860
- api doc http://localhost:18881/redoc
- [x] Start only the web UI with Python 🤹‍♂️
python webui.py
After a successful start you can visit 📦:
- webui http://localhost:7860
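Open the API docs at http://localhost:18881/docs (shown in the figure below), then click build model:
Next, fill in the following parameters and click execute:
1. After starting the program as described above (keep it running), run graph_demo_ui.py from the project root: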
python graph_demo_ui.py
Similar Open Source Tools
MINI_LLM
This project is a personal implementation and reproduction of a small-parameter Chinese LLM. It mainly refers to these two open source projects: https://github.com/charent/Phi2-mini-Chinese and https://github.com/DLLXW/baby-llama2-chinese. It includes the complete process of pre-training, SFT instruction fine-tuning, DPO, and PPO (to be done). I hope to share it with everyone and hope that everyone can work together to improve it!
ChatPilot
ChatPilot is a chat agent tool that enables AgentChat conversations, supports Google search, URL conversation (RAG), and code interpreter functionality, replicates Kimi Chat (file, drag and drop; URL, send out), and supports OpenAI/Azure API. It is based on LangChain and implements ReAct and OpenAI Function Call for agent Q&A dialogue. The tool supports various automatic tools such as online search using Google Search API, URL parsing tool, Python code interpreter, and enhanced RAG file Q&A with query rewriting support. It also allows front-end and back-end service separation using Svelte and FastAPI, respectively. Additionally, it supports voice input/output, image generation, user management, permission control, and chat record import/export.
EduChat
EduChat is a large-scale language model-based chatbot system designed for intelligent education by the EduNLP team at East China Normal University. The project focuses on developing a dialogue-based language model for the education vertical domain, integrating diverse education vertical domain data, and providing functions such as automatic question generation, homework correction, emotional support, course guidance, and college entrance examination consultation. The tool aims to serve teachers, students, and parents to achieve personalized, fair, and warm intelligent education.
bce-qianfan-sdk
The Qianfan SDK provides best practices for large model toolchains, allowing AI workflows and AI-native applications to access the Qianfan large model platform elegantly and conveniently. The core capabilities of the SDK include three parts: large model inference, large model training, and general and extension:
* `Large model inference`: Implements interface encapsulation for inference with the Yiyan (ERNIE-Bot) series, open source large models, etc., supporting dialogue, completion, Embedding, etc.
* `Large model training`: Based on platform capabilities, it supports the end-to-end large model training process, including training data, fine-tuning/pre-training, and model services.
* `General and extension`: General capabilities include common AI development tools such as Prompt/Debug/Client. The extension capability is based on the characteristics of Qianfan to adapt to common middleware frameworks.
metaso-free-api
Metaso AI Free service supports high-speed streaming output, secret tower AI super network search (full network or academic as well as concise, in-depth, research three modes), zero-configuration deployment, multi-token support. Fully compatible with ChatGPT interface. It also has seven other free APIs available for use. The tool provides various deployment options such as Docker, Docker-compose, Render, Vercel, and native deployment. Users can access the tool for chat completions and token live checks. Note: Reverse API is unstable, it is recommended to use the official Metaso AI website to avoid the risk of banning. This project is for research and learning purposes only, not for commercial use.
CareGPT
CareGPT is a medical large language model (LLM) that explores medical data, training, and deployment related research work. It integrates resources, open-source models, rich data, and efficient deployment methods. It supports various medical tasks, including patient diagnosis, medical dialogue, and medical knowledge integration. The model has been fine-tuned on diverse medical datasets to enhance its performance in the healthcare domain.
ERNIE-SDK
ERNIE SDK repository contains two projects: ERNIE Bot Agent and ERNIE Bot. ERNIE Bot Agent is a large model intelligent agent development framework based on the Wenxin large model orchestration capability introduced by Baidu PaddlePaddle, combined with the rich preset platform functions of the PaddlePaddle Star River community. ERNIE Bot provides developers with convenient interfaces to easily call the Wenxin large model for text creation, general conversation, semantic vectors, and AI drawing basic functions.
Streamer-Sales
Streamer-Sales is a large model for live streamers that can explain products based on their characteristics and inspire users to make purchases. It is designed to enhance sales efficiency and user experience, whether for online live sales or offline store promotions. The model can deeply understand product features and create tailored explanations in vivid and precise language, sparking user's desire to purchase. It aims to revolutionize the shopping experience by providing detailed and unique product descriptions to engage users effectively.
deepseek-free-api
DeepSeek Free API is a high-speed streaming output tool that supports multi-turn conversations and zero-configuration deployment. It is compatible with the ChatGPT interface and offers multiple token support. The tool provides eight free APIs for various AI interfaces. Users can access the tool online, prepare for integration, deploy using Docker, Docker-compose, Render, Vercel, or native deployment methods. It also offers client recommendations for faster integration and supports dialogue completion and userToken live checks. The tool comes with important considerations for Nginx reverse proxy optimization and token statistics.
LangChain-SearXNG
LangChain-SearXNG is an open-source AI search engine built on LangChain and SearXNG. It supports faster and more accurate search and question-answering functionalities. Users can deploy SearXNG and set up Python environment to run LangChain-SearXNG. The tool integrates AI models like OpenAI and ZhipuAI for search queries. It offers two search modes: Searxng and ZhipuWebSearch, allowing users to control the search workflow based on input parameters. LangChain-SearXNG v2 version enhances response speed and content quality compared to the previous version, providing a detailed configuration guide and showcasing the effectiveness of different search modes through comparisons.
Senparc.AI
Senparc.AI is an AI extension package for the Senparc ecosystem, focusing on LLM (Large Language Models) interaction. It provides modules for standard interfaces and basic functionalities, as well as interfaces using SemanticKernel for plug-and-play capabilities. The package also includes a library for supporting the 'PromptRange' ecosystem, compatible with various systems and frameworks. Users can configure different AI platforms and models, define AI interface parameters, and run AI functions easily. The package offers examples and commands for dialogue, embedding, and DallE drawing operations.
GitHubSentinel
GitHub Sentinel is an intelligent information retrieval and high-value content mining AI Agent designed for the era of large models (LLMs). It is aimed at users who need frequent and large-scale information retrieval, especially open source enthusiasts, individual developers, and investors. The main features include subscription management, update retrieval, notification system, report generation, multi-model support, scheduled tasks, graphical interface, containerization, continuous integration, and the ability to track and analyze the latest dynamics of GitHub open source projects and expand to other information channels like Hacker News for comprehensive information mining and analysis capabilities.
aituber-kit
AITuber-Kit is a tool that enables users to interact with AI characters, conduct AITuber live streams, and engage in external integration modes. Users can easily converse with AI characters using various LLM APIs, stream on YouTube with AI character reactions, and send messages to server apps via WebSocket. The tool provides settings for API keys, character configurations, voice synthesis engines, and more. It supports multiple languages and allows customization of VRM models and background images. AITuber-Kit follows the MIT license and offers guidelines for adding new languages to the project.
NarratoAI
NarratoAI is an automated video narration tool that provides an all-in-one solution for script writing, automated video editing, voice-over, and subtitle generation. It is powered by LLM to enhance efficient content creation. The tool aims to simplify the process of creating film commentary and editing videos by automating various tasks such as script writing and voice-over generation. NarratoAI offers a user-friendly interface for users to easily generate video scripts, edit videos, and customize video parameters. With future plans to optimize story generation processes and support additional large models, NarratoAI is a versatile tool for content creators looking to streamline their video production workflow.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM:
* Set LLM usage limits for users on different pricing tiers
* Track LLM usage on a per user and per organization basis
* Block or redact requests containing PIIs
* Improve LLM reliability with failovers, retries and caching
* Distribute API keys with rate limits and cost limits for internal development/production use cases
* Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.