
jvm-openai
A minimalistic OpenAI API client for the JVM, written in Java 🤖
Stars: 70

jvm-openai is a minimalistic unofficial OpenAI API client for the JVM, written in Java. It serves as a Java client for OpenAI API with a focus on simplicity and minimal dependencies. The tool provides support for various OpenAI APIs and endpoints, including Audio, Chat, Embeddings, Fine-tuning, Batch, Files, Uploads, Images, Models, Moderations, Assistants, Threads, Messages, Runs, Run Steps, Vector Stores, Vector Store Files, Vector Store File Batches, Invites, Users, Projects, Project Users, Project Service Accounts, Project API Keys, and Audit Logs. Users can easily integrate this tool into their Java projects to interact with OpenAI services efficiently.
README:
A minimalistic unofficial OpenAI API client for the JVM, written in Java. The only dependency used is Jackson for JSON parsing.
jvm-openai works on Java 17+. Android support is not yet available.
implementation("io.github.stefanbratanov:jvm-openai:${version}")
<dependency>
<groupId>io.github.stefanbratanov</groupId>
<artifactId>jvm-openai</artifactId>
<version>${version}</version>
</dependency>
OpenAI openAI = OpenAI.newBuilder(System.getenv("OPENAI_API_KEY")).build();
ChatClient chatClient = openAI.chatClient();
CreateChatCompletionRequest createChatCompletionRequest = CreateChatCompletionRequest.newBuilder()
.model(OpenAIModel.GPT_3_5_TURBO)
.message(ChatMessage.userMessage("Who won the world series in 2020?"))
.build();
ChatCompletion chatCompletion = chatClient.createChatCompletion(createChatCompletionRequest);
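The reply text can then be pulled out of the response; the accessor chain below is a minimal sketch that assumes the ChatCompletion record mirrors the OpenAI chat-completions JSON (choices → message → content):
// Assumption: choices()/message()/content() follow the OpenAI JSON field names - check the library's records if this does not compile
String reply = chatCompletion.choices().get(0).message().content();
System.out.println(reply);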
API | Status |
---|---|
Audio | ✔️ |
Chat | ✔️ |
Embeddings | ✔️ |
Fine-tuning | ✔️ |
Batch | ✔️ |
Files | ✔️ |
Uploads | ✔️ |
Images | ✔️ |
Models | ✔️ |
Moderations | ✔️ |
API | Status |
---|---|
Assistants | ✔️ |
Threads | ✔️ |
Messages | ✔️ |
Runs | ✔️ |
Run Steps | ✔️ |
Vector Stores | ✔️ |
Vector Store Files | ✔️ |
Vector Store File Batches | ✔️ |
API | Status |
---|---|
Admin API Keys | |
Invites | ✔️ |
Users | ✔️ |
Projects | ✔️ |
Project Users | ✔️ |
Project Service Accounts | ✔️ |
Project API Keys | ✔️ |
Project rate limits | |
Audit Logs | ✔️ |
Usage | |
NOTE: Realtime and Legacy APIs are not supported
- Configure an organization and project
OpenAI openAI = OpenAI.newBuilder(System.getenv("OPENAI_API_KEY"))
.organization("org-zweLLamVlP6c5n66zY334ivs")
.project(System.getenv("PROJECT_ID"))
.build();
- Configure a custom base url
OpenAI openAI = OpenAI.newBuilder(System.getenv("OPENAI_API_KEY"))
.baseUrl("https://api.foobar.com/v1/")
.build();
- Configure a custom Java HttpClient
HttpClient httpClient = HttpClient.newBuilder()
.connectTimeout(Duration.ofSeconds(20))
.executor(Executors.newFixedThreadPool(3))
.proxy(ProxySelector.of(new InetSocketAddress("proxy.example.com", 80)))
.build();
OpenAI openAI = OpenAI.newBuilder(System.getenv("OPENAI_API_KEY"))
.httpClient(httpClient)
.build();
- Configure a timeout for all requests
OpenAI openAI = OpenAI.newBuilder(System.getenv("OPENAI_API_KEY"))
.requestTimeout(Duration.ofSeconds(10))
.build();
- Create chat completion async
ChatClient chatClient = openAI.chatClient();
CreateChatCompletionRequest request = CreateChatCompletionRequest.newBuilder()
.model(OpenAIModel.GPT_3_5_TURBO)
.message(ChatMessage.userMessage("Who won the world series in 2020?"))
.build();
CompletableFuture<ChatCompletion> chatCompletion = chatClient.createChatCompletionAsync(request);
chatCompletion.thenAccept(System.out::println);
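Since createChatCompletionAsync returns a plain CompletableFuture, several requests can be fired concurrently and joined with standard java.util.concurrent utilities; a minimal sketch (the prompts are illustrative):
List<String> prompts = List.of("Who won the world series in 2020?", "And who won in 2021?");
List<CompletableFuture<ChatCompletion>> futures = prompts.stream()
    .map(prompt -> CreateChatCompletionRequest.newBuilder()
        .model(OpenAIModel.GPT_3_5_TURBO)
        .message(ChatMessage.userMessage(prompt))
        .build())
    .map(chatClient::createChatCompletionAsync)
    .toList();
// Wait for all requests to finish, then print each result
CompletableFuture.allOf(futures.toArray(CompletableFuture[]::new)).join();
futures.forEach(future -> System.out.println(future.join()));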
- Streaming
ChatClient chatClient = openAI.chatClient();
CreateChatCompletionRequest request = CreateChatCompletionRequest.newBuilder()
.message(ChatMessage.userMessage("Who won the world series in 2020?"))
.stream(true)
.build();
// with java.util.stream.Stream
chatClient.streamChatCompletion(request).forEach(System.out::println);
// with subscriber
chatClient.streamChatCompletion(request, new ChatCompletionStreamSubscriber() {
@Override
public void onChunk(ChatCompletionChunk chunk) {
System.out.println(chunk);
}
@Override
public void onException(Throwable ex) {
// ...
}
@Override
public void onComplete() {
// ...
}
});
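To reassemble the streamed chunks into one reply, the partial text can be appended as it arrives; the delta accessors below are assumptions based on the OpenAI streaming JSON (choices → delta → content) and may need adjusting to the library's actual record names:
StringBuilder fullReply = new StringBuilder();
chatClient.streamChatCompletion(request).forEach(chunk -> {
  // Assumption: ChatCompletionChunk mirrors the OpenAI streaming payload;
  // the delta content may be null on the final chunk, hence the null check
  String content = chunk.choices().get(0).delta().content();
  if (content != null) {
    fullReply.append(content);
  }
});
System.out.println(fullReply);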
- Create image
ImagesClient imagesClient = openAI.imagesClient();
CreateImageRequest createImageRequest = CreateImageRequest.newBuilder()
.model(OpenAIModel.DALL_E_3)
.prompt("A cute baby sea otter")
.build();
Images images = imagesClient.createImage(createImageRequest);
- Create speech
AudioClient audioClient = openAI.audioClient();
SpeechRequest request = SpeechRequest.newBuilder()
.model(OpenAIModel.TTS_1)
.input("The quick brown fox jumped over the lazy dog.")
.voice(Voice.ALLOY)
.build();
Path output = Paths.get("/tmp/speech.mp3");
audioClient.createSpeech(request, output);
- Create translation
AudioClient audioClient = openAI.audioClient();
TranslationRequest request = TranslationRequest.newBuilder()
.model(OpenAIModel.WHISPER_1)
.file(Paths.get("/tmp/german.m4a"))
.build();
String translatedText = audioClient.createTranslation(request);
- List models
ModelsClient modelsClient = openAI.modelsClient();
List<Model> models = modelsClient.listModels();
- Classify if text violates OpenAI's Content Policy
ModerationsClient moderationsClient = openAI.moderationsClient();
ModerationRequest request = ModerationRequest.newBuilder()
.input("I want to bake a cake.")
.build();
Moderation moderation = moderationsClient.createModeration(request);
boolean violence = moderation.results().get(0).categories().violence();
- Create and execute a batch
// Upload JSONL file containing requests for the batch
FilesClient filesClient = openAI.filesClient();
UploadFileRequest uploadInputFileRequest = UploadFileRequest.newBuilder()
.file(Paths.get("/tmp/batch-requests.jsonl"))
.purpose(Purpose.BATCH)
.build();
File inputFile = filesClient.uploadFile(uploadInputFileRequest);
BatchClient batchClient = openAI.batchClient();
CreateBatchRequest request = CreateBatchRequest.newBuilder()
.inputFileId(inputFile.id())
.endpoint("/v1/chat/completions")
.completionWindow("24h")
.build();
Batch batch = batchClient.createBatch(request);
// check status of the batch
Batch retrievedBatch = batchClient.retrieveBatch(batch.id());
System.out.println(retrievedBatch.status());
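Batches complete asynchronously, so in practice the status is polled until it reaches a terminal value; a minimal sketch reusing retrieveBatch from above (the terminal status strings are taken from the OpenAI Batch API and are assumptions about the exact values returned by the library):
// Poll until the batch reaches a terminal status
Batch polledBatch = batchClient.retrieveBatch(batch.id());
while (!List.of("completed", "failed", "expired", "cancelled").contains(polledBatch.status())) {
  try {
    // java.lang.Thread is fully qualified to avoid clashing with the library's Thread record used elsewhere in this README
    java.lang.Thread.sleep(10_000);
  } catch (InterruptedException e) {
    java.lang.Thread.currentThread().interrupt();
    break;
  }
  polledBatch = batchClient.retrieveBatch(batch.id());
}
System.out.println("Batch finished with status: " + polledBatch.status());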
- Upload large file in multiple parts
UploadsClient uploadsClient = openAI.uploadsClient();
CreateUploadRequest createUploadRequest = CreateUploadRequest.newBuilder()
.filename("training_examples.jsonl")
.purpose(Purpose.FINE_TUNE)
.bytes(2147483648L)
.mimeType("text/jsonl")
.build();
Upload upload = uploadsClient.createUpload(createUploadRequest);
UploadPart part1 = uploadsClient.addUploadPart(upload.id(), Paths.get("/tmp/part1.jsonl"));
UploadPart part2 = uploadsClient.addUploadPart(upload.id(), Paths.get("/tmp/part2.jsonl"));
CompleteUploadRequest completeUploadRequest = CompleteUploadRequest.newBuilder()
.partIds(List.of(part1.id(), part2.id()))
.build();
Upload completedUpload = uploadsClient.completeUpload(upload.id(), completeUploadRequest);
// the created usable File object
File file = completedUpload.file();
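The part files passed to addUploadPart have to be produced by the caller; a hypothetical helper (not part of jvm-openai) that splits a large JSONL file into fixed-size chunks with plain java.nio - the parts are byte ranges that the Uploads API concatenates back together in the order given to completeUpload:
// Hypothetical helper - requires java.io.*, java.nio.file.* and java.util.* imports.
// Each part may be up to 64 MB according to the OpenAI Uploads API.
static List<Path> splitIntoParts(Path source, long partSizeBytes) throws IOException {
  List<Path> parts = new ArrayList<>();
  byte[] chunk = new byte[(int) partSizeBytes];
  try (InputStream in = Files.newInputStream(source)) {
    int read;
    int index = 0;
    while ((read = in.readNBytes(chunk, 0, chunk.length)) > 0) {
      Path part = source.resolveSibling(source.getFileName() + ".part" + index++);
      Files.write(part, Arrays.copyOf(chunk, read));
      parts.add(part);
    }
  }
  return parts;
}
// e.g. List<Path> partPaths = splitIntoParts(Paths.get("/tmp/training_examples.jsonl"), 64L * 1024 * 1024);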
- Build AI Assistant
AssistantsClient assistantsClient = openAI.assistantsClient();
ThreadsClient threadsClient = openAI.threadsClient();
MessagesClient messagesClient = openAI.messagesClient();
RunsClient runsClient = openAI.runsClient();
// Step 1: Create an Assistant
CreateAssistantRequest createAssistantRequest = CreateAssistantRequest.newBuilder()
.name("Math Tutor")
.model(OpenAIModel.GPT_3_5_TURBO_1106)
.instructions("You are a personal math tutor. Write and run code to answer math questions.")
.tool(Tool.codeInterpreterTool())
.build();
Assistant assistant = assistantsClient.createAssistant(createAssistantRequest);
// Step 2: Create a Thread
CreateThreadRequest createThreadRequest = CreateThreadRequest.newBuilder().build();
Thread thread = threadsClient.createThread(createThreadRequest);
// Step 3: Add a Message to a Thread
CreateMessageRequest createMessageRequest = CreateMessageRequest.newBuilder()
.role(Role.USER)
.content("I need to solve the equation `3x + 11 = 14`. Can you help me?")
.build();
ThreadMessage message = messagesClient.createMessage(thread.id(), createMessageRequest);
// Step 4: Run the Assistant
CreateRunRequest createRunRequest = CreateRunRequest.newBuilder()
.assistantId(assistant.id())
.instructions("Please address the user as Jane Doe. The user has a premium account.")
.build();
ThreadRun run = runsClient.createRun(thread.id(), createRunRequest);
// Step 5: Check the Run status
ThreadRun retrievedRun = runsClient.retrieveRun(thread.id(), run.id());
String status = retrievedRun.status();
// Step 6: Display the Assistant's Response
MessagesClient.PaginatedThreadMessages paginatedMessages = messagesClient.listMessages(thread.id(), PaginationQueryParameters.none(), Optional.empty());
List<ThreadMessage> messages = paginatedMessages.data();
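In practice Step 5 is repeated until the run reaches a terminal state before reading the messages in Step 6; a minimal polling sketch reusing the calls above (the status strings come from the OpenAI Runs API and are assumptions about the exact values the library returns):
// Poll the run until it is no longer queued or in progress
// ("requires_action" means tool outputs must be submitted before the run can continue)
ThreadRun polledRun = runsClient.retrieveRun(thread.id(), run.id());
while (!List.of("completed", "failed", "cancelled", "expired", "requires_action").contains(polledRun.status())) {
  java.lang.Thread.sleep(1_000); // java.lang.Thread, not the Assistants API Thread record; assumes the enclosing method declares throws InterruptedException
  polledRun = runsClient.retrieveRun(thread.id(), run.id());
}
// Then list the messages as in Step 6
messagesClient.listMessages(thread.id(), PaginationQueryParameters.none(), Optional.empty())
    .data()
    .forEach(System.out::println);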
- Build AI Assistant with File Search Enabled
AssistantsClient assistantsClient = openAI.assistantsClient();
ThreadsClient threadsClient = openAI.threadsClient();
MessagesClient messagesClient = openAI.messagesClient();
RunsClient runsClient = openAI.runsClient();
VectorStoresClient vectorStoresClient = openAI.vectorStoresClient();
FilesClient filesClient = openAI.filesClient();
VectorStoreFileBatchesClient vectorStoreFileBatchesClient = openAI.vectorStoreFileBatchesClient();
// Step 1: Create a new Assistant with File Search Enabled
CreateAssistantRequest createAssistantRequest = CreateAssistantRequest.newBuilder()
.name("Financial Analyst Assistant")
.model(OpenAIModel.GPT_4_TURBO)
.instructions("You are an expert financial analyst. Use you knowledge base to answer questions about audited financial statements.")
.tool(Tool.fileSearchTool())
.build();
Assistant assistant = assistantsClient.createAssistant(createAssistantRequest);
// Step 2: Upload files and add them to a Vector Store
CreateVectorStoreRequest createVectorStoreRequest = CreateVectorStoreRequest.newBuilder()
.name("Financial Statements")
.build();
VectorStore vectorStore = vectorStoresClient.createVectorStore(createVectorStoreRequest);
UploadFileRequest uploadFileRequest1 = UploadFileRequest.newBuilder()
.file(Paths.get("edgar/goog-10k.pdf"))
.purpose(Purpose.ASSISTANTS)
.build();
File file1 = filesClient.uploadFile(uploadFileRequest1);
UploadFileRequest uploadFileRequest2 = UploadFileRequest.newBuilder()
.file(Paths.get("edgar/brka-10k.txt"))
.purpose(Purpose.ASSISTANTS)
.build();
File file2 = filesClient.uploadFile(uploadFileRequest2);
CreateVectorStoreFileBatchRequest createVectorStoreFileBatchRequest = CreateVectorStoreFileBatchRequest.newBuilder()
.fileIds(List.of(file1.id(), file2.id()))
.build();
VectorStoreFileBatch batch = vectorStoreFileBatchesClient.createVectorStoreFileBatch(vectorStore.id(), createVectorStoreFileBatchRequest);
// need to query the status of the file batch for completion
vectorStoreFileBatchesClient.retrieveVectorStoreFileBatch(vectorStore.id(), batch.id());
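// A minimal polling sketch for the status check above - assumes VectorStoreFileBatch exposes
// a status() accessor and uses the OpenAI status values ("in_progress" until indexing finishes)
VectorStoreFileBatch polledFileBatch = vectorStoreFileBatchesClient.retrieveVectorStoreFileBatch(vectorStore.id(), batch.id());
while ("in_progress".equals(polledFileBatch.status())) {
  java.lang.Thread.sleep(1_000); // assumes the enclosing method declares throws InterruptedException
  polledFileBatch = vectorStoreFileBatchesClient.retrieveVectorStoreFileBatch(vectorStore.id(), batch.id());
}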
// Step 3: Update the assistant to use the new Vector Store
ModifyAssistantRequest modifyAssistantRequest = ModifyAssistantRequest.newBuilder()
.toolResources(ToolResources.fileSearchToolResources(vectorStore.id()))
.build();
assistantsClient.modifyAssistant(assistant.id(), modifyAssistantRequest);
// Step 4: Create a thread
CreateThreadRequest.Message message = CreateThreadRequest.Message.newBuilder()
.role(Role.USER)
.content("How many shares of AAPL were outstanding at the end of of October 2023?")
.build();
CreateThreadRequest createThreadRequest = CreateThreadRequest.newBuilder()
.message(message)
.build();
Thread thread = threadsClient.createThread(createThreadRequest);
// Step 5: Create a run and check the output
CreateRunRequest createRunRequest = CreateRunRequest.newBuilder()
.assistantId(assistant.id())
.instructions("Please address the user as Jane Doe. The user has a premium account.")
.build();
ThreadRun run = runsClient.createRun(thread.id(), createRunRequest);
// check the run status
ThreadRun retrievedRun = runsClient.retrieveRun(thread.id(), run.id());
String status = retrievedRun.status();
// display the Assistant's Response
MessagesClient.PaginatedThreadMessages paginatedMessages = messagesClient.listMessages(thread.id(), PaginationQueryParameters.none(), Optional.empty());
List<ThreadMessage> messages = paginatedMessages.data();
- Create a run and stream the result of executing the run (Assistants Streaming)
RunsClient runsClient = openAI.runsClient();
CreateRunRequest createRunRequest = CreateRunRequest.newBuilder()
.assistantId(assistant.id())
.instructions("Please address the user as Jane Doe. The user has a premium account.")
.stream(true)
.build();
// with java.util.stream.Stream
runsClient.createRunAndStream(thread.id(), createRunRequest).forEach(assistantStreamEvent -> {
System.out.println(assistantStreamEvent.event());
System.out.println(assistantStreamEvent.data());
});
// with subscriber
runsClient.createRunAndStream(thread.id(), createRunRequest, new AssistantStreamEventSubscriber() {
@Override
public void onThread(String event, Thread thread) {
// ...
}
@Override
public void onThreadRun(String event, ThreadRun threadRun) {
// ...
}
@Override
public void onThreadRunStep(String event, ThreadRunStep threadRunStep) {
// ...
}
@Override
public void onThreadRunStepDelta(String event, ThreadRunStepDelta threadRunStepDelta) {
// ...
}
@Override
public void onThreadMessage(String event, ThreadMessage threadMessage) {
// ...
}
@Override
public void onThreadMessageDelta(String event, ThreadMessageDelta threadMessageDelta) {
// ...
}
@Override
public void onUnknownEvent(String event, String data) {
// ...
}
@Override
public void onException(Throwable ex) {
// ...
}
@Override
public void onComplete() {
// ...
}
});
// "createThreadAndRunAndStream" and "submitToolOutputsAndStream" methods are also available
- List all the users in an organization.
OpenAI openAI = OpenAI.newBuilder()
.adminKey(System.getenv("OPENAI_ADMIN_KEY"))
.build();
UsersClient usersClient = openAI.usersClient();
List<User> users = usersClient.listUsers(Optional.empty(), Optional.empty()).data();
- List user actions and configuration changes within an organization
OpenAI openAI = OpenAI.newBuilder()
.adminKey(System.getenv("OPENAI_ADMIN_KEY"))
.build();
AuditLogsClient auditLogsClient = openAI.auditLogsClient();
ListAuditLogsQueryParameters queryParameters = ListAuditLogsQueryParameters.newBuilder()
.eventTypes(List.of("invite.sent", "invite.deleted"))
.build();
List<AuditLog> auditLogs = auditLogsClient.listAuditLogs(queryParameters).data();
Alternative AI tools for jvm-openai
Similar Open Source Tools


eleeye
ElephantEye is a free Chinese Chess program that follows the GNU Lesser General Public Licence. It is designed for chess enthusiasts and programmers to use freely. The program works as a XiangQi engine for XQWizard with strong AI capabilities. ElephantEye supports UCCI 3.0 protocol and offers various parameter settings for users to customize their experience. The program uses brute-force chess algorithms and static position evaluation techniques to search for optimal moves. ElephantEye has participated in computer chess competitions and has been tested on various online chess platforms. The source code of ElephantEye is available on SourceForge for developers to explore and improve.

Awesome-LM-SSP
The Awesome-LM-SSP repository is a collection of resources related to the trustworthiness of large models (LMs) across multiple dimensions, with a special focus on multi-modal LMs. It includes papers, surveys, toolkits, competitions, and leaderboards. The resources are categorized into three main dimensions: safety, security, and privacy. Within each dimension, there are several subcategories. For example, the safety dimension includes subcategories such as jailbreak, alignment, deepfake, ethics, fairness, hallucination, prompt injection, and toxicity. The security dimension includes subcategories such as adversarial examples, poisoning, and system security. The privacy dimension includes subcategories such as contamination, copyright, data reconstruction, membership inference attacks, model extraction, privacy-preserving computation, and unlearning.

Awesome-Graph-LLM
Awesome-Graph-LLM is a curated collection of research papers exploring the intersection of graph-based techniques with Large Language Models (LLMs). The repository aims to bridge the gap between LLMs and graph structures prevalent in real-world applications by providing a comprehensive list of papers covering various aspects of graph reasoning, node classification, graph classification/regression, knowledge graphs, multimodal models, applications, and tools. It serves as a valuable resource for researchers and practitioners interested in leveraging LLMs for graph-related tasks.

imodelsX
imodelsX is a Scikit-learn friendly library that provides tools for explaining, predicting, and steering text models/data. It also includes a collection of utilities for getting started with text data.
**Explainable modeling/steering**
| Model | Reference | Output | Description |
|---|---|---|---|
| Tree-Prompt | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/tree_prompt) | Explanation + Steering | Generates a tree of prompts to steer an LLM (_Official_) |
| iPrompt | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/iprompt) | Explanation + Steering | Generates a prompt that explains patterns in data (_Official_) |
| AutoPrompt | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/autoprompt) | Explanation + Steering | Find a natural-language prompt using input-gradients (⌛ In progress) |
| D3 | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/d3) | Explanation | Explain the difference between two distributions |
| SASC | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/sasc) | Explanation | Explain a black-box text module using an LLM (_Official_) |
| Aug-Linear | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/aug_linear) | Linear model | Fit a better linear model using an LLM to extract embeddings (_Official_) |
| Aug-Tree | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/aug_tree) | Decision tree | Fit a better decision tree using an LLM to expand features (_Official_) |
**General utilities**
| Model | Reference | Description |
|---|---|---|
| LLM wrapper | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/llm) | Easily call different LLMs |
| Dataset wrapper | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/data) | Download minimally processed huggingface datasets |
| Bag of Ngrams | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/bag_of_ngrams) | Learn a linear model of ngrams |
| Linear Finetune | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/linear_finetune) | Finetune a single linear layer on top of LLM embeddings |
**Related work**
* [imodels package](https://github.com/microsoft/interpretml/tree/main/imodels) (JOSS 2021) - interpretable ML package for concise, transparent, and accurate predictive modeling (sklearn-compatible)
* [Adaptive wavelet distillation](https://arxiv.org/abs/2111.06185) (NeurIPS 2021) - distilling a neural network into a concise wavelet model
* [Transformation importance](https://arxiv.org/abs/1912.04938) (ICLR 2020 workshop) - using simple reparameterizations, allows for calculating disentangled importances to transformations of the input (e.g. assigning importances to different frequencies)
* [Hierarchical interpretations](https://arxiv.org/abs/1807.03343) (ICLR 2019) - extends CD to CNNs / arbitrary DNNs, and aggregates explanations into a hierarchy
* [Interpretation regularization](https://arxiv.org/abs/2006.14340) (ICML 2020) - penalizes CD / ACD scores during training to make models generalize better
* [PDR interpretability framework](https://www.pnas.org/doi/10.1073/pnas.1814225116) (PNAS 2019) - an overarching framework for guiding and framing interpretable machine learning

llm4s
llm4s is an experimental Scala 3 bindings tool for llama.cpp using Slinc. It provides version compatibility with Scala 3.3.0 and JDK 17, 19 for llama.cpp. Users can utilize llm4s to work with llama.cpp shared library and model, enabling completion and embeddings functionalities in Scala.

EAGLE
Eagle is a family of Vision-Centric High-Resolution Multimodal LLMs that enhance multimodal LLM perception using a mix of vision encoders and various input resolutions. The model features a channel-concatenation-based fusion for vision experts with different architectures and knowledge, supporting up to over 1K input resolution. It excels in resolution-sensitive tasks like optical character recognition and document understanding.

LLMs-Zero-to-Hero
LLMs-Zero-to-Hero is a repository dedicated to training large language models (LLMs) from scratch, covering topics such as dense models, MOE models, pre-training, supervised fine-tuning, direct preference optimization, reinforcement learning from human feedback, and deploying large models. The repository provides detailed learning notes for different chapters, code implementations, and resources for training and deploying LLMs. It aims to guide users from being beginners to proficient in building and deploying large language models.

Awesome-AIGC-3D
Awesome-AIGC-3D is a curated list of awesome AIGC 3D papers, inspired by awesome-NeRF. It aims to provide a comprehensive overview of the state of the art in AIGC 3D, including papers on text-to-3D generation, 3D scene generation, human avatar generation, and dynamic 3D generation. The repository also includes a list of benchmarks and datasets, talks, companies, and implementations related to AIGC 3D.

Phi-3CookBook
Phi-3CookBook is a manual on how to use the Microsoft Phi-3 family, which consists of open AI models developed by Microsoft. The Phi-3 models are highly capable and cost-effective small language models, outperforming models of similar and larger sizes across various language, reasoning, coding, and math benchmarks. The repository provides detailed information on different Phi-3 models, their performance, availability, and usage scenarios across different platforms like Azure AI Studio, Hugging Face, and Ollama. It also covers topics such as fine-tuning, evaluation, and end-to-end samples for Phi-3-mini and Phi-3-vision models, along with labs, workshops, and contributing guidelines.

Wechat-AI-Assistant
Wechat AI Assistant is a project that enables multi-modal interaction with ChatGPT AI assistant within WeChat. It allows users to engage in conversations, role-playing, respond to voice messages, analyze images and videos, summarize articles and web links, and search the internet. The project utilizes the WeChatFerry library to control the Windows PC desktop WeChat client and leverages the OpenAI Assistant API for intelligent multi-modal message processing. Users can interact with ChatGPT AI in WeChat through text or voice, access various tools like bing_search, browse_link, image_to_text, text_to_image, text_to_speech, video_analysis, and more. The AI autonomously determines which code interpreter and external tools to use to complete tasks. Future developments include file uploads for AI to reference content, integration with other APIs, and login support for enterprise WeChat and WeChat official accounts.

userscripts
Greasemonkey userscripts. A userscript manager such as Tampermonkey is required to run these scripts.

intel-extension-for-transformers
Intel® Extension for Transformers is an innovative toolkit designed to accelerate GenAI/LLM everywhere with the optimal performance of Transformer-based models on various Intel platforms, including Intel Gaudi2, Intel CPU, and Intel GPU. The toolkit provides the below key features and examples:
* Seamless user experience of model compressions on Transformer-based models by extending [Hugging Face transformers](https://github.com/huggingface/transformers) APIs and leveraging [Intel® Neural Compressor](https://github.com/intel/neural-compressor)
* Advanced software optimizations and unique compression-aware runtime (released with NeurIPS 2022's paper [Fast Distilbert on CPUs](https://arxiv.org/abs/2211.07715) and [QuaLA-MiniLM: a Quantized Length Adaptive MiniLM](https://arxiv.org/abs/2210.17114), and NeurIPS 2021's paper [Prune Once for All: Sparse Pre-Trained Language Models](https://arxiv.org/abs/2111.05754))
* Optimized Transformer-based model packages such as [Stable Diffusion](examples/huggingface/pytorch/text-to-image/deployment/stable_diffusion), [GPT-J-6B](examples/huggingface/pytorch/text-generation/deployment), [GPT-NEOX](examples/huggingface/pytorch/language-modeling/quantization#2-validated-model-list), [BLOOM-176B](examples/huggingface/pytorch/language-modeling/inference#BLOOM-176B), [T5](examples/huggingface/pytorch/summarization/quantization#2-validated-model-list), [Flan-T5](examples/huggingface/pytorch/summarization/quantization#2-validated-model-list), and end-to-end workflows such as [SetFit-based text classification](docs/tutorials/pytorch/text-classification/SetFit_model_compression_AGNews.ipynb) and [document level sentiment analysis (DLSA)](workflows/dlsa)
* [NeuralChat](intel_extension_for_transformers/neural_chat), a customizable chatbot framework to create your own chatbot within minutes by leveraging a rich set of [plugins](https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat/docs/advanced_features.md) such as [Knowledge Retrieval](./intel_extension_for_transformers/neural_chat/pipeline/plugins/retrieval/README.md), [Speech Interaction](./intel_extension_for_transformers/neural_chat/pipeline/plugins/audio/README.md), [Query Caching](./intel_extension_for_transformers/neural_chat/pipeline/plugins/caching/README.md), and [Security Guardrail](./intel_extension_for_transformers/neural_chat/pipeline/plugins/security/README.md). This framework supports Intel Gaudi2/CPU/GPU.
* [Inference](https://github.com/intel/neural-speed/tree/main) of Large Language Model (LLM) in pure C/C++ with weight-only quantization kernels for Intel CPU and Intel GPU (TBD), supporting [GPT-NEOX](https://github.com/intel/neural-speed/tree/main/neural_speed/models/gptneox), [LLAMA](https://github.com/intel/neural-speed/tree/main/neural_speed/models/llama), [MPT](https://github.com/intel/neural-speed/tree/main/neural_speed/models/mpt), [FALCON](https://github.com/intel/neural-speed/tree/main/neural_speed/models/falcon), [BLOOM-7B](https://github.com/intel/neural-speed/tree/main/neural_speed/models/bloom), [OPT](https://github.com/intel/neural-speed/tree/main/neural_speed/models/opt), [ChatGLM2-6B](https://github.com/intel/neural-speed/tree/main/neural_speed/models/chatglm), [GPT-J-6B](https://github.com/intel/neural-speed/tree/main/neural_speed/models/gptj), and [Dolly-v2-3B](https://github.com/intel/neural-speed/tree/main/neural_speed/models/gptneox). Support AMX, VNNI, AVX512F and AVX2 instruction set.
We've boosted the performance of Intel CPUs, with a particular focus on the 4th generation Intel Xeon Scalable processor, codenamed [Sapphire Rapids](https://www.intel.com/content/www/us/en/products/docs/processors/xeon-accelerated/4th-gen-xeon-scalable-processors.html).

fairseq
Fairseq is a sequence modeling toolkit that enables researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks. It provides reference implementations of various sequence modeling papers covering CNN, LSTM networks, Transformer networks, LightConv, DynamicConv models, Non-autoregressive Transformers, Finetuning, and more. The toolkit supports multi-GPU training, fast generation on CPU and GPU, mixed precision training, extensibility, flexible configuration based on Hydra, and full parameter and optimizer state sharding. Pre-trained models are available for translation and language modeling with a torch.hub interface. Fairseq also offers pre-trained models and examples for tasks like XLS-R, cross-lingual retrieval, wav2vec 2.0, unsupervised quality estimation, and more.

Hands-On-Large-Language-Models-CN
Hands-On Large Language Models CN(ZH) is a Chinese version of the book 'Hands-On Large Language Models' by Jay Alammar and Maarten Grootendorst. It provides detailed code annotations and additional insights, offers Notebook versions suitable for Chinese network environments, utilizes openbayes for free GPU access, allows convenient environment setup with vscode, and includes accompanying Chinese language videos on platforms like Bilibili and YouTube. The book covers various chapters on topics like Tokens and Embeddings, Transformer LLMs, Text Classification, Text Clustering, Prompt Engineering, Text Generation, Semantic Search, Multimodal LLMs, Text Embedding Models, Fine-tuning Models, and more.
For similar tasks

AIHub
AIHub is a client that integrates the capabilities of multiple large models, allowing users to quickly and easily build their own personalized AI assistants. It supports custom plugins for endless possibilities. The tool provides powerful AI capabilities, rich configuration options, customization of AI assistants for text and image conversations, AI drawing, installation of custom plugins, personal knowledge base building, AI calendar generation, support for AI mini programs, and ongoing development of additional features. Users can download the application package from the release section, resolve issues related to macOS app installation, and contribute ideas by submitting issues. The project development involves installation, development, and building processes for different operating systems.


second-brain-ai-assistant-course
This open-source course teaches how to build an advanced RAG and LLM system using LLMOps and ML systems best practices. It helps you create an AI assistant that leverages your personal knowledge base to answer questions, summarize documents, and provide insights. The course covers topics such as LLM system architecture, pipeline orchestration, large-scale web crawling, model fine-tuning, and advanced RAG features. It is suitable for ML/AI engineers and data/software engineers & data scientists looking to level up to production AI systems. The course is free, with minimal costs for tools like OpenAI's API and Hugging Face's Dedicated Endpoints. Participants will build two separate Python applications for offline ML pipelines and online inference pipeline.

llm2vec
LLM2Vec is a simple recipe to convert decoder-only LLMs into text encoders. It consists of 3 simple steps: 1) enabling bidirectional attention, 2) training with masked next token prediction, and 3) unsupervised contrastive learning. The model can be further fine-tuned to achieve state-of-the-art performance.

marvin
Marvin is a lightweight AI toolkit for building natural language interfaces that are reliable, scalable, and easy to trust. Each of Marvin's tools is simple and self-documenting, using AI to solve common but complex challenges like entity extraction, classification, and generating synthetic data. Each tool is independent and incrementally adoptable, so you can use them on their own or in combination with any other library. Marvin is also multi-modal, supporting both image and audio generation as well using images as inputs for extraction and classification. Marvin is for developers who care more about _using_ AI than _building_ AI, and we are focused on creating an exceptional developer experience. Marvin users should feel empowered to bring tightly-scoped "AI magic" into any traditional software project with just a few extra lines of code. Marvin aims to merge the best practices for building dependable, observable software with the best practices for building with generative AI into a single, easy-to-use library. It's a serious tool, but we hope you have fun with it. Marvin is open-source, free to use, and made with 💙 by the team at Prefect.

curated-transformers
Curated Transformers is a transformer library for PyTorch that provides state-of-the-art models composed of reusable components. It supports various transformer architectures, including encoders like ALBERT, BERT, and RoBERTa, and decoders like Falcon, Llama, and MPT. The library emphasizes consistent type annotations, minimal dependencies, and ease of use for education and research. It has been production-tested by Explosion and will be the default transformer implementation in spaCy 3.7.

txtai
Txtai is an all-in-one embeddings database for semantic search, LLM orchestration, and language model workflows. It combines vector indexes, graph networks, and relational databases to enable vector search with SQL, topic modeling, retrieval augmented generation, and more. Txtai can stand alone or serve as a knowledge source for large language models (LLMs). Key features include vector search with SQL, object storage, topic modeling, graph analysis, multimodal indexing, embedding creation for various data types, pipelines powered by language models, workflows to connect pipelines, and support for Python, JavaScript, Java, Rust, and Go. Txtai is open-source under the Apache 2.0 license.

bert4torch
**bert4torch** is a high-level framework for training and deploying transformer models in PyTorch. It provides a simple and efficient API for building, training, and evaluating transformer models, and supports a wide range of pre-trained models, including BERT, RoBERTa, ALBERT, XLNet, and GPT-2. bert4torch also includes a number of useful features, such as data loading, tokenization, and model evaluation. It is a powerful and versatile tool for natural language processing tasks.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.