![python-genai](/statics/github-mark.png)
python-genai
Google Gen AI Python SDK provides an interface for developers to integrate Google's generative models into their Python applications. This is an early release. The API is subject to change. Please do not use this SDK in production environments at this stage.
Stars: 701
![screenshot](/screenshots_githubs/googleapis-python-genai.jpg)
The Google Gen AI SDK is a Python library that provides access to Google AI and Vertex AI services. It allows users to create clients for different services, work with parameter types, models, generate content, call functions, handle JSON response schemas, stream text and image content, perform async operations, count and compute tokens, embed content, generate and upscale images, edit images, work with files, create and get cached content, tune models, distill models, perform batch predictions, and more. The SDK supports various features like automatic function support, manual function declaration, JSON response schema support, streaming for text and image content, async methods, tuning job APIs, distillation, batch prediction, and more.
README:
Documentation: https://googleapis.github.io/python-genai/
Google Gen AI Python SDK provides an interface for developers to integrate Google's generative models into their Python applications. It supports the Gemini Developer API and Vertex AI APIs. This is an early release. API is subject to change. Please do not use this SDK in production environments at this stage.
pip install google-genai
from google import genai
from google.genai import types
Run one of the following code blocks to create a client for the service you are using (Gemini Developer API or Vertex AI).
# Only run this block for Gemini Developer API
client = genai.Client(api_key="GEMINI_API_KEY")
# Only run this block for Vertex AI API
client = genai.Client(
vertexai=True, project="your-project-id", location="us-central1"
)
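The client can also pick up its configuration from environment variables instead of constructor arguments. The following is a sketch; the variable names follow the SDK's documented configuration and are assumptions about your environment.
# Gemini Developer API: set GOOGLE_API_KEY before creating the client
# Vertex AI: set GOOGLE_GENAI_USE_VERTEXAI=true, GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION
client = genai.Client()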
Parameter types can be specified as either dictionaries (TypedDict) or Pydantic models. Pydantic model types are available in the types module.
The client.models module exposes model inference and model getters.
response = client.models.generate_content(
model="gemini-2.0-flash-exp", contents="What is your name?"
)
print(response.text)
Download the file in the console:
!wget -q https://storage.googleapis.com/generativeai-downloads/data/a11.txt
Then upload the file and reference it in Python code:
file = client.files.upload(path="a11.txt")
response = client.models.generate_content(
model="gemini-2.0-flash-exp", contents=["Summarize this file", file]
)
print(response.text)
response = client.models.generate_content(
model="gemini-2.0-flash-exp",
contents="high",
config=types.GenerateContentConfig(
system_instruction="I say high, you say low",
temperature=0.3,
),
)
print(response.text)
All API methods support Pydantic types for parameters as well as dictionaries. You can get the types from the google.genai.types module.
response = client.models.generate_content(
model="gemini-2.0-flash-exp",
contents=types.Part.from_text(text="Why is the sky blue?"),
config=types.GenerateContentConfig(
temperature=0,
top_p=0.95,
top_k=20,
candidate_count=1,
seed=5,
max_output_tokens=100,
stop_sequences=["STOP!"],
presence_penalty=0.0,
frequency_penalty=0.0,
),
)
response
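Because dictionaries are accepted wherever Pydantic types are, the same request can be written with a plain dict config. This is a sketch of the equivalent call using dictionary keys that mirror the typed fields above:
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Why is the sky blue?",
    config={
        "temperature": 0,
        "top_p": 0.95,
        "top_k": 20,
        "candidate_count": 1,
        "seed": 5,
        "max_output_tokens": 100,
        "stop_sequences": ["STOP!"],
        "presence_penalty": 0.0,
        "frequency_penalty": 0.0,
    },
)
response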
The Gemini 2.0 Flash Thinking model is an experimental model that can return "thoughts" as part of its response.
The thinking config is only available in the v1alpha API version of the Gemini Developer API.
response = client.models.generate_content(
model='gemini-2.0-flash-thinking-exp',
contents='What is the sum of natural numbers from 1 to 100?',
config=types.GenerateContentConfig(
thinking_config=types.ThinkingConfig(include_thoughts=True),
http_options=types.HttpOptions(api_version='v1alpha'),
)
)
for part in response.candidates[0].content.parts:
print(part)
response = client.models.generate_content(
model='gemini-2.0-flash-thinking-exp-01-21',
contents='What is the sum of natural numbers from 1 to 100?',
config=types.GenerateContentConfig(
thinking_config=types.ThinkingConfig(include_thoughts=True),
)
)
for part in response.candidates[0].content.parts:
print(part)
The following lists base models. To retrieve tuned models, see the tuned models listing further below.
for model in client.models.list(config={'query_base':True}):
print(model)
pager = client.models.list(config={"page_size": 10, 'query_base':True})
print(pager.page_size)
print(pager[0])
pager.next_page()
print(pager[0])
async for model in await client.aio.models.list(config={'query_base': True}):
    print(model)
async_pager = await client.aio.models.list(config={'page_size': 10, 'query_base': True})
print(async_pager.page_size)
print(async_pager[0])
await async_pager.next_page()
print(async_pager[0])
response = client.models.generate_content(
model="gemini-2.0-flash-exp",
contents="Say something bad.",
config=types.GenerateContentConfig(
safety_settings=[
types.SafetySetting(
category="HARM_CATEGORY_HATE_SPEECH",
threshold="BLOCK_ONLY_HIGH",
)
]
),
)
print(response.text)
You can pass a Python function directly as a tool; it will be called automatically and its result passed back to the model.
def get_current_weather(location: str) -> str:
"""Returns the current weather.
Args:
location: The city and state, e.g. San Francisco, CA
"""
return "sunny"
response = client.models.generate_content(
model="gemini-2.0-flash-exp",
contents="What is the weather like in Boston?",
config=types.GenerateContentConfig(tools=[get_current_weather]),
)
print(response.text)
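If you want to receive the function call part yourself while still passing the Python function as a tool, automatic function calling can be disabled. A minimal sketch, assuming the SDK's AutomaticFunctionCallingConfig type:
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="What is the weather like in Boston?",
    config=types.GenerateContentConfig(
        tools=[get_current_weather],
        # Assumed config switch: turn off automatic invocation so the model returns the call part.
        automatic_function_calling=types.AutomaticFunctionCallingConfig(disable=True),
    ),
)
print(response.function_calls)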
If you don't want to use the automatic function support, you can manually declare the function and invoke it.
The following example shows how to declare a function and pass it as a tool. Then you will receive a function call part in the response.
function = types.FunctionDeclaration(
    name="get_current_weather",
    description="Get the current weather in a given location",
    parameters=types.Schema(
        type="OBJECT",
        properties={
            "location": types.Schema(
                type="STRING",
                description="The city and state, e.g. San Francisco, CA",
            ),
        },
        required=["location"],
    ),
)
tool = types.Tool(function_declarations=[function])
response = client.models.generate_content(
model="gemini-2.0-flash-exp",
contents="What is the weather like in Boston?",
config=types.GenerateContentConfig(tools=[tool]),
)
print(response.function_calls[0])
After you receive a function call part from the model, you can invoke the function to get the function response, then pass the function response back to the model. The following example shows how to do it for a simple function invocation.
user_prompt_content = types.Content(
role="user",
parts=[types.Part.from_text(text="What is the weather like in Boston?")],
)
function_call_content = response.candidates[0].content
function_call_part = function_call_content.parts[0]
try:
function_result = get_current_weather(
**function_call_part.function_call.args
)
function_response = {"result": function_result}
except (
Exception
) as e: # instead of raising the exception, you can let the model handle it
function_response = {"error": str(e)}
function_response_part = types.Part.from_function_response(
name=function_call_part.function_call.name,
response=function_response,
)
function_response_content = types.Content(
role="tool", parts=[function_response_part]
)
response = client.models.generate_content(
model="gemini-2.0-flash-exp",
contents=[
user_prompt_content,
function_call_content,
function_response_content,
],
config=types.GenerateContentConfig(
tools=[tool],
),
)
print(response.text)
Schemas can be provided as Pydantic Models.
from pydantic import BaseModel
class CountryInfo(BaseModel):
name: str
population: int
capital: str
continent: str
gdp: int
official_language: str
total_area_sq_mi: int
response = client.models.generate_content(
model="gemini-2.0-flash-exp",
contents="Give me information for the United States.",
config=types.GenerateContentConfig(
response_mime_type="application/json",
response_schema=CountryInfo,
),
)
print(response.text)
response = client.models.generate_content(
model="gemini-2.0-flash-exp",
contents="Give me information for the United States.",
config=types.GenerateContentConfig(
response_mime_type="application/json",
response_schema={
"required": [
"name",
"population",
"capital",
"continent",
"gdp",
"official_language",
"total_area_sq_mi",
],
"properties": {
"name": {"type": "STRING"},
"population": {"type": "INTEGER"},
"capital": {"type": "STRING"},
"continent": {"type": "STRING"},
"gdp": {"type": "INTEGER"},
"official_language": {"type": "STRING"},
"total_area_sq_mi": {"type": "INTEGER"},
},
"type": "OBJECT",
},
),
)
print(response.text)
You can set response_mime_type to 'text/x.enum' to have the model return one of the values of an enum class as the response.
from enum import Enum
class InstrumentEnum(Enum):
PERCUSSION = 'Percussion'
STRING = 'String'
WOODWIND = 'Woodwind'
BRASS = 'Brass'
KEYBOARD = 'Keyboard'
response = client.models.generate_content(
model='gemini-2.0-flash-exp',
contents='What instrument plays multiple notes at once?',
config={
'response_mime_type': 'text/x.enum',
'response_schema': InstrumentEnum,
},
)
print(response.text)
You can also set response_mime_type to 'application/json'; the response will be identical but returned as a JSON-quoted string.
class InstrumentEnum(Enum):
PERCUSSION = 'Percussion'
STRING = 'String'
WOODWIND = 'Woodwind'
BRASS = 'Brass'
KEYBOARD = 'Keyboard'
response = client.models.generate_content(
model='gemini-2.0-flash-exp',
contents='What instrument plays multiple notes at once?',
config={
'response_mime_type': 'application/json',
'response_schema': InstrumentEnum,
},
)
print(response.text)
for chunk in client.models.generate_content_stream(
model="gemini-2.0-flash-exp", contents="Tell me a story in 300 words."
):
print(chunk.text, end="")
If your image is stored in Google Cloud Storage, you can use the from_uri class method to create a Part object.
for chunk in client.models.generate_content_stream(
model="gemini-2.0-flash-exp",
contents=[
"What is this image about?",
types.Part.from_uri(
file_uri="gs://generativeai-downloads/images/scones.jpg",
mime_type="image/jpeg",
),
],
):
print(chunk.text, end="")
If your image is stored in your local file system, you can read it in as bytes data and use the from_bytes class method to create a Part object.
YOUR_IMAGE_PATH = "your_image_path"
YOUR_IMAGE_MIME_TYPE = "your_image_mime_type"
with open(YOUR_IMAGE_PATH, "rb") as f:
image_bytes = f.read()
for chunk in client.models.generate_content_stream(
model="gemini-2.0-flash-exp",
contents=[
"What is this image about?",
types.Part.from_bytes(data=image_bytes, mime_type=YOUR_IMAGE_MIME_TYPE),
],
):
print(chunk.text, end="")
client.aio exposes all the analogous async methods that are available on client. For example, client.aio.models.generate_content is the async version of client.models.generate_content.
response = await client.aio.models.generate_content(
model="gemini-2.0-flash-exp", contents="Tell me a story in 300 words."
)
print(response.text)
async for response in await client.aio.models.generate_content_stream(
model="gemini-2.0-flash-exp", contents="Tell me a story in 300 words."
):
print(response.text, end="")
response = client.models.count_tokens(
model="gemini-2.0-flash-exp",
contents="What is your name?",
)
print(response)
Compute tokens is only supported in Vertex AI.
response = client.models.compute_tokens(
model="gemini-2.0-flash-exp",
contents="What is your name?",
)
print(response)
response = await client.aio.models.count_tokens(
model="gemini-2.0-flash-exp",
contents="What is your name?",
)
print(response)
response = client.models.embed_content(
model="text-embedding-004",
contents="What is your name?",
)
print(response)
# multiple contents with config
response = client.models.embed_content(
model="text-embedding-004",
contents=["What is your name?", "What is your age?"],
config=types.EmbedContentConfig(output_dimensionality=10),
)
print(response)
Support for generating images in the Gemini Developer API is behind an allowlist.
# Generate Image
response1 = client.models.generate_images(
model="imagen-3.0-generate-001",
prompt="An umbrella in the foreground, and a rainy night sky in the background",
config=types.GenerateImageConfig(
negative_prompt="human",
number_of_images=1,
include_rai_reason=True,
output_mime_type="image/jpeg",
),
)
response1.generated_images[0].image.show()
Upscale image is only supported in Vertex AI.
# Upscale the generated image from above
response2 = client.models.upscale_image(
model="imagen-3.0-generate-001",
image=response1.generated_images[0].image,
upscale_factor="x2",
config=types.UpscaleImageConfig(
include_rai_reason=True,
output_mime_type="image/jpeg",
),
)
response2.generated_images[0].image.show()
Edit image uses a separate model from generate and upscale.
Edit image is only supported in Vertex AI.
# Edit the generated image from above
from google.genai.types import RawReferenceImage, MaskReferenceImage
raw_ref_image = RawReferenceImage(
reference_id=1,
reference_image=response1.generated_images[0].image,
)
# Model computes a mask of the background
mask_ref_image = MaskReferenceImage(
reference_id=2,
config=types.MaskReferenceConfig(
mask_mode="MASK_MODE_BACKGROUND",
mask_dilation=0,
),
)
response3 = client.models.edit_image(
model="imagen-3.0-capability-001",
prompt="Sunlight and clear sky",
reference_images=[raw_ref_image, mask_ref_image],
config=types.EditImageConfig(
edit_mode="EDIT_MODE_INPAINT_INSERTION",
number_of_images=1,
negative_prompt="human",
include_rai_reason=True,
output_mime_type="image/jpeg",
),
)
response3.generated_images[0].image.show()
Create a chat session to start a multi-turn conversation with the model.
chat = client.chats.create(model="gemini-2.0-flash-exp")
response = chat.send_message("tell me a story")
print(response.text)
chat = client.chats.create(model="gemini-2.0-flash-exp")
for chunk in chat.send_message_stream("tell me a story"):
print(chunk.text)
chat = client.aio.chats.create(model="gemini-2.0-flash-exp")
response = await chat.send_message("tell me a story")
print(response.text)
chat = client.aio.chats.create(model="gemini-2.0-flash-exp")
async for chunk in await chat.send_message_stream("tell me a story"):
print(chunk.text)
Files are only supported in the Gemini Developer API.
!gsutil cp gs://cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf .
!gsutil cp gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf .
file1 = client.files.upload(path="2312.11805v3.pdf")
file2 = client.files.upload(path="2403.05530.pdf")
print(file1)
print(file2)
file3 = client.files.upload(path="2312.11805v3.pdf")
client.files.delete(name=file3.name)
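Uploaded files can also be retrieved by name, for example to check their state or metadata. A minimal sketch, assuming a client.files.get method that mirrors upload and delete:
file_info = client.files.get(name=file1.name)
print(file_info)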
client.caches contains the control plane APIs for cached content.
if client.vertexai:
file_uris = [
"gs://cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf",
"gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
]
else:
file_uris = [file1.uri, file2.uri]
cached_content = client.caches.create(
model="gemini-1.5-pro-002",
config=types.CreateCachedContentConfig(
contents=[
types.Content(
role="user",
parts=[
types.Part.from_uri(
file_uri=file_uris[0], mime_type="application/pdf"
),
types.Part.from_uri(
file_uri=file_uris[1],
mime_type="application/pdf",
),
],
)
],
system_instruction="What is the sum of the two pdfs?",
display_name="test cache",
ttl="3600s",
),
)
cached_content = client.caches.get(name=cached_content.name)
response = client.models.generate_content(
model="gemini-1.5-pro-002",
contents="Summarize the pdfs",
config=types.GenerateContentConfig(
cached_content=cached_content.name,
),
)
print(response.text)
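When the cached content is no longer needed, it can be removed. A minimal sketch, assuming a client.caches.delete method alongside create and get:
client.caches.delete(name=cached_content.name)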
client.tunings contains tuning job APIs and supports supervised fine-tuning through tune.
- Vertex AI supports tuning from a GCS source
- Gemini Developer API supports tuning from inline examples
if client.vertexai:
model = "gemini-1.5-pro-002"
training_dataset = types.TuningDataset(
gcs_uri="gs://cloud-samples-data/ai-platform/generative_ai/gemini-1_5/text/sft_train_data.jsonl",
)
else:
model = "models/gemini-1.0-pro-001"
training_dataset = types.TuningDataset(
examples=[
types.TuningExample(
text_input=f"Input text {i}",
output=f"Output text {i}",
)
for i in range(5)
],
)
tuning_job = client.tunings.tune(
base_model=model,
training_dataset=training_dataset,
config=types.CreateTuningJobConfig(
epoch_count=1, tuned_model_display_name="test_dataset_examples model"
),
)
print(tuning_job)
tuning_job = client.tunings.get(name=tuning_job.name)
print(tuning_job)
import time
running_states = set(
[
"JOB_STATE_PENDING",
"JOB_STATE_RUNNING",
]
)
while tuning_job.state in running_states:
print(tuning_job.state)
tuning_job = client.tunings.get(name=tuning_job.name)
time.sleep(10)
response = client.models.generate_content(
model=tuning_job.tuned_model.endpoint,
contents="What is your name?",
)
print(response.text)
tuned_model = client.models.get(model=tuning_job.tuned_model.model)
print(tuned_model)
The following lists tuned models. To retrieve base models, see the base models listing above.
for model in client.models.list(config={"page_size": 10}):
print(model)
pager = client.models.list(config={"page_size": 10})
print(pager.page_size)
print(pager[0])
pager.next_page()
print(pager[0])
async for model in await client.aio.models.list(config={"page_size": 10}):
    print(model)
async_pager = await client.aio.models.list(config={"page_size": 10})
print(async_pager.page_size)
print(async_pager[0])
await async_pager.next_page()
print(async_pager[0])
model = pager[0]
model = client.models.update(
model=model.name,
config=types.UpdateModelConfig(
display_name="my tuned model", description="my tuned model description"
),
)
print(model)
for job in client.tunings.list(config={"page_size": 10}):
print(job)
pager = client.tunings.list(config={"page_size": 10})
print(pager.page_size)
print(pager[0])
pager.next_page()
print(pager[0])
async for job in await client.aio.tunings.list(config={"page_size": 10}):
print(job)
async_pager = await client.aio.tunings.list(config={"page_size": 10})
print(async_pager.page_size)
print(async_pager[0])
await async_pager.next_page()
print(async_pager[0])
Batch prediction is only supported in Vertex AI.
# Specify model and source file only, destination and job display name will be auto-populated
job = client.batches.create(
model="gemini-1.5-flash-002",
src="bq://my-project.my-dataset.my-table",
)
job
# Get a job by name
job = client.batches.get(name=job.name)
job.state
completed_states = set(
[
"JOB_STATE_SUCCEEDED",
"JOB_STATE_FAILED",
"JOB_STATE_CANCELLED",
"JOB_STATE_PAUSED",
]
)
while job.state not in completed_states:
print(job.state)
job = client.batches.get(name=job.name)
time.sleep(30)
job
for job in client.batches.list(config=types.ListBatchJobConfig(page_size=10)):
print(job)
pager = client.batches.list(config=types.ListBatchJobConfig(page_size=10))
print(pager.page_size)
print(pager[0])
pager.next_page()
print(pager[0])
async for job in await client.aio.batches.list(
config=types.ListBatchJobConfig(page_size=10)
):
print(job)
async_pager = await client.aio.batches.list(
config=types.ListBatchJobConfig(page_size=10)
)
print(async_pager.page_size)
print(async_pager[0])
await async_pager.next_page()
print(async_pager[0])
# Delete the job resource
delete_job = client.batches.delete(name=job.name)
delete_job
Alternative AI tools for python-genai
Similar Open Source Tools
![llama_ros Screenshot](/screenshots_githubs/mgonzs13-llama_ros.jpg)
llama_ros
This repository provides a set of ROS 2 packages to integrate llama.cpp into ROS 2. By using the llama_ros packages, you can easily incorporate the powerful optimization capabilities of llama.cpp into your ROS 2 projects by running GGUF-based LLMs and VLMs.
![e2m Screenshot](/screenshots_githubs/wisupai-e2m.jpg)
e2m
E2M is a Python library that can parse and convert various file types into Markdown format. It supports the conversion of multiple file formats, including doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, and m4a. The ultimate goal of the E2M project is to provide high-quality data for Retrieval-Augmented Generation (RAG) and model training or fine-tuning. The core architecture consists of a Parser responsible for parsing various file types into text or image data, and a Converter responsible for converting text or image data into Markdown format.
![candle-vllm Screenshot](/screenshots_githubs/EricLBuehler-candle-vllm.jpg)
candle-vllm
Candle-vllm is an efficient and easy-to-use platform designed for inference and serving local LLMs, featuring an OpenAI compatible API server. It offers a highly extensible trait-based system for rapid implementation of new module pipelines, streaming support in generation, efficient management of key-value cache with PagedAttention, and continuous batching. The tool supports chat serving for various models and provides a seamless experience for users to interact with LLMs through different interfaces.
![clarifai-python Screenshot](/screenshots_githubs/Clarifai-clarifai-python.jpg)
clarifai-python
The Clarifai Python SDK offers a comprehensive set of tools to integrate Clarifai's AI platform and leverage computer vision capabilities like classification, detection, and segmentation, and natural language capabilities like classification, summarisation, generation, and Q&A into your applications. With just a few lines of code, you can leverage cutting-edge artificial intelligence to unlock valuable insights from visual and textual content.
![pocketgroq Screenshot](/screenshots_githubs/jgravelle-pocketgroq.jpg)
pocketgroq
PocketGroq is a tool that provides advanced functionalities for text generation, web scraping, web search, and AI response evaluation. It includes features like an Autonomous Agent for answering questions, web crawling and scraping capabilities, enhanced web search functionality, and flexible integration with Ollama server. Users can customize the agent's behavior, evaluate responses using AI, and utilize various methods for text generation, conversation management, and Chain of Thought reasoning. The tool offers comprehensive methods for different tasks, such as initializing RAG, error handling, and tool management. PocketGroq is designed to enhance development processes and enable the creation of AI-powered applications with ease.
![agentipy Screenshot](/screenshots_githubs/niceberginc-agentipy.jpg)
agentipy
Agentipy is a powerful toolkit for interacting with the Solana blockchain, providing easy-to-use functions for token operations, trading, yield farming, LangChain integration, performance tracking, token data retrieval, pump & fun token launching, Meteora DLMM pool creation, and more. It offers features like token transfers, balance checks, staking, deploying new tokens, requesting faucet funds, trading with customizable slippage, yield farming with Lulo, and accessing LangChain tools for enhanced blockchain interactions. Users can also track current transactions per second (TPS), fetch token data by ticker or address, launch pump & fun tokens, create Meteora DLMM pools, buy/sell tokens with Raydium liquidity, and burn/close token accounts individually or in batches.
![aio-scrapy Screenshot](/screenshots_githubs/ConlinH-aio-scrapy.jpg)
aio-scrapy
Aio-scrapy is an asyncio-based web crawling and web scraping framework inspired by Scrapy. It supports distributed crawling/scraping, implements compatibility with scrapyd, and provides options for using redis queue and rabbitmq queue. The framework is designed for fast extraction of structured data from websites. Aio-scrapy requires Python 3.9+ and is compatible with Linux, Windows, macOS, and BSD systems.
![llm.nvim Screenshot](/screenshots_githubs/Kurama622-llm.nvim.jpg)
llm.nvim
llm.nvim is a universal plugin for a large language model (LLM) designed to enable users to interact with LLM within neovim. Users can customize various LLMs such as gpt, glm, kimi, and local LLM. The plugin provides tools for optimizing code, comparing code, translating text, and more. It also supports integration with free models from Cloudflare, Github models, siliconflow, and others. Users can customize tools, chat with LLM, quickly translate text, and explain code snippets. The plugin offers a flexible window interface for easy interaction and customization.
![dynamiq Screenshot](/screenshots_githubs/dynamiq-ai-dynamiq.jpg)
dynamiq
Dynamiq is an orchestration framework designed to streamline the development of AI-powered applications, specializing in orchestrating retrieval-augmented generation (RAG) and large language model (LLM) agents. It provides an all-in-one Gen AI framework for agentic AI and LLM applications, offering tools for multi-agent orchestration, document indexing, and retrieval flows. With Dynamiq, users can easily build and deploy AI solutions for various tasks.
![llmproxy Screenshot](/screenshots_githubs/ultrasev-llmproxy.jpg)
llmproxy
llmproxy is a reverse proxy for LLM API based on Cloudflare Worker, supporting platforms like OpenAI, Gemini, and Groq. The interface is compatible with the OpenAI API specification and can be directly accessed using the OpenAI SDK. It provides a convenient way to interact with various AI platforms through a unified API endpoint, enabling seamless integration and usage in different applications.
![client-python Screenshot](/screenshots_githubs/mistralai-client-python.jpg)
client-python
The Mistral Python Client is a tool inspired by cohere-python that allows users to interact with the Mistral AI API. It provides functionalities to access and utilize the AI capabilities offered by Mistral. Users can easily install the client using pip and manage dependencies using poetry. The client includes examples demonstrating how to use the API for various tasks, such as chat interactions. To get started, users need to obtain a Mistral API Key and set it as an environment variable. Overall, the Mistral Python Client simplifies the integration of Mistral AI services into Python applications.
![openapi Screenshot](/screenshots_githubs/samchon-openapi.jpg)
openapi
The `@samchon/openapi` repository is a collection of OpenAPI types and converters for various versions of OpenAPI specifications. It includes an 'emended' OpenAPI v3.1 specification that enhances clarity by removing ambiguous and duplicated expressions. The repository also provides an application composer for LLM (Large Language Model) function calling from OpenAPI documents, allowing users to easily perform LLM function calls based on the Swagger document. Conversions to different versions of OpenAPI documents are also supported, all based on the emended OpenAPI v3.1 specification. Users can validate their OpenAPI documents using the `typia` library with `@samchon/openapi` types, ensuring compliance with standard specifications.
![acte Screenshot](/screenshots_githubs/j66n-acte.jpg)
acte
Acte is a framework designed to build GUI-like tools for AI Agents. It aims to address the issues of cognitive load and freedom degrees when interacting with multiple APIs in complex scenarios. By providing a graphical user interface (GUI) for Agents, Acte helps reduce cognitive load and constraints interaction, similar to how humans interact with computers through GUIs. The tool offers APIs for starting new sessions, executing actions, and displaying screens, accessible via HTTP requests or the SessionManager class.
![aiotdlib Screenshot](/screenshots_githubs/pylakey-aiotdlib.jpg)
aiotdlib
aiotdlib is a Python asyncio Telegram client based on TDLib. It provides automatic generation of types and functions from tl schema, validation, good IDE type hinting, and high-level API methods for simpler work with tdlib. The package includes prebuilt TDLib binaries for macOS (arm64) and Debian Bullseye (amd64). Users can use their own binary by passing `library_path` argument to `Client` class constructor. Compatibility with other versions of the library is not guaranteed. The tool requires Python 3.9+ and users need to get their `api_id` and `api_hash` from Telegram docs for installation and usage.
For similar tasks
![floneum Screenshot](/screenshots_githubs/floneum-floneum.jpg)
floneum
Floneum is a graph editor that makes it easy to develop your own AI workflows. It uses large language models (LLMs) to run AI models locally, without any external dependencies or even a GPU. This makes it easy to use LLMs with your own data, without worrying about privacy. Floneum also has a plugin system that allows you to improve the performance of LLMs and make them work better for your specific use case. Plugins can be used in any language that supports web assembly, and they can control the output of LLMs with a process similar to JSONformer or guidance.
![llm-answer-engine Screenshot](/screenshots_githubs/developersdigest-llm-answer-engine.jpg)
llm-answer-engine
This repository contains the code and instructions needed to build a sophisticated answer engine that leverages the capabilities of Groq, Mistral AI's Mixtral, Langchain.JS, Brave Search, Serper API, and OpenAI. Designed to efficiently return sources, answers, images, videos, and follow-up questions based on user queries, this project is an ideal starting point for developers interested in natural language processing and search technologies.
![discourse-ai Screenshot](/screenshots_githubs/discourse-discourse-ai.jpg)
discourse-ai
Discourse AI is a plugin for the Discourse forum software that uses artificial intelligence to improve the user experience. It can automatically generate content, moderate posts, and answer questions. This can free up moderators and administrators to focus on other tasks, and it can help to create a more engaging and informative community.
![Gemini-API Screenshot](/screenshots_githubs/HanaokaYuzu-Gemini-API.jpg)
Gemini-API
Gemini-API is a reverse-engineered asynchronous Python wrapper for Google Gemini web app (formerly Bard). It provides features like persistent cookies, ImageFx support, extension support, classified outputs, official flavor, and asynchronous operation. The tool allows users to generate contents from text or images, have conversations across multiple turns, retrieve images in response, generate images with ImageFx, save images to local files, use Gemini extensions, check and switch reply candidates, and control log level.
![genai-for-marketing Screenshot](/screenshots_githubs/GoogleCloudPlatform-genai-for-marketing.jpg)
genai-for-marketing
This repository provides a deployment guide for utilizing Google Cloud's Generative AI tools in marketing scenarios. It includes step-by-step instructions, examples of crafting marketing materials, and supplementary Jupyter notebooks. The demos cover marketing insights, audience analysis, trendspotting, content search, content generation, and workspace integration. Users can access and visualize marketing data, analyze trends, improve search experience, and generate compelling content. The repository structure includes backend APIs, frontend code, sample notebooks, templates, and installation scripts.
![generative-ai-dart Screenshot](/screenshots_githubs/google-gemini-generative-ai-dart.jpg)
generative-ai-dart
The Google Generative AI SDK for Dart enables developers to utilize cutting-edge Large Language Models (LLMs) for creating language applications. It provides access to the Gemini API for generating content using state-of-the-art models. Developers can integrate the SDK into their Dart or Flutter applications to leverage powerful AI capabilities. It is recommended to use the SDK for server-side API calls to ensure the security of API keys and protect against potential key exposure in mobile or web apps.
![Dough Screenshot](/screenshots_githubs/banodoco-Dough.jpg)
Dough
Dough is a tool for crafting videos with AI, allowing users to guide video generations with precision using images and example videos. Users can create guidance frames, assemble shots, and animate them by defining parameters and selecting guidance videos. The tool aims to help users make beautiful and unique video creations, providing control over the generation process. Setup instructions are available for Linux and Windows platforms, with detailed steps for installation and running the app.
![ChaKt-KMP Screenshot](/screenshots_githubs/PatilShreyas-ChaKt-KMP.jpg)
ChaKt-KMP
ChaKt is a multiplatform app built using Kotlin and Compose Multiplatform to demonstrate the use of Generative AI SDK for Kotlin Multiplatform to generate content using Google's Generative AI models. It features a simple chat based user interface and experience to interact with AI. The app supports mobile, desktop, and web platforms, and is built with Kotlin Multiplatform, Kotlin Coroutines, Compose Multiplatform, Generative AI SDK, Calf - File picker, and BuildKonfig. Users can contribute to the project by following the guidelines in CONTRIBUTING.md. The app is licensed under the MIT License.
For similar jobs
![weave Screenshot](/screenshots_githubs/wandb-weave.jpg)
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
![LLMStack Screenshot](/screenshots_githubs/trypromptly-LLMStack.jpg)
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
![VisionCraft Screenshot](/screenshots_githubs/VisionCraft-org-VisionCraft.jpg)
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
![kaito Screenshot](/screenshots_githubs/Azure-kaito.jpg)
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
![PyRIT Screenshot](/screenshots_githubs/Azure-PyRIT.jpg)
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
![tabby Screenshot](/screenshots_githubs/TabbyML-tabby.jpg)
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:
- Self-contained, with no need for a DBMS or cloud service.
- OpenAPI interface, easy to integrate with existing infrastructure (e.g. Cloud IDE).
- Supports consumer-grade GPUs.
![spear Screenshot](/screenshots_githubs/isl-org-spear.jpg)
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
![Magick Screenshot](/screenshots_githubs/Oneirocom-Magick.jpg)
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.