cria

cria

Run LLMs locally with as little friction as possible.

Stars: 105

Visit
 screenshot

Cria is a Python library designed for running Large Language Models with minimal configuration. It provides an easy and concise way to interact with LLMs, offering advanced features such as custom models, streams, message history management, and running multiple models in parallel. Cria simplifies the process of using LLMs by providing a straightforward API that requires only a few lines of code to get started. It also handles model installation automatically, making it efficient and user-friendly for various natural language processing tasks.

README:

cria

Cria, use Python to run LLMs with as little friction as possible.

Cria is a library for programmatically running Large Language Models through Python. Cria is built so you need as little configuration as possible — even with more advanced features.

  • Easy: No configuration is required out of the box. Getting started takes just five lines of code.
  • Concise: Write less code to save time and avoid duplication.
  • Local: Free and unobstructed by rate limits, running LLMs requires no internet connection.
  • Efficient: Use advanced features with your own ollama instance, or a subprocess.

Guide

Quickstart

Running Cria is easy. After installation, you need just five lines of code — no configurations, no manual downloads, no API keys, and no servers to worry about.

import cria

ai = cria.Cria()

prompt = "Who is the CEO of OpenAI?"
for chunk in ai.chat(prompt):
    print(chunk, end="")
>>> The CEO of OpenAI is Sam Altman!

or, you can run this more configurable example.

import cria

with cria.Model() as ai:
  prompt = "Who is the CEO of OpenAI?"
  response = ai.chat(prompt, stream=False)
  print(response)
>>> The CEO of OpenAI is Sam Altman!

[!WARNING] If no model is configured, Cria automatically installs and runs the default model: llama3:8b (4.7GB).

Installation

  1. Cria uses ollama, to install it, run the following.

    Windows

    Download

    Mac

    Download

    Linux

    curl -fsSL https://ollama.com/install.sh | sh
    
  2. Install Cria with pip.

    pip install cria
    

Advanced Usage

Custom Models

To run other LLMs, pass them into your ai variable.

import cria

ai = cria.Cria("llama2")

prompt = "Who is the CEO of OpenAI?"
for chunk in ai.chat(prompt):
    print(chunk, end="") # The CEO of OpenAI is Sam Altman. He co-founded OpenAI in 2015 with...

You can find available models here.

Streams

Streams are used by default in Cria, but you can turn them off by passing in a boolean for the stream parameter.

prompt = "Who is the CEO of OpenAI?"
response = ai.chat(prompt, stream=False)
print(response) # The CEO of OpenAI is Sam Altman!

Closing

By default, models are closed when you exit the Python program, but closing them manually is a best practice.

ai.close()

You can also use with statements to close models automatically (recommended).

Message History

Follow-Up

Message history is automatically saved in Cria, so asking follow-up questions is easy.

prompt = "Who is the CEO of OpenAI?"
response = ai.chat(prompt, stream=False)
print(response) # The CEO of OpenAI is Sam Altman.

prompt = "Tell me more about him."
response = ai.chat(prompt, stream=False)
print(response) # Sam Altman is an American entrepreneur and technologist who serves as the CEO of OpenAI...

Clear Message History

You can reset message history by running the clear method.

prompt = "Who is the CEO of OpenAI?"
response = ai.chat(prompt, stream=False)
print(response) # Sam Altman is an American entrepreneur and technologist who serves as the CEO of OpenAI...

ai.clear()

prompt = "Tell me more about him."
response = ai.chat(prompt, stream=False)
print(response) # I apologize, but I don't have any information about "him" because the conversation just started...

Passing In Custom Context

You can also create a custom message history, and pass in your own context.

context = "Our AI system employed a hybrid approach combining reinforcement learning and generative adversarial networks (GANs) to optimize the decision-making..."
messages = [
    {"role": "system", "content": "You are a technical documentation writer"},
    {"role": "user", "content": context},
]

prompt = "Write some documentation using the text I gave you."
for chunk in ai.chat(messages=messages, prompt=prompt):
    print(chunk, end="") # AI System Optimization: Hybrid Approach Combining Reinforcement Learning and...

In the example, instructions are given to the LLM as the system. Then, extra context is given as the user. Finally, the prompt is entered (as a user). You can use any mixture of roles to specify the LLM to your liking.

The available roles for messages are:

  • user - Pass prompts as the user.
  • system - Give instructions as the system.
  • assistant - Act as the AI assistant yourself, and give the LLM lines.

The prompt parameter will always be appended to messages under the user role, to override this, you can choose to pass in nothing for prompt.

Interrupting

With Message History

If you are streaming messages with Cria, you can interrupt the prompt mid way.

response = ""
max_token_length = 5

prompt = "Who is the CEO of OpenAI?"
for i, chunk in enumerate(ai.chat(prompt)):
  if i >= max_token_length:
    ai.stop()
  response += chunk

print(response) # The CEO of OpenAI is
response = ""
max_token_length = 5

prompt = "Who is the CEO of OpenAI?"
for i, chunk in enumerate(ai.generate(prompt)):
  if i >= max_token_length:
    ai.stop()
  response += chunk

print(response) # The CEO of OpenAI is

In the examples, after the AI generates five tokens (units of text that are usually a couple of characters long), text generation is stopped via the stop method. After stop is called, you can safely break out of the for loop.

Without Message History

By default, Cria automatically saves responses in message history, even if the stream is interrupted. To prevent this behaviour though, you can pass in the allow_interruption boolean.

ai = cria.Cria(allow_interruption=False)

response = ""
max_token_length = 5

prompt = "Who is the CEO of OpenAI?"
for i, chunk in enumerate(ai.chat(prompt)):

  if i >= max_token_length:
    ai.stop()
    break

  print(chunk, end="") # The CEO of OpenAI is

prompt = "Tell me more about him."
for chunk in ai.chat(prompt):
  print(chunk, end="") # I apologize, but I don't have any information about "him" because the conversation just started...

Multiple Models and Parallel Conversations

Models

If you are running multiple models or parallel conversations, the Model class is also available. This is recommended for most use cases.

import cria

ai = cria.Model()

prompt = "Who is the CEO of OpenAI?"
response = ai.chat(prompt, stream=False)
print(response) # The CEO of OpenAI is Sam Altman.

All methods that apply to the Cria class also apply to Model.

With Model

Multiple models can be run through a with statement. This automatically closes them after use.

import cria

prompt = "Who is the CEO of OpenAI?"

with cria.Model("llama3") as ai:
  response = ai.chat(prompt, stream=False)
  print(response) # OpenAI's CEO is Sam Altman, who also...

with cria.Model("llama2") as ai:
  response = ai.chat(prompt, stream=False)
  print(response) # The CEO of OpenAI is Sam Altman.

Standalone Model

Or, models can be run traditionally.

import cria


prompt = "Who is the CEO of OpenAI?"

llama3 = cria.Model("llama3")
response = llama3.chat(prompt, stream=False)
print(response) # OpenAI's CEO is Sam Altman, who also...

llama2 = cria.Model("llama2")
response = llama2.chat(prompt, stream=False)
print(response) # The CEO of OpenAI is Sam Altman.

# Not required, but best practice.
llama3.close()
llama2.close()

Generate

Cria also has a generate method.

prompt = "Who is the CEO of OpenAI?"
for chunk in ai.generate(prompt):
    print(chunk, end="") # The CEO of OpenAI (Open-source Artificial Intelligence) is Sam Altman.

promt = "Tell me more about him."
response = ai.generate(prompt, stream=False)
print(response) # I apologize, but I think there may have been some confusion earlier. As this...

Running Standalone

When you run cria.Cria(), an ollama instance will start up if one is not already running. When the program exits, this instance will terminate.

However, if you want to save resources by not exiting ollama, either run your own ollama instance in another terminal, or run a managed subprocess.

Running Your Own Ollama Instance

ollama serve
prompt = "Who is the CEO of OpenAI?"
with cria.Model() as ai:
    response = ai.generate("Who is the CEO of OpenAI?", stream=False)
    print(response)

Running A Managed Subprocess (Reccomended)

# If it is the first time you start the program, ollama will start automatically
# If it is the second time (or subsequent times) you run the program, ollama will already be running

ai = cria.Cria(standalone=True, close_on_exit=False)
prompt = "Who is the CEO of OpenAI?"

with cria.Model("llama2") as llama2:
    response = llama2.generate("Who is the CEO of OpenAI?", stream=False)
    print(response)

with cria.Model("llama3") as llama3:
    response = llama3.generate("Who is the CEO of OpenAI?", stream=False)
    print(response)

quit()
# Despite exiting, olama will keep running, and be used the next time this program starts.

Formatting

To format the output of the LLM, pass in the format keyword.

ai = cria.Cria()

prompt = "Return a JSON array of AI companies."
response = ai.chat(prompt, stream=False, format="json")
print(response) # ["OpenAI", "Anthropic", "Meta", "Google", "Cohere", ...].

The current supported formats are:

  • JSON

Contributing

If you have a feature request, feel free to make an issue!

Contributions are highly appreciated.

License

MIT

For Tasks:

Click tags to check more tools for each tasks

For Jobs:

Alternative AI tools for cria

Similar Open Source Tools

For similar tasks

For similar jobs