openmacro
Multimodal Assistant. Human Interface for computers.
Stars: 62
Openmacro is a multimodal personal agent that allows users to run code locally. It acts as a personal agent capable of completing and automating tasks autonomously via self-prompting. The tool provides a CLI natural-language interface for completing and automating tasks, analyzing and plotting data, browsing the web, and manipulating files. Currently, it supports API keys for models powered by SambaNova, with plans to add support for other hosts like OpenAI and Anthropic in future versions.
README:
https://github.com/user-attachments/assets/9360dfeb-a471-49c3-bbdc-72b32cc8eaeb
[!WARNING] DISCLAIMER: Project is in its early stage of development. Current version is not stable.
openmacro is a multimodal personal agent that allows LLMs to run code locally. It aims to act as a personal agent capable of completing and automating simple to complex tasks autonomously via self-prompting.
It provides a CLI natural-language interface for you to:
- Complete and automate simple to complex tasks.
- Analyse and plot data.
- Browse the web for the latest information.
- Manipulate files including photos, videos, PDFs, etc.
At the moment, openmacro only supports API keys for models powered by SambaNova. Why? Because it’s free, fast, and reliable, which makes it ideal for testing as the project grows! Support for other hosts such as OpenAI and Anthropic is planned to be added in future versions.
This project is heavily inspired by Open Interpreter ❤️
To get started with openmacro, get a free API key by creating an account at https://cloud.sambanova.ai/.
Next, install and start openmacro by running:
pip install openmacro
macro --api_key "YOUR_API_KEY"
[!TIP] Not working? Raise an issue here or try this out instead:
py -m pip install openmacro
py -m openmacro --api_key "YOUR_API_KEY"
[!NOTE] You only need to pass `--api_key` once! Next time, simply call `macro` or `py -m openmacro`.
[!TIP] You can also assign different API keys to different profiles!
py -m openmacro --api_key "YOUR_API_KEY" --profile "path\to\profile"
openmacro supports CLI arguments and customised settings! You can view the available options by running:
macro --help
To add your own personalised settings and save them for future use, run:
macro --profile "path\to\profile"
openmacro supports custom profiles in JSON, TOML, YAML and Python:
Python
Profiles in `python` allow direct customisation and type safety! What your `profile.py` might look like:
# imports
from openmacro.profile import Profile
from openmacro.extensions import BrowserKwargs, EmailKwargs

# profile setup
profile: Profile = Profile(
    user = {
        "name": "Amor",
        "version": "1.0.0"
    },
    assistant = {
        "name": "Macro",
        "personality": "You respond in a professional attitude and respond in a formal, yet casual manner.",
        "messages": [],
        "breakers": ["the task is done.",
                     "the conversation is done."]
    },
    safeguards = {
        "timeout": 16,
        "auto_run": True,
        "auto_install": True
    },
    extensions = {
        # type safe kwargs
        "Browser": BrowserKwargs(headless=False, engine="google"),
        "Email": EmailKwargs(email="[email protected]", password="password")
    },
    config = {
        "verbose": True,
        "conversational": True,
        "dev": False
    },
    languages = {
        # specify custom paths to languages or add custom languages for openmacro
        # raw string so the backslashes in the Windows path are not treated as escapes
        "python": [r"C:\Windows\py.EXE", "-c"],
        "rust": ["cargo", "script", "-e"]  # not supported by default, but can be added!
    },
    tts = {
        # powered by KoljaB/RealtimeTTS
        # options ["SystemEngine", "GTTSEngine", "OpenAIEngine"]
        "enabled": True,
        "engine": "OpenAIEngine",
        "api_key": "sk-example"
    }
)
The profile can also be reused if you want to build your own app with openmacro:
...
import asyncio
from openmacro.core import Openmacro

async def main():
    macro = Openmacro(profile)
    macro.llm.messages = []
    # stream the reply chunk by chunk as it is generated
    async for chunk in macro.chat("Plot an exponential graph for me!", stream=True):
        print(chunk, end="")

# call main() so that asyncio.run receives a coroutine object
asyncio.run(main())
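If your app needs the whole reply rather than printing chunks as they arrive, the stream can be gathered into one string. The sketch below is only an illustration, assuming `macro.chat(..., stream=True)` yields text chunks as in the example above. `fake_chat` and `collect_reply` are hypothetical names; `fake_chat` stands in for the real call so the snippet runs without openmacro or an API key.

```python
import asyncio
from typing import AsyncIterator


async def fake_chat(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for macro.chat(prompt, stream=True)."""
    for chunk in ("Plotting ", "y = e^x ", "now."):
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        yield chunk


async def collect_reply(stream: AsyncIterator[str]) -> str:
    """Join every chunk from an async stream into a single string."""
    parts = []
    async for chunk in stream:
        parts.append(chunk)
    return "".join(parts)


reply = asyncio.run(collect_reply(fake_chat("Plot an exponential graph for me!")))
print(reply)
```

In a real app you would pass `macro.chat(...)` to `collect_reply` in place of `fake_chat(...)`, provided the chunks are plain strings.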
JSON
What your `profile.json` might look like:
{
    "user": {
        "name": "Amor",
        "version": "1.0.0"
    },
    "assistant": {
        "name": "Basil",
        "personality": "You have a kind, deterministic and professional attitude towards your work and respond in a formal, yet casual manner.",
        "messages": [],
        "breakers": ["the task is done.", "the conversation is done."]
    },
    "safeguards": {
        "timeout": 16,
        "auto_run": true,
        "auto_install": true
    },
    "extensions": {
        "Browser": {
            "headless": false,
            "engine": "google"
        },
        "Email": {
            "email": "[email protected]",
            "password": "password"
        }
    },
    "config": {
        "verbose": true,
        "conversational": true,
        "dev": false
    },
    "languages": {
        "python": ["C:\\Windows\\py.EXE", "-c"],
        "rust": ["cargo", "script", "-e"]
    },
    "tts": {
        "enabled": true,
        "engine": "OpenAIEngine",
        "api_key": "sk-example"
    }
}
TOML
What your `profile.toml` might look like:
[user]
name = "Amor"
version = "1.0.0"
[assistant]
name = "Basil"
personality = "You have a kind, deterministic and professional attitude towards your work and respond in a formal, yet casual manner."
messages = []
breakers = ["the task is done.", "the conversation is done."]
[safeguards]
timeout = 16
auto_run = true
auto_install = true
[extensions.Browser]
headless = false
engine = "google"
[extensions.Email]
email = "[email protected]"
password = "password"
[config]
verbose = true
conversational = true
dev = false
[languages]
python = ["C:\\Windows\\py.EXE", "-c"]
rust = ["cargo", "script", "-e"]
[tts]
enabled = true
engine = "SystemEngine"
YAML
What your `profile.yaml` might look like:
user:
  name: "Amor"
  version: "1.0.0"
assistant:
  name: "Basil"
  personality: "You have a kind, deterministic and professional attitude towards your work and respond in a formal, yet casual manner."
  messages: []
  breakers:
    - "the task is done."
    - "the conversation is done."
safeguards:
  timeout: 16
  auto_run: true
  auto_install: true
extensions:
  Browser:
    headless: false
    engine: "google"
  Email:
    email: "[email protected]"
    password: "password"
config:
  verbose: true
  conversational: true
  dev: false
languages:
  python: ["C:\\Windows\\py.EXE", "-c"]
  rust: ["cargo", "script", "-e"]
tts:
  enabled: true
  engine: "SystemEngine"
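Because all of these formats carry the same sections, a profile file can be read into a plain dictionary whatever its extension. The sketch below is not openmacro's own loader, only an illustration using the standard library. `load_profile` and `REQUIRED_SECTIONS` are hypothetical names, the choice of required sections is an assumption, TOML parsing assumes Python 3.11+ (`tomllib`), and YAML is left out because it needs the third-party PyYAML package.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical: sections that every example profile above happens to contain.
REQUIRED_SECTIONS = ("user", "assistant", "safeguards", "config")


def load_profile(path: str) -> dict:
    """Read a .json or .toml profile into a dict and check its top-level sections."""
    file = Path(path)
    suffix = file.suffix.lower()
    if suffix == ".json":
        data = json.loads(file.read_text(encoding="utf-8"))
    elif suffix == ".toml":
        import tomllib  # standard library from Python 3.11 onwards
        data = tomllib.loads(file.read_text(encoding="utf-8"))
    else:
        raise ValueError(f"unsupported profile format: {suffix}")
    missing = [name for name in REQUIRED_SECTIONS if name not in data]
    if missing:
        raise KeyError(f"profile is missing sections: {missing}")
    return data


# Usage: write a tiny profile to a temporary directory and load it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "profile.json"
    sample.write_text(json.dumps({
        "user": {"name": "Amor", "version": "1.0.0"},
        "assistant": {"name": "Basil"},
        "safeguards": {"timeout": 16},
        "config": {"verbose": True},
    }), encoding="utf-8")
    profile = load_profile(str(sample))

print(profile["user"]["name"])
```

Checking the sections up front means a typo in a section name surfaces when the file is loaded rather than midway through a session.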
You can also switch between profiles by running:
macro --switch "amor"
Profiles also support versions for modularity (uses the latest version by default).
macro --switch "amor:1.0.0"
[!NOTE] All profiles are isolated. LTM from different profiles and versions are not shared.
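The `name:version` form used by `--switch` can be pictured as a small parsing step followed by a "latest version" fallback. This is a hedged sketch rather than openmacro's actual logic: `parse_profile_spec`, `latest_version` and `resolve` are hypothetical helpers, and versions are assumed to be dot-separated integers such as 1.0.0.

```python
from typing import List, Optional, Tuple


def parse_profile_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'amor:1.0.0' into ('amor', '1.0.0'); a bare 'amor' gives ('amor', None)."""
    name, sep, version = spec.partition(":")
    return name, (version if sep and version else None)


def latest_version(versions: List[str]) -> str:
    """Pick the highest version, comparing numerically so 1.10.0 beats 1.9.0."""
    return max(versions, key=lambda v: tuple(int(part) for part in v.split(".")))


def resolve(spec: str, available: List[str]) -> Tuple[str, str]:
    """Return (name, version), falling back to the latest version when none is given."""
    name, version = parse_profile_spec(spec)
    if version is None:
        version = latest_version(available)
    elif version not in available:
        raise ValueError(f"unknown version {version!r} for profile {name!r}")
    return name, version


available = ["1.0.0", "1.9.0", "1.10.0"]
print(resolve("amor", available))
print(resolve("amor:1.0.0", available))
```

Comparing versions as integer tuples avoids the string-ordering trap where "1.9.0" would sort above "1.10.0".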
You can also quick-update a profile. [BETA]
macro --update "amor"
Quick updating lets you change an existing profile easily. Simply edit the original profile file, then run the command above.
To view all available profiles run:
macro --profiles
To view all available versions of a profile run:
macro --versions <profile_name>
openmacro supports custom RAG extensions for modularity and better capabilities! By default, the `browser` and `email` extensions are installed.
Write extensions using the template:
from typing import TypedDict

class ExtensionKwargs(TypedDict):
    ...

class Extensionname:
    def __init__(self):
        ...

    @staticmethod
    def load_instructions() -> str:
        return "<instructions>"
You can find examples here.
[!TIP] The class name should be in title case (as in Extensionname), not camel case.
[!NOTE] Creating a type-safe kwargs TypedDict is optional but recommended.
If an extension does not define a kwargs class, use:
from openmacro.utils import Kwargs
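As a concrete illustration of the template, here is what a small extension might look like once filled in. Everything in it is hypothetical (the `Notes` class, its `NotesKwargs`, and the `add` and `search` methods), and it assumes openmacro passes the profile's kwargs to the constructor and shows the string from `load_instructions` to the model. Check the bundled browser and email extensions for the real conventions.

```python
from typing import List, TypedDict


class NotesKwargs(TypedDict, total=False):
    """Type-safe settings a profile could pass to the extension (hypothetical)."""
    max_notes: int


class Notes:
    """Hypothetical extension that keeps short notes in memory."""

    def __init__(self, max_notes: int = 100):
        self.max_notes = max_notes
        self._notes: List[str] = []

    def add(self, text: str) -> int:
        """Store a note, dropping the oldest one when the limit is reached."""
        if len(self._notes) >= self.max_notes:
            self._notes.pop(0)
        self._notes.append(text)
        return len(self._notes)

    def search(self, keyword: str) -> List[str]:
        """Return every stored note containing the keyword (case-insensitive)."""
        return [n for n in self._notes if keyword.lower() in n.lower()]

    @staticmethod
    def load_instructions() -> str:
        return "Use notes.add(text) to save a note and notes.search(keyword) to find one."


kwargs: NotesKwargs = {"max_notes": 2}
notes = Notes(**kwargs)
notes.add("Buy milk")
notes.add("Plot sales data")
notes.add("Email the plot")  # limit of 2 reached, so "Buy milk" is dropped
print(notes.search("plot"))
```

Keeping the instructions string short and naming the callable methods explicitly gives the model a clear picture of what the extension offers.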
Upload your code to PyPI for public redistribution using `twine` and `poetry`.
To add it to `openmacro.extensions` so that the AI can use it in your profiles, run:
omi install <module_name>
or
pip install <module_name>
omi add <module_name>
You can test your extension by installing it locally:
omi install .
- [x] AI Interpreter
- [X] Web Search Capability
- [X] Async Chunk Streaming
- [X] API Keys Support
- [X] Profiles Support
- [X] Extensions API
- [ ] WIP TTS & STT
- [ ] WIP Cost Efficient Long Term Memory & Context Manager
- [ ] Semantic File Search
- [ ] Optional Telemetry
- [ ] Desktop, Android & iOS App Interface
- Optimisations
- Cost efficient long term memory and conversational context managers through vector databases. Most likely powered by ChromaDB.
- Hooks API and Live Code Output Streaming
This is my first major open-source project, so things might go wrong, and there is always room for improvement. You can contribute by raising issues, helping with documentation, adding comments, suggesting features or ideas, etc. Your help is greatly appreciated!
You can support this project by writing custom extensions for openmacro! openmacro aims to be community-powered, since what it can do is limited by the extensions available to it. More extensions mean better chances of completing complex tasks. I will create an official verified list of openmacro extensions sometime in the future!
You can contact me at [email protected].