openmacro
Multimodal Assistant. Human Interface for computers.
Stars: 62
Openmacro is a multimodal personal agent that allows users to run code locally. It acts as a personal agent capable of completing and automating tasks autonomously via self-prompting. The tool provides a CLI natural-language interface for completing and automating tasks, analyzing and plotting data, browsing the web, and manipulating files. Currently, it supports API keys for models powered by SambaNova, with plans to add support for other hosts like OpenAI and Anthropic in future versions.
README:
https://github.com/user-attachments/assets/9360dfeb-a471-49c3-bbdc-72b32cc8eaeb
[!WARNING] DISCLAIMER: This project is in its early stages of development. The current version is not stable.
openmacro is a multimodal personal agent that allows LLMs to run code locally. It aims to act as a personal agent capable of completing and automating simple to complex tasks autonomously via self-prompting.
It provides a CLI natural-language interface for you to:
- Complete and automate simple to complex tasks.
- Analyse and plot data.
- Browse the web for the latest information.
- Manipulate files including photos, videos, PDFs, etc.
At the moment, openmacro only supports API keys for models powered by SambaNova. Why? Because it's free, fast, and reliable, which makes it ideal for testing as the project grows! Support for other hosts such as OpenAI and Anthropic is planned for future versions.
This project is heavily inspired by Open Interpreter ❤️
To get started with openmacro, get a free API key by creating an account at https://cloud.sambanova.ai/.
Next, install and start openmacro by running:
```
pip install openmacro
macro --api_key "YOUR_API_KEY"
```
[!TIP] Not working? Raise an issue here or try this out instead:
```
py -m pip install openmacro
py -m openmacro --api_key "YOUR_API_KEY"
```
[!NOTE] You only need to pass `--api_key` once! Next time, simply call `macro` or `py -m openmacro`.
[!TIP] You can also assign different API keys to different profiles!
```
py -m openmacro --api_key "YOUR_API_KEY" --profile "path\to\profile"
```
openmacro supports CLI args and customised settings! You can view the argument options by running:
```
macro --help
```
To add your own personalised settings and save them for the future, run:
```
macro --profile "path\to\profile"
```
openmacro supports custom profiles in JSON, TOML, YAML, and Python:
Python
Profiles in Python allow direct customisation and type safety! What your `profile.py` might look like:
```python
# imports
from openmacro.profile import Profile
from openmacro.extensions import BrowserKwargs, EmailKwargs

# profile setup
profile: Profile = Profile(
    user={
        "name": "Amor",
        "version": "1.0.0"
    },
    assistant={
        "name": "Macro",
        "personality": "You respond in a professional attitude and respond in a formal, yet casual manner.",
        "messages": [],
        "breakers": ["the task is done.",
                     "the conversation is done."]
    },
    safeguards={
        "timeout": 16,
        "auto_run": True,
        "auto_install": True
    },
    extensions={
        # type-safe kwargs
        "Browser": BrowserKwargs(headless=False, engine="google"),
        "Email": EmailKwargs(email="[email protected]", password="password")
    },
    config={
        "verbose": True,
        "conversational": True,
        "dev": False
    },
    languages={
        # specify custom paths to languages or add custom languages for openmacro
        # (note the escaped backslashes required in Python string literals)
        "python": ["C:\\Windows\\py.EXE", "-c"],
        "rust": ["cargo", "script", "-e"]  # not supported by default, but can be added!
    },
    tts={
        # powered by KoljaB/RealtimeTTS
        # options: ["SystemEngine", "GTTSEngine", "OpenAIEngine"]
        "enabled": True,
        "engine": "OpenAIEngine",
        "api_key": "sk-example"
    }
)
```
And it can be extended if you want to build your own app with openmacro:
```python
...

async def main():
    from openmacro.core import Openmacro

    macro = Openmacro(profile)
    macro.llm.messages = []

    async for chunk in macro.chat("Plot an exponential graph for me!", stream=True):
        print(chunk, end="")

import asyncio
asyncio.run(main())
```
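A note on the `languages` entries in the profile: each list appears to be the command prefix used to run the agent's generated code, with the code appended as the final argument (hence `-c` for Python and `-e` for `cargo script`). A minimal sketch of that presumed mechanism, using `subprocess` and a stand-in command (openmacro's actual execution path may differ):
```python
# Illustrative sketch only: how a "languages" command prefix could be used.
# The prefix list comes from the profile; generated code is appended to it.
import subprocess

command = ["python", "-c"]           # stand-in for a profile "languages" entry
code = "print('hello from macro')"   # code the agent wants to run
subprocess.run(command + [code], check=True)
```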
JSON
What your `profile.json` might look like:
```json
{
  "user": {
    "name": "Amor",
    "version": "1.0.0"
  },
  "assistant": {
    "name": "Basil",
    "personality": "You have a kind, deterministic and professional attitude towards your work and respond in a formal, yet casual manner.",
    "messages": [],
    "breakers": ["the task is done.", "the conversation is done."]
  },
  "safeguards": {
    "timeout": 16,
    "auto_run": true,
    "auto_install": true
  },
  "extensions": {
    "Browser": {
      "headless": false,
      "engine": "google"
    },
    "Email": {
      "email": "[email protected]",
      "password": "password"
    }
  },
  "config": {
    "verbose": true,
    "conversational": true,
    "dev": false
  },
  "languages": {
    "python": ["C:\\Windows\\py.EXE", "-c"],
    "rust": ["cargo", "script", "-e"]
  },
  "tts": {
    "enabled": true,
    "engine": "OpenAIEngine",
    "api_key": "sk-example"
  }
}
```
TOML
What your `profile.toml` might look like:
```toml
[user]
name = "Amor"
version = "1.0.0"

[assistant]
name = "Basil"
personality = "You have a kind, deterministic and professional attitude towards your work and respond in a formal, yet casual manner."
messages = []
breakers = ["the task is done.", "the conversation is done."]

[safeguards]
timeout = 16
auto_run = true
auto_install = true

[extensions.Browser]
headless = false
engine = "google"

[extensions.Email]
email = "[email protected]"
password = "password"

[config]
verbose = true
conversational = true
dev = false

[languages]
python = ["C:\\Windows\\py.EXE", "-c"]
rust = ["cargo", "script", "-e"]

[tts]
enabled = true
engine = "SystemEngine"
```
YAML
What your `profile.yaml` might look like:
```yaml
user:
  name: "Amor"
  version: "1.0.0"
assistant:
  name: "Basil"
  personality: "You have a kind, deterministic and professional attitude towards your work and respond in a formal, yet casual manner."
  messages: []
  breakers:
    - "the task is done."
    - "the conversation is done."
safeguards:
  timeout: 16
  auto_run: true
  auto_install: true
extensions:
  Browser:
    headless: false
    engine: "google"
  Email:
    email: "[email protected]"
    password: "password"
config:
  verbose: true
  conversational: true
  dev: false
languages:
  python: ["C:\\Windows\\py.EXE", "-c"]
  rust: ["cargo", "script", "-e"]
tts:
  enabled: true
  engine: "SystemEngine"
```
You can also switch between profiles by running:
```
macro --switch "amor"
```
Profiles also support versions for modularity (the latest version is used by default).
```
macro --switch "amor:1.0.0"
```
[!NOTE] All profiles are isolated. Long-term memory (LTM) from different profiles and versions is not shared.
You can also quick-update a profile. [BETA]
```
macro --update "amor"
```
Quick updating lets you easily make changes to your profile: simply edit the original profile file, then run the command above.
To view all available profiles, run:
```
macro --profiles
```
To view all available versions of a profile, run:
```
macro --versions <profile_name>
```
openmacro supports custom RAG extensions for modularity and better capabilities! By default, the browser and email extensions are installed.
Write extensions using the template:
```python
from typing import TypedDict

class ExtensionKwargs(TypedDict):
    ...

class Extensionname:
    def __init__(self):
        ...

    @staticmethod
    def load_instructions() -> str:
        return "<instructions>"
```
You can find examples here.
[!TIP] Class names should not be camelCase, but Titlecase instead (as in `Extensionname` above).
[!NOTE] Creating a type-safe kwargs TypedDict is optional but recommended. If your extension does not contain a kwargs class, use:
```python
from openmacro.utils import Kwargs
```
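For instance, here is a minimal sketch of a hypothetical `Weather` extension following the template above; the class name, its kwargs, and the forecast logic are illustrative, not part of openmacro:
```python
from typing import TypedDict

# hypothetical type-safe kwargs for the extension (optional but recommended)
class WeatherKwargs(TypedDict):
    units: str  # e.g. "metric" or "imperial"

class Weather:
    def __init__(self, units: str = "metric"):
        self.units = units

    @staticmethod
    def load_instructions() -> str:
        # instructions telling the LLM when and how to use this extension
        return "Use the Weather extension to fetch simple forecasts."

    def forecast(self, city: str) -> str:
        # a real extension would call a weather API here
        return f"It is sunny in {city} today ({self.units} units)."
```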
Upload your code to PyPI for public redistribution using twine and poetry.
To add it to `openmacro.extensions` for profiles so the AI can use it, run:
```
omi install <module_name>
```
or
```
pip install <module_name>
omi add <module_name>
```
You can test your extension by installing it locally:
```
omi install .
```
- [x] AI Interpreter
- [x] Web Search Capability
- [x] Async Chunk Streaming
- [x] API Keys Support
- [x] Profiles Support
- [x] Extensions API
- [ ] TTS & STT (WIP)
- [ ] Cost Efficient Long Term Memory & Context Manager (WIP)
- [ ] Semantic File Search
- [ ] Optional Telemetry
- [ ] Desktop, Android & iOS App Interface
- Optimisations
  - Cost-efficient long-term memory and conversational context managers through vector databases, most likely powered by ChromaDB.
  - Hooks API and Live Code Output Streaming
This is my first major open-source project, so things might go wrong, and there is always room for improvement. You can contribute by raising issues, helping with documentation, adding comments, suggesting features or ideas, etc. Your help is greatly appreciated!
You can support this project by writing custom extensions for openmacro! openmacro aims to be community-powered: its limits are set by what its extensions can do, so more extensions mean better chances of completing complex tasks. I will create an official verified list of openmacro extensions sometime in the future!
You can contact me at [email protected].
Alternative AI tools for openmacro
Similar Open Source Tools
chat-ui
A chat interface using open source models, e.g. OpenAssistant or Llama. It is a SvelteKit app and it powers the HuggingChat app on hf.co/chat.
ruby-openai
Use the OpenAI API with Ruby! 🤖🩵 Stream text with GPT-4, transcribe and translate audio with Whisper, or create images with DALL·E.
firecrawl
Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown. It crawls all accessible subpages and provides clean markdown for each, without requiring a sitemap. The API is easy to use and can be self-hosted. It also integrates with Langchain and Llama Index. The Python SDK makes it easy to crawl and scrape websites in Python code.
AICentral
AI Central is a powerful tool designed to take control of your AI services with minimal overhead. It is built on Asp.Net Core and dotnet 8, offering fast web-server performance. The tool enables advanced Azure APIm scenarios, PII stripping logging to Cosmos DB, token metrics through Open Telemetry, and intelligent routing features. AI Central supports various endpoint selection strategies, proxying asynchronous requests, custom OAuth2 authorization, circuit breakers, rate limiting, and extensibility through plugins. It provides an extensibility model for easy plugin development and offers enriched telemetry and logging capabilities for monitoring and insights.
mistreevous
Mistreevous is a library written in TypeScript for Node and browsers, used to declaratively define, build, and execute behaviour trees for creating complex AI. It allows defining trees with JSON or a minimal DSL, providing in-browser editor and visualizer. The tool offers methods for tree state, stepping, resetting, and getting node details, along with various composite, decorator, leaf nodes, callbacks, guards, and global functions/subtrees. Version history includes updates for node types, callbacks, global functions, and TypeScript conversion.
008
008 is an open-source event-driven AI powered WebRTC Softphone compatible with macOS, Windows, and Linux. It is also accessible on the web. The name '008' or 'agent 008' reflects our ambition: beyond crafting the premier Open Source Softphone, we aim to introduce a programmable, event-driven AI agent. This agent utilizes embedded artificial intelligence models operating directly on the softphone, ensuring efficiency and reduced operational costs.
llm-rag-workshop
The LLM RAG Workshop repository provides a workshop on using Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to generate and understand text in a human-like manner. It includes instructions on setting up the environment, indexing Zoomcamp FAQ documents, creating a Q&A system, and using OpenAI for generation based on retrieved information. The repository focuses on enhancing language model responses with retrieved information from external sources, such as document databases or search engines, to improve factual accuracy and relevance of generated text.
VectorETL
VectorETL is a lightweight ETL framework designed to assist Data & AI engineers in processing data for AI applications quickly. It streamlines the conversion of diverse data sources into vector embeddings and storage in various vector databases. The framework supports multiple data sources, embedding models, and vector database targets, simplifying the creation and management of vector search systems for semantic search, recommendation systems, and other vector-based operations.
fluid-db
FluidDB is a research repository focusing on the concept of a fluid database that dynamically updates its schema based on ingested data. It enables the creation of personalized AI agents with features like adaptive schema, flexible querying, and versatile data input. The tool allows for storing unstructured data in a structured form and supports natural language queries. It aims to revolutionize database management by providing a dynamic and intuitive approach to data storage and retrieval.
CredSweeper
CredSweeper is a tool designed to detect credentials like tokens, passwords, and API keys in directories or files. It helps users identify potential exposure of sensitive information by scanning lines, filtering, and utilizing an AI model. The tool reports lines containing possible credentials, their location, and the expected type of credential.
langcorn
LangCorn is an API server that enables you to serve LangChain models and pipelines with ease, leveraging the power of FastAPI for a robust and efficient experience. It offers features such as easy deployment of LangChain models and pipelines, ready-to-use authentication functionality, high-performance FastAPI framework for serving requests, scalability and robustness for language processing applications, support for custom pipelines and processing, well-documented RESTful API endpoints, and asynchronous processing for faster response times.
llmproxy
llmproxy is a reverse proxy for LLM API based on Cloudflare Worker, supporting platforms like OpenAI, Gemini, and Groq. The interface is compatible with the OpenAI API specification and can be directly accessed using the OpenAI SDK. It provides a convenient way to interact with various AI platforms through a unified API endpoint, enabling seamless integration and usage in different applications.
redis-vl-python
The Python Redis Vector Library (RedisVL) is a tailor-made client for AI applications leveraging Redis. It enhances applications with Redis' speed, flexibility, and reliability, incorporating capabilities like vector-based semantic search, full-text search, and geo-spatial search. The library bridges the gap between the emerging AI-native developer ecosystem and the capabilities of Redis by providing a lightweight, elegant, and intuitive interface. It abstracts the features of Redis into a grammar that is more aligned to the needs of today's AI/ML Engineers or Data Scientists.
aiocsv
aiocsv is a Python module that provides asynchronous CSV reading and writing. It is designed to be a drop-in replacement for Python's builtin csv module, but with the added benefit of being able to read and write CSV files asynchronously. This makes it ideal for use in applications that need to process large CSV files efficiently.
json-repair
JSON Repair is a toolkit designed to address JSON anomalies that can arise from Large Language Models (LLMs). It offers a comprehensive solution for repairing JSON strings, ensuring accuracy and reliability in your data processing. With its user-friendly interface and extensive capabilities, JSON Repair empowers developers to seamlessly integrate JSON repair into their workflows.
For similar tasks
Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (containing a demo web application, Power BI reports, Synapse resources, AML Notebooks, etc.) that can be deployed in a customer's subscription using the CAPE tool in a matter of hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.
sorrentum
Sorrentum is an open-source project that aims to combine open-source development, startups, and brilliant students to build machine learning, AI, and Web3 / DeFi protocols geared towards finance and economics. The project provides opportunities for internships, research assistantships, and development grants, as well as the chance to work on cutting-edge problems, learn about startups, write academic papers, and get internships and full-time positions at companies working on Sorrentum applications.
tidb
TiDB is an open-source distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.
zep-python
Zep is an open-source platform for building and deploying large language model (LLM) applications. It provides a suite of tools and services that make it easy to integrate LLMs into your applications, including chat history memory, embedding, vector search, and data enrichment. Zep is designed to be scalable, reliable, and easy to use, making it a great choice for developers who want to build LLM-powered applications quickly and easily.
telemetry-airflow
This repository codifies the Airflow cluster that is deployed at workflow.telemetry.mozilla.org (behind SSO) and commonly referred to as "WTMO" or simply "Airflow". Some links relevant to users and developers of WTMO: * The `dags` directory in this repository contains some custom DAG definitions * Many of the DAGs registered with WTMO don't live in this repository, but are instead generated from ETL task definitions in bigquery-etl * The Data SRE team maintains a WTMO Developer Guide (behind SSO)
mojo
Mojo is a new programming language that bridges the gap between research and production by combining Python syntax and ecosystem with systems programming and metaprogramming features. Mojo is still young, but it is designed to become a superset of Python over time.
pandas-ai
PandasAI is a Python library that makes it easy to ask questions to your data in natural language. It helps you to explore, clean, and analyze your data using generative AI.
databend
Databend is an open-source cloud data warehouse that serves as a cost-effective alternative to Snowflake. With its focus on fast query execution and data ingestion, it's designed for complex analysis of the world's largest datasets.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.