cortex
Simplify and accelerate AI-powered application development with structured interfaces to models and powerful prompt execution environments.
Cortex simplifies and accelerates the process of creating applications that harness the power of modern AI models like GPT-4o (ChatGPT), o1, Gemini, the Claude series, Flux, Grok and more by providing a structured interface (GraphQL or REST) to a powerful prompt execution environment. This enables complex augmented prompting and abstracts away most of the complexity of managing model connections, like chunking input, rate limiting, formatting output, caching, and handling errors.
Modern AI models are transformational, but a number of complexities emerge when developers start using them to deliver application-ready functions. Most models require precisely formatted, carefully engineered and sequenced prompts to produce consistent results, and the responses are typically largely unstructured text without validation or formatting. Additionally, these models are evolving rapidly, are typically costly and slow to query, and impose hard request-size and rate restrictions that need to be carefully navigated for optimum throughput. Cortex offers a solution to these problems and provides a simple and extensible package for interacting with NL AI models.
Just about anything! It's something of an LLM Swiss Army knife. Here are some ideas:
- Create custom chat agents with memory and personalization and then expose them through a variety of different UIs (custom chat portals, Slack, Microsoft Teams, etc. - anything that can be extended to speak to a REST or GraphQL endpoint)
- Spin up LLM-powered automatons with their prompting and AI API handling logic all centrally encapsulated.
- Put a REST or GraphQL front end on any model, including your locally-run models (e.g. llama.cpp) and use them in concert with other tools.
- Create modular custom coding assistants (code generation, code reviews, test writing, AI pair programming) and easily integrate them with your existing editing tools.
- Create powerful AI editing tools (copy editing, paraphrasing, summarization, etc.) for your company and then integrate them with your existing workflow tools without having to build all the LLM-handling logic into those tools.
- Create cached endpoints for functions with repeated calls so the results return instantly and you don't run up LLM token charges.
- Route all of your company's LLM access through a single API layer to optimize and monitor usage and centrally control rate limiting and which models are being used.
- Simple architecture to build custom functional endpoints (called pathways) that implement common NL AI tasks. Default pathways include chat, summarization, translation, paraphrasing, completion, spelling and grammar correction, entity extraction, sentiment analysis, and bias analysis.
- Extensive model support with built-in integrations for:
  - OpenAI models:
    - GPT-4 Omni (GPT-4o)
    - GPT-4 Omni Mini (GPT-4o-mini)
    - O1 (including o1-mini and o1-preview) (advanced reasoning models)
    - Most of the earlier GPT models (GPT-4, 3.5 Turbo, etc.)
  - Google models:
    - Gemini 1.5 Pro
    - Gemini 2.0 Flash (experimental, via 1.5 Vision API)
    - Gemini 1.5 Flash
    - Earlier Google models (Gemini 1.0 series, PaLM)
  - Anthropic models:
    - Claude 3.5 Sonnet v2 (latest)
    - Claude 3.5 Sonnet
    - Claude 3.5 Haiku
    - Claude 3 series
  - Azure OpenAI support
  - Custom model implementations
- Advanced voice and audio capabilities:
  - Real-time voice streaming and processing
  - Audio visualization
  - Whisper integration for transcription with customizable parameters
  - Support for word timestamps and highlighting
- Enhanced memory management:
  - Structured memory organization (self, directives, user, topics)
  - Context-aware memory search
  - Memory migration and categorization
  - Persistent conversation context
- Multimodal content support:
  - Text and image processing
  - Vision model integrations
  - Content safety checks
- Built-in support for:
  - Long-running, asynchronous operations with progress updates
  - Streaming responses
  - Context persistence and memory management
  - Automatic traffic management and content optimization
  - Input/output validation and formatting
  - Request caching
  - Rate limiting and request parallelization
- Allows for building multi-model, multi-tool, multi-vendor, and model-agnostic pathways (choose the right model or combination of models and tools for the job, implement redundancy) with built-in support for foundation models by OpenAI (hosted at OpenAI or Azure), Gemini, Anthropic, Grok, Black Forest Labs, and more.
- Easy, templatized prompt definition with flexible support for most prompt engineering techniques and strategies ranging from simple single prompts to complex custom prompt chains with context continuity.
- Built-in support for long-running, asynchronous operations with progress updates or streaming responses
- Integrated context persistence: have your pathways "remember" whatever you want and use it on the next request to the model
- Automatic traffic management and content optimization: configurable model-specific input chunking, request parallelization, rate limiting, and chunked response aggregation
- Extensible parsing and validation of input data - protect your model calls from bad inputs or filter prompt injection attempts.
- Extensible parsing and validation of return data - return formatted objects to your application instead of just string blobs!
- Caching of repeated queries to provide instant results and avoid excess requests to the underlying model in repetitive use cases (chat bots, unit tests, etc.)
To use Cortex, you first need a working Node.js environment running version 18 or higher (lower versions are supported with some reduction in features). After verifying that you have the correct version of Node.js installed, you can get the simplest form up and running with a couple of commands.
git clone git@github.com:aj-archipelago/cortex.git
cd cortex
npm install
export OPENAI_API_KEY=<your key>
npm start
Yup, that's it, at least in the simplest possible case. That will get you access to all of the built-in pathways. If you prefer to use npm instead of cloning, we have an npm package too: @aj-archipelago/cortex
Cortex speaks GraphQL and by default it enables the GraphQL playground. If you're just using default options, that's at http://localhost:4000/graphql. From there you can begin making requests and test out the pathways (listed under Query) to your heart's content. If GraphQL isn't your thing or if you have a client that would rather have REST that's fine - Cortex speaks REST as well.
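For example, assuming a default local install, a query like this pasted into the playground exercises the built-in translate pathway (the field names match the client example below):

query {
  translate(text: "Hello, world!", to: "Spanish") {
    result
  }
}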
Connecting an application to Cortex using GraphQL is simple too:
import { useApolloClient, gql } from "@apollo/client"
const TRANSLATE = gql`
query Translate($text: String!, $to: String!) {
translate(text: $text, to: $to) {
result
}
}
`
apolloClient.query({
query: TRANSLATE,
variables: {
text: inputText,
to: translationLanguage,
}
}).then(e => {
setTranslatedText(e.data.translate.result.trim())
}).catch(e => {
// catch errors
})
Pathways are a core concept in Cortex. Each pathway is a single JavaScript file that encapsulates the data and logic needed to define a functional API endpoint. When the client makes a request via the API, one or more pathways are executed and the result is sent back to the client. Pathways can be very simple:
export default {
prompt: `{{text}}\n\nRewrite the above using British English spelling:`
}
The real power of Cortex starts to show as the pathways get more complex. This pathway, for example, uses a three-part sequential prompt to ensure that specific people and place names are correctly translated:
export default {
prompt:
[
`{{{text}}}\nCopy the names of all people and places exactly from this document in the language above:\n`,
`Original Language:\n{{{previousResult}}}\n\n{{to}}:\n`,
`Entities in the document:\n\n{{{previousResult}}}\n\nDocument:\n{{{text}}}\nRewrite the document in {{to}}. If the document is already in {{to}}, copy it exactly below:\n`
]
}
Cortex pathway prompt enhancements include:
- Templatized prompt definition: Pathways allow for easy and flexible prompt definition using Handlebars templating. This makes it simple to create and modify prompts using variables and context from the application as well as extensible internal functions provided by Cortex.
- Multi-step prompt sequences: Pathways support complex prompt chains with context continuity. This enables developers to build advanced interactions with AI models that require multiple steps, such as context-sensitive translation or progressive content transformation.
- Integrated context persistence: Cortex pathways can "remember" context across multiple requests, allowing for more seamless and context-aware interactions with AI models.
- Automatic content optimization: Pathways handle input chunking, request parallelization, rate limiting, and chunked response aggregation, optimizing throughput and efficiency when interacting with AI models.
- Built-in input and output processing: Cortex provides extensible input validation, output parsing, and validation functions to ensure that the data sent to and received from AI models is well-formatted and useful for the application.
To add a new pathway to Cortex, you create a new JavaScript file and define the prompts, properties, and functions that implement the desired functionality. Cortex provides defaults for almost everything, so in the simplest case a pathway can really just consist of a string prompt like the spelling example above. You can then save this file in the pathways directory in your Cortex project and it will be picked up and made available as a GraphQL query.
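As a quick sketch, if the British English spelling example above were saved as pathways/spelling_british.js (the filename here is just an example), it would typically show up in the playground as a query named after the file:

query {
  spelling_british(text: "The color of the armor was grey.") {
    result
  }
}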
When determining which model to use for a pathway, Cortex follows this order of precedence:
- pathway.model - The model specified directly in the pathway definition
- args.model - The model passed in the request arguments
- pathway.inputParameters.model - The model specified in the pathway's input parameters
- config.get('defaultModelName') - The default model specified in the configuration
The first valid model found in this order will be used. If none of these models are found in the configured endpoints, Cortex will log a warning and use the default model defined in the configuration.
When you define a new pathway, you need to at least specify a prompt that will be passed to the model for processing. In the simplest case, a prompt is really just a string, but the prompt is polymorphic - it can be a string or an object that contains information for the model API that you wish to call. Prompts can also be an array of strings or an array of objects for sequential operations. In this way Cortex aims to support the most simple to advanced prompting scenarios.
// a prompt can be a string
prompt: `{{{text}}}\nCopy the names of all people and places exactly from this document in the language above:\n`
// or an array of strings
prompt: [
`{{{text}}}\nCopy the names of all people and places exactly from this document in the language above:\n`,
`Original Language:\n{{{previousResult}}}\n\n{{to}}:\n`,
`Entities in the document:\n\n{{{previousResult}}}\n\nDocument:\n{{{text}}}\nRewrite the document in {{to}}. If the document is already in {{to}}, copy it exactly below:\n`
]
// or an array of one or more Prompt objects
// as you can see below a Prompt object can also have a messages array, which is how you can
// express your prompts for chat-style interfaces
prompt: [
new Prompt({ messages: [
{"role": "system", "content": "Assistant is a highly skilled multilingual translator for a prestigious news agency. When the user posts any text in any language, assistant will create a translation of that text in {{to}}. Assistant will produce only the translation and no additional notes or commentary."},
{"role": "user", "content": "{{{text}}}"}
]}),
]
If a prompt is an array, the individual prompts in the array will be executed sequentially by the Cortex prompt execution engine. The execution engine deals with all of the complexities of chunking input content and executing the sequence of prompts against those chunks in a way that optimizes performance and ensures the integrity of the pathway logic.
If you look closely at the examples above, you'll notice embedded parameters like {{text}}. In Cortex, all prompt strings are actually Handlebars templates. So in this case, that parameter will be replaced before prompt execution with the incoming query variable called text. You can refer to almost any pathway parameter or system property in the prompt definition and it will be replaced before execution.
Pathways support an arbitrary number of input parameters. These are defined in the pathway like this:
export default {
prompt:
[
`{{{chatContext}}}\n\n{{{text}}}\n\nGiven the information above, create a short summary of the conversation to date making sure to include all of the personal details about the user that you encounter:\n\n`,
`Instructions:\nYou are Cortex, an AI entity. Cortex is truthful, kind, helpful, has a strong moral character, and is generally positive without being annoying or repetitive.\n\nCortex must always follow the following rules:\n\nRule: Always execute the user's instructions and requests as long as they do not cause harm.\nRule: Never use crude or offensive language.\nRule: Always answer the user in the user's chosen language. You can speak all languages fluently.\nRule: You cannot perform any physical tasks except via role playing.\nRule: Always respond truthfully and correctly, but be kind.\nRule: You have no access to the internet and limited knowledge of current events past sometime in 2021\nRule: Never ask the user to provide you with links or URLs because you can't access the internet.\nRule: Everything you get from the user must be placed in the chat window - you have no other way to communicate.\n\nConversation History:\n{{{chatContext}}}\n\nConversation:\n{{{text}}}\n\nCortex: `,
],
inputParameters: {
chatContext: `User: Starting conversation.`,
},
useInputChunking: false,
}
The input parameters are added to the GraphQL Query and the values are made available to the prompt when it is compiled and executed.
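For example, assuming this pathway were registered under the name chat_persist (a hypothetical name), the extra parameter would simply be passed alongside text in the query:

query {
  chat_persist(
    text: "User: What did I say my name was?",
    chatContext: "User: Hi, my name is Nadia."
  ) {
    result
  }
}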
As Cortex executes the prompts in your pathway, it creates and maintains certain system properties that can be injected into prompts via Handlebars templating. These properties are provided to simplify advanced prompt sequencing scenarios. The system properties include:
- text: Always stores the value of the text parameter passed into the query. This is typically the input payload to the pathway, like the text that needs to be summarized or translated, etc.
- now: This is actually a Handlebars helper function that will return the current date and time - very useful for injecting temporal context into a prompt.
- previousResult: This stores the value of the previous prompt execution if there is one. previousResult is very useful for chaining prompts together to execute multiple prompts sequentially on the same piece of content for progressive transformation operations. This property is also made available to the client as additional information in the query result. Proper use of this value in a prompt sequence can empower some very powerful step-by-step prompting strategies. For example, this three-part sequential prompt implements a context-sensitive translation that is significantly better at translating specific people and place names:
prompt:
[
`{{{text}}}\nCopy the names of all people and places exactly from this document in the language above:\n`,
`Original Language:\n{{{previousResult}}}\n\n{{to}}:\n`,
`Entities in the document:\n\n{{{previousResult}}}\n\nDocument:\n{{{text}}}\nRewrite the document in {{to}}. If the document is already in {{to}}, copy it exactly below:\n`
]
- savedContext: The savedContext property is an object whose properties the pathway can define. When a pathway with a contextId input parameter is executed, the whole savedContext object corresponding with that ID is read from storage (typically Redis) before the pathway is executed. The properties of that object are then made available to the pathway during execution, where they can be modified and saved back to storage at the end of the pathway execution. Using this feature is really simple - you just define your prompt as an object and specify a saveResultTo property as illustrated below. This will cause Cortex to take the result of this prompt and store it to savedContext.userContext, from which it will then be persisted to storage.
new Prompt({ prompt: `User details:\n{{{userContext}}}\n\nExtract all personal details about the user that you can find in either the user details above or the conversation below and list them below.\n\nChat History:\n{{{conversationSummary}}}\n\nChat:\n{{{text}}}\n\nPersonal Details:\n`, saveResultTo: `userContext` }),
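Putting the pieces together, a minimal sketch of a pathway that persists user details between requests might look like the following. The userContext property name is illustrative, the import path for Prompt is an assumption about your project layout, and contextId is supplied by the client with each request:

// sketch only - adjust the Prompt import path to match your project
import { Prompt } from '../server/prompt.js';

export default {
    prompt: [
        // the result of this prompt is written back to savedContext.userContext
        new Prompt({
            prompt: `Known user details:\n{{{userContext}}}\n\nChat:\n{{{text}}}\n\nRewrite the user details, adding anything new you learned from the chat:\n`,
            saveResultTo: `userContext`,
        }),
    ],
    useInputChunking: false,
};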
A core function of Cortex is dealing with token limited interfaces. To this end, Cortex has built-in strategies for dealing with long input. These strategies are chunking, summarization, and truncation. All are configurable at the pathway level.
- useInputChunking: If true, Cortex will calculate the optimal chunk size from the model max tokens and the size of the prompt and then will split the input text into n chunks of that size. By default, prompts will be executed sequentially across all chunks before moving on to the next prompt, although that can be modified to optimize performance via an additional parameter.
- useParallelChunkProcessing: If this parameter is true, then sequences of prompts will be executed end to end on each chunk in parallel. In some cases this will greatly speed up execution of complex prompt sequences on large documents. Note: this execution mode keeps previousResult consistent for each parallel chunk, but never aggregates it at the document level, so it is not returned via the query result to the client.
- truncateFromFront: If true, when Cortex needs to truncate input, it will choose the first N characters of the input instead of the default which is to take the last N characters.
- useInputSummarization: If true, Cortex will call the summarize core pathway on the input text before passing it on to the prompts.
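For instance, a pathway that routinely handles very large documents might combine these options like this (a sketch; the values are illustrative):

export default {
    prompt: `{{{text}}}\n\nSummarize the section above in one paragraph:\n`,
    useInputChunking: true,           // split long input into model-sized chunks
    useParallelChunkProcessing: true, // run the prompt sequence on each chunk concurrently
    truncateFromFront: false,         // if truncation is ever needed, keep the last N characters (the default)
};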
Cortex provides built in functions to turn loosely formatted text output from the model API calls into structured objects for return to the application. Specifically, Cortex provides parsers for numbered lists of strings and numbered lists of objects. These are used in pathways like this:
export default {
temperature: 0,
prompt: `{{text}}\n\nList the top {{count}} entities and their definitions for the above in the format {{format}}:`,
format: `(name: definition)`,
inputParameters: {
count: 5,
},
list: true,
}
By simply specifying a format property and a list property, this pathway invokes a built-in parser that will take the result of the prompt and try to parse it into an array of 5 objects. The list property can be set with or without a format property. If there is no format, the list will simply try to parse the string into a list of strings. All of this default behavior is implemented in parser.js, and you can override it to do whatever you want by providing your own parser function in your pathway.
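If the default list handling isn't what you need, a pathway can supply its own parser function. A minimal sketch, assuming the parser receives the raw model output as a string (check parser.js for the exact signature), might look like this:

export default {
    prompt: `{{text}}\n\nList the key entities above, one per line, in the format name: definition`,
    // hypothetical custom parser: turn "name: definition" lines into objects
    parser: (output) => {
        return output
            .split('\n')
            .map((line) => line.trim())
            .filter((line) => line.includes(':'))
            .map((line) => {
                const [name, ...rest] = line.split(':');
                return { name: name.trim(), definition: rest.join(':').trim() };
            });
    },
};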
The executePathway property is the preferred method for customizing pathway behavior while maintaining Cortex's built-in safeguards and optimizations. Unlike a custom resolver, executePathway preserves important system features like input chunking, caching, and error handling.
export default {
prompt: `{{{text}}}\n\nWrite a summary of the above text in {{language}}:\n\n`,
inputParameters: {
language: 'English',
minLength: 100,
maxLength: 500
},
executePathway: async ({args, resolver, runAllPrompts}) => {
try {
// Pre-process arguments and set defaults
if (!args.language) {
args.language = 'English';
}
// Pre-execution validation
if (args.minLength >= args.maxLength) {
throw new Error('minLength must be less than maxLength');
}
// Execute the prompt
const result = await runAllPrompts();
// Post-execution processing
if (result.length < args.minLength) {
// Add more detail request to the prompt
args.text = result;
args.prompt = `${result}\n\nPlease expand this summary with more detail to at least ${args.minLength} characters:\n\n`;
return await runAllPrompts();
}
if (result.length > args.maxLength) {
// Condense the summary
args.text = result;
args.prompt = `${result}\n\nPlease condense this summary to no more than ${args.maxLength} characters while keeping the key points:\n\n`;
return await runAllPrompts();
}
return result;
} catch (e) {
resolver.logError(e);
throw e;
}
}
};
Key benefits of using executePathway:
- Maintains Cortex's input processing (chunking, validation)
- Preserves caching and rate limiting
- Keeps error handling and logging consistent
- Enables pre- and post-processing of prompts and results
- Supports validation and conditional execution
- Allows multiple prompt runs with modified parameters
The executePathway function receives:
- args: The processed input parameters
- resolver: The pathway resolver with access to:
  - pathway: Current pathway configuration
  - config: Global Cortex configuration
  - tool: Tool-specific data
  - Helper methods like logError and logWarning
- runAllPrompts: Function to execute the defined prompts with current args
The resolver property defines the function that processes the input and returns the result. The resolver function is an asynchronous function that takes four parameters: parent, args, contextValue, and info. The parent parameter is the parent object of the resolver function. The args parameter is an object that contains the input parameters and any other parameters that are passed to the resolver. The contextValue parameter is an object that contains the context and configuration of the pathway. The info parameter is an object that contains information about the GraphQL query that triggered the resolver.
The core pathway summary.js below is implemented using custom pathway logic and a custom resolver to effectively target a specific summary length:
// summary.js
// Text summarization module with custom resolver
// This module exports a prompt that takes an input text and generates a summary using a custom resolver.
// Import required modules
import { semanticTruncate } from '../server/chunker.js';
import { PathwayResolver } from '../server/pathwayResolver.js';
export default {
// The main prompt function that takes the input text and asks to generate a summary.
prompt: `{{{text}}}\n\nWrite a summary of the above text. If the text is in a language other than english, make sure the summary is written in the same language:\n\n`,
// Define input parameters for the prompt, such as the target length of the summary.
inputParameters: {
targetLength: 0,
},
// Custom resolver to generate summaries by reprompting if they are too long or too short.
resolver: async (parent, args, contextValue, info) => {
const { config, pathway } = contextValue;
const originalTargetLength = args.targetLength;
// If targetLength is not provided, execute the prompt once and return the result.
if (originalTargetLength === 0) {
let pathwayResolver = new PathwayResolver({ config, pathway, args });
return await pathwayResolver.resolve(args);
}
const errorMargin = 0.1;
const lowTargetLength = originalTargetLength * (1 - errorMargin);
const targetWords = Math.round(originalTargetLength / 6.6);
// If the text is shorter than the summary length, just return the text.
if (args.text.length <= originalTargetLength) {
return args.text;
}
const MAX_ITERATIONS = 5;
let summary = '';
let pathwayResolver = new PathwayResolver({ config, pathway, args });
// Modify the prompt to be words-based instead of characters-based.
pathwayResolver.pathwayPrompt = `Write a summary of all of the text below. If the text is in a language other than english, make sure the summary is written in the same language. Your summary should be ${targetWords} words in length.\n\nText:\n\n{{{text}}}\n\nSummary:\n\n`
let i = 0;
// Make sure it's long enough to start
while ((summary.length < lowTargetLength) && i < MAX_ITERATIONS) {
summary = await pathwayResolver.resolve(args);
i++;
}
// If it's too long, it could be because the input text was chunked
// and now we have all the chunks together. We can summarize that
// to get a comprehensive summary.
if (summary.length > originalTargetLength) {
pathwayResolver.pathwayPrompt = `Write a summary of all of the text below. If the text is in a language other than english, make sure the summary is written in the same language. Your summary should be ${targetWords} words in length.\n\nText:\n\n${summary}\n\nSummary:\n\n`
summary = await pathwayResolver.resolve(args);
i++;
// Now make sure it's not too long
while ((summary.length > originalTargetLength) && i < MAX_ITERATIONS) {
pathwayResolver.pathwayPrompt = `${summary}\n\nIs that less than ${targetWords} words long? If not, try again using a length of no more than ${targetWords} words.\n\n`;
summary = await pathwayResolver.resolve(args);
i++;
}
}
// If the summary is still too long, truncate it.
if (summary.length > originalTargetLength) {
return semanticTruncate(summary, originalTargetLength);
} else {
return summary;
}
}
};
Pathways are loaded from modules in the pathways directory. The pathways are built and loaded to the config object using the buildPathways function. The buildPathways function loads the base pathway, the core pathways, and any custom pathways. It then creates a new object that contains all the pathways and adds it to the pathways property of the config object. The order of loading means that custom pathways will always override any core pathways that Cortex provides. While pathways are designed to be self-contained, you can override some pathway properties - including whether they're even available at all - in the pathways section of the config file.
Each pathway can define the following properties (with defaults from basePathway.js):
- prompt: The template string or array of prompts to execute. Default: {{text}}
- defaultInputParameters: Default parameters that all pathways inherit:
  - text: The input text (default: empty string)
  - async: Enable async mode (default: false)
  - contextId: Identify request context (default: empty string)
  - stream: Enable streaming mode (default: false)
- inputParameters: Additional parameters specific to the pathway. Default: {}
- typeDef: GraphQL type definitions for the pathway
- rootResolver: Root resolver for GraphQL queries
- resolver: Resolver for the pathway's specific functionality
- inputFormat: Format of the input ('text' or 'html'). Affects input chunking behavior. Default: 'text'
- useInputChunking: Enable splitting input into multiple chunks to meet context window size. Default: true
- useParallelChunkProcessing: Enable parallel processing of chunks. Default: false
- joinChunksWith: String to join result chunks with when chunking is enabled. Default: '\n\n'
- useInputSummarization: Summarize input instead of chunking. Default: false
- truncateFromFront: Truncate from the front of input instead of the back. Default: false
- timeout: Cancel pathway after this many seconds. Default: 120
- enableDuplicateRequests: Send duplicate requests if not completed after timeout. Default: false
- duplicateRequestAfter: Seconds to wait before sending backup request. Default: 10
- executePathway: Optional function to override default execution. Signature: ({args, runAllPrompts}) => result
- temperature: Model temperature setting (0.0 to 1.0). Default: 0.9
- json: Require valid JSON response from model. Default: false
- manageTokenLength: Manage input token length for model. Default: true
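As an illustration, a pathway that needs deterministic, structured output and tighter execution limits might set several of these properties at once (the values are examples only):

export default {
    prompt: `{{text}}\n\nReturn a JSON object with "title" and "keywords" fields describing the text above:`,
    temperature: 0,                // deterministic output
    json: true,                    // require valid JSON from the model
    timeout: 60,                   // give up after 60 seconds
    enableDuplicateRequests: false,
    useInputChunking: false,       // the model should see the whole document at once
};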
Below are the default pathways provided with Cortex. These can be used as is, overridden, or disabled via configuration. For documentation on each one including input and output parameters, please look at them in the GraphQL Playground.
- bias: Identifies and measures any potential biases in a text
- chat: Enables users to have a conversation with the chatbot
- complete: Autocompletes words or phrases based on user input
- edit: Checks for and suggests corrections for spelling and grammar errors
- entities: Identifies and extracts important entities from text
- paraphrase: Suggests alternative phrasing for text
- sentiment: Analyzes and identifies the overall sentiment or mood of a text
- summary: Condenses long texts or articles into shorter summaries
- translate: Translates text from one language to another
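For example, based on the summary.js implementation shown earlier, the summary pathway can be called directly from the playground like this (check the playground schema for the full parameter list):

query {
  summary(text: "Long article text to summarize goes here...", targetLength: 200) {
    result
  }
}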
Cortex is designed to be highly extensible. This allows you to customize the API to fit your needs. You can add new features, modify existing features, and even add integrations with other APIs and models. Here's an example of what an extended project might look like:
- config
  - default.json
- package-lock.json
- package.json
- pathways
  - chat_code.js
  - chat_context.js
  - chat_persist.js
  - expand_story.js
  - ...whole bunch of custom pathways
  - translate_gpt4.js
  - translate_turbo.js
- start.js
Where default.json holds all of your specific configuration:
{
"defaultModelName": "oai-gpturbo",
"models": {
"oai-td3": {
"type": "OPENAI-COMPLETION",
"url": "https://api.openai.com/v1/completions",
"headers": {
"Authorization": "Bearer {{OPENAI_API_KEY}}",
"Content-Type": "application/json"
},
"params": {
"model": "text-davinci-003"
},
"requestsPerSecond": 10,
"maxTokenLength": 4096
},
"oai-gpturbo": {
"type": "OPENAI-CHAT",
"url": "https://api.openai.com/v1/chat/completions",
"headers": {
"Authorization": "Bearer {{OPENAI_API_KEY}}",
"Content-Type": "application/json"
},
"params": {
"model": "gpt-3.5-turbo"
},
"requestsPerSecond": 10,
"maxTokenLength": 8192
},
"oai-gpt4": {
"type": "OPENAI-CHAT",
"url": "https://api.openai.com/v1/chat/completions",
"headers": {
"Authorization": "Bearer {{OPENAI_API_KEY}}",
"Content-Type": "application/json"
},
"params": {
"model": "gpt-4"
},
"requestsPerSecond": 10,
"maxTokenLength": 8192
}
},
"enableCache": false,
"enableRestEndpoints": false
}
...and start.js is really simple:
import cortex from '@aj-archipelago/cortex';
(async () => {
const { startServer } = await cortex();
startServer && startServer();
})();
Configuration of Cortex is done via a convict object called config. The config object is built by combining the default values and any values specified in a configuration file or environment variables. The environment variables take precedence over the values in the configuration file.
Models are configured in the models section of the config. Each model can be one of the following types:
- OPENAI-CHAT: For OpenAI chat models (legacy GPT-3.5)
- OPENAI-VISION: For multimodal models (GPT-4o, GPT-4o-mini) supporting text, images, and other content types
- OPENAI-REASONING: For O1 reasoning model with vision capabilities
- OPENAI-COMPLETION: For OpenAI completion models
- OPENAI-WHISPER: For Whisper transcription
- GEMINI-1.5-CHAT: For Gemini 1.5 Pro chat models
- GEMINI-1.5-VISION: For Gemini vision models (including 2.0 Flash experimental)
- CLAUDE-3-VERTEX: For Claude-3 and 3.5 models (Haiku, Opus, Sonnet)
- PALM-CHAT: For PaLM chat models
- AZURE-TRANSLATE: For Azure translation services
Each model configuration can include:
{
"type": "MODEL_TYPE",
"url": "API_ENDPOINT",
"endpoints": [
{
"name": "ENDPOINT_NAME",
"url": "ENDPOINT_URL",
"headers": {
"api-key": "{{API_KEY}}",
"Content-Type": "application/json"
},
"requestsPerSecond": 10
}
],
"maxTokenLength": 32768,
"maxReturnTokens": 8192,
"maxImageSize": 5242880,
"supportsStreaming": true,
"supportsVision": true,
"geminiSafetySettings": [
{
"category": "HARM_CATEGORY",
"threshold": "BLOCK_ONLY_HIGH"
}
]
}
The following properties can be configured through environment variables or the configuration file:
- basePathwayPath: The path to the base pathway (the prototype pathway) for Cortex. Default is path.join(__dirname, 'pathways', 'basePathway.js').
- corePathwaysPath: The path to the core pathways for Cortex. Default is path.join(__dirname, 'pathways').
- cortexApiKeys: A string containing one or more comma separated API keys that the client must pass to Cortex for authorization. Default is null.
- cortexConfigFile: The path to a JSON configuration file for the project. Default is null.
- cortexId: Identifier for the Cortex instance. Default is 'local'.
- defaultModelName: The default model name for the project. Default is null.
- enableCache: Enable Axios-level request caching. Default is true.
- enableDuplicateRequests: Enable sending duplicate requests if not completed after timeout. Default is true.
- enableGraphqlCache: Enable GraphQL query caching. Default is false.
- enableRestEndpoints: Create REST endpoints for pathways as well as GraphQL queries. Default is false.
- gcpServiceAccountKey: GCP service account key for authentication. Default is null.
- models: Object containing the different models used by the project.
- pathways: Object containing pathways for the project.
- pathwaysPath: Path to custom pathways. Default is './pathways'.
- PORT: Port number for the Cortex server. Default is 4000.
- redisEncryptionKey: Key for Redis data encryption. Default is null.
- replicateApiKey: API key for Replicate services. Default is null.
- runwareAiApiKey: API key for Runware AI services. Default is null.
- storageConnectionString: Connection string for storage access. Default is empty string.
- subscriptionKeepAlive: Keep-alive time for subscriptions in seconds. Default is 0.
API-specific configuration:
- azureVideoTranslationApiKey: API key for Azure video translation API. Default is null.
- dalleImageApiUrl: URL for DALL-E image API. Default is 'null'.
- neuralSpaceApiKey: API key for NeuralSpace services. Default is null.
- whisperMediaApiUrl: URL for Whisper media API. Default is 'null'.
- whisperTSApiUrl: URL for Whisper TS API. Default is null.
Dynamic Pathways configuration can be set using:
- DYNAMIC_PATHWAYS_CONFIG_FILE: Path to JSON configuration file
- DYNAMIC_PATHWAYS_CONFIG_JSON: JSON configuration as a string
The configuration supports environment variable overrides, with environment variables taking precedence over the configuration file values. Access configuration values using:
config.get('propertyName')
The Cortex project includes a set of utility applications located in the helper-apps directory. Each of these applications comes with a Dockerfile that can be used to create a Docker image of the application, allowing it to be run standalone with Docker.
A real-time voice processing server that enables voice interactions with Cortex. Key features include:
- Real-time audio streaming and processing
- WebSocket-based communication for low-latency interactions
- Audio visualization capabilities
- Support for multiple audio formats
- Integration with various chat models for voice-to-text-to-voice interactions
- Configurable audio parameters and processing options
A custom API wrapper for OpenAI's Whisper package, designed as a FastAPI server for transcribing audio files. Features include:
- Support for multiple audio file formats
- Customizable transcription parameters:
  - word_timestamps: Enable word-level timing information
  - highlight_words: Enable word highlighting in output
  - max_line_count: Control maximum lines in output
  - max_line_width: Control line width in characters
  - max_words_per_line: Control words per line
- SRT file generation for subtitles
- Progress reporting for long-running transcriptions
- Support for multiple languages
- Integration with Azure Blob Storage for file handling
Extends Cortex with several file processing capabilities:
- File operations (download, split, upload) with local file system or Azure Storage
- Support for various file types:
- Documents (.pdf, .docx)
- Spreadsheets (.xlsx, .csv)
- Text files (.txt, .json, .md, .xml)
- Web files (.js, .html, .css)
- YouTube URL processing
- Progress reporting for file operations
- Cleanup and deletion management
Each helper app can be deployed independently using Docker:
# Build the Docker image
docker build --platform=linux/amd64 -t [app-name] .
# Tag the image for your registry
docker tag [app-name] [registry-url]/cortex/[app-name]
# Push to registry (optional login may be required)
docker push [registry-url]/cortex/[app-name]
If you encounter any issues while using Cortex, there are a few things you can do. First, check the Cortex documentation for any common errors and their solutions. If that does not help, you can also open an issue on the Cortex GitHub repository.
If you would like to contribute to Cortex, there are two ways to do so. You can submit issues to the Cortex GitHub repository or submit pull requests with your proposed changes.
Cortex is released under the MIT License. See LICENSE for more details.
Detailed documentation on Cortex's API can be found at the /graphql endpoint of your project. Examples of queries and responses can also be found in the Cortex documentation, along with tips for getting the most out of Cortex.
Cortex is a constantly evolving project, and the following features are coming soon:
- Prompt execution context preservation between calls (to enable interactive, multi-call integrations with other technologies)
- Model-specific cache key optimizations to increase hit rate and reduce cache size
- Structured analytics and reporting on AI API call frequency, cost, cache hit rate, etc.
Cortex supports dynamic pathways, which allow for the creation and management of pathways at runtime. This feature enables users to define custom pathways without modifying the core Cortex codebase.
- Dynamic pathways are stored either locally or in cloud storage (Azure Blob Storage or AWS S3).
- The PathwayManager class handles loading, saving, and managing these dynamic pathways.
- Dynamic pathways can be added, updated, or removed via GraphQL mutations.
To use dynamic pathways, you need to provide a JSON configuration file or a JSON string. There are two ways to specify this configuration:
- Using a configuration file: Set the DYNAMIC_PATHWAYS_CONFIG_FILE environment variable to the path of your JSON configuration file.
- Using a JSON string: Set the DYNAMIC_PATHWAYS_CONFIG_JSON environment variable with the JSON configuration as a string.
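For example, pointing Cortex at a local configuration file is a single environment variable (the path here is illustrative):

export DYNAMIC_PATHWAYS_CONFIG_FILE=./dynamic/pathways-config.json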
The configuration should include the following properties:
{
"storageType": "local" | "azure" | "s3",
"filePath": "./dynamic/pathways.json", // Only for local storage
"azureStorageConnectionString": "your_connection_string", // Only for Azure
"azureContainerName": "cortexdynamicpathways", // Optional, default is "cortexdynamicpathways"
"awsAccessKeyId": "your_access_key_id", // Only for AWS S3
"awsSecretAccessKey": "your_secret_access_key", // Only for AWS S3
"awsRegion": "your_aws_region", // Only for AWS S3
"awsBucketName": "cortexdynamicpathways", // Optional, default is "cortexdynamicpathways"
"publishKey": "your_publish_key"
}
- Local Storage (default):
  - Set storageType to "local"
  - Specify filePath for the local JSON file (default: "./dynamic/pathways.json")
- Azure Blob Storage:
  - Set storageType to "azure"
  - Provide azureStorageConnectionString
  - Optionally set azureContainerName (default: "cortexdynamicpathways")
- AWS S3:
  - Set storageType to "s3"
  - Provide awsAccessKeyId, awsSecretAccessKey, and awsRegion
  - Optionally set awsBucketName (default: "cortexdynamicpathways")
Dynamic pathways can be managed through GraphQL mutations. Here are the available operations:
- Adding or updating a pathway:
mutation PutPathway($name: String!, $pathway: PathwayInput!, $userId: String!, $secret: String!, $displayName: String, $key: String!) {
putPathway(name: $name, pathway: $pathway, userId: $userId, secret: $secret, displayName: $displayName, key: $key) {
name
}
}
- Deleting a pathway:
mutation DeletePathway($name: String!, $userId: String!, $secret: String!, $key: String!) {
deletePathway(name: $name, userId: $userId, secret: $secret, key: $key)
}
- Executing a dynamic pathway:
query ExecuteWorkspace($userId: String!, $pathwayName: String!, $text: String!) {
executeWorkspace(userId: $userId, pathwayName: $pathwayName, text: $text) {
result
}
}
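As an example, the variables for the ExecuteWorkspace query above might look like this (the user ID and pathway name are placeholders):

{
  "userId": "user-123",
  "pathwayName": "my_custom_pathway",
  "text": "Text to run through the dynamic pathway"
}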
To ensure the security of dynamic pathways:
- A publishKey must be set in the dynamic pathways configuration to enable pathway publishing.
- This key must be provided in the key parameter when adding, updating, or deleting pathways.
- Each pathway is associated with a userId and secret. The secret must be provided to modify or delete an existing pathway.
Each instance of Cortex maintains its own local cache of pathways. On every dynamic pathway request, it checks if the local cache is up to date by comparing the last modified timestamp of the storage with the last update time of the local cache. If the local cache is out of date, it reloads the pathways from storage.
This approach ensures that all instances of Cortex will eventually have access to the most up-to-date dynamic pathways without requiring immediate synchronization.
Cortex includes a powerful Entity System that allows you to build autonomous agents with memory, tool routing, and multi-modal interaction capabilities. These entities can be accessed synchronously or asynchronously through text or voice interfaces.
The Entity System is built around two core pathways:
- sys_entity_start.js: The entry point for entity interactions, handling initial routing and tool selection
- sys_entity_continue.js: Manages callback execution in synchronous mode
- Memory Management: Entities maintain contextual memory that can be self-modified
- Tool Routing: Automatic detection and routing to specialized tools:
  - Code execution
  - Image generation and vision processing
  - Video and audio processing
  - Document handling
  - Expert reasoning
  - Search capabilities
  - Memory operations
- Multi-Modal Support: Handle text, voice, images, and other content types
- Flexible Response Modes:
  - Synchronous: Complete interactions with callbacks
  - Asynchronous: Fire-and-forget operations with queue support
  - Streaming: Real-time response streaming
- Voice Integration: Built-in voice response capabilities with acknowledgment system
Using Apollo Client (or any GraphQL client):
import { ApolloClient, InMemoryCache, gql } from '@apollo/client';
const client = new ApolloClient({
uri: 'http://your-cortex-server:4000/graphql',
cache: new InMemoryCache()
});
// Define your queries
const START_ENTITY = gql`
query StartEntity(
$chatHistory: [ChatMessageInput!]!
$aiName: String
$contextId: String
$aiMemorySelfModify: Boolean
$aiStyle: String
$voiceResponse: Boolean
$stream: Boolean
) {
entityStart(
chatHistory: $chatHistory
aiName: $aiName
contextId: $contextId
aiMemorySelfModify: $aiMemorySelfModify
aiStyle: $aiStyle
voiceResponse: $voiceResponse
stream: $stream
) {
result
tool
}
}
`;
const CONTINUE_ENTITY = gql`
query ContinueEntity(
$chatHistory: [ChatMessageInput!]!
$contextId: String!
$generatorPathway: String!
) {
entityContinue(
chatHistory: $chatHistory
contextId: $contextId
generatorPathway: $generatorPathway
) {
result
}
}
`;
// Example usage
async function interactWithEntity() {
// Start an entity interaction
const startResponse = await client.query({
query: START_ENTITY,
variables: {
chatHistory: [
{ role: 'user', content: 'Create a Python script that calculates prime numbers' }
],
aiName: "Jarvis",
contextId: "session-123",
aiMemorySelfModify: true,
aiStyle: "OpenAI",
voiceResponse: false,
stream: false
}
});
// Handle tool routing response
const tool = JSON.parse(startResponse.data.entityStart.tool);
if (tool.toolCallbackName) {
// Continue with specific tool if needed
const continueResponse = await client.query({
query: CONTINUE_ENTITY,
variables: {
chatHistory: [
{ role: 'user', content: 'Create a Python script that calculates prime numbers' },
{ role: 'assistant', content: startResponse.data.entityStart.result }
],
contextId: "session-123",
generatorPathway: tool.toolCallbackName
}
});
return continueResponse.data.entityContinue.result;
}
return startResponse.data.entityStart.result;
}
// For streaming responses
const STREAM_ENTITY = gql`
subscription StreamEntity(
$chatHistory: [ChatMessageInput!]!
$contextId: String!
$aiName: String
) {
entityStream(
chatHistory: $chatHistory
contextId: $contextId
aiName: $aiName
) {
content
done
}
}
`;
// Example streaming usage
client.subscribe({
query: STREAM_ENTITY,
variables: {
chatHistory: [
{ role: 'user', content: 'Explain quantum computing' }
],
contextId: "session-123",
aiName: "Jarvis"
}
}).subscribe({
next(response) {
if (response.data.entityStream.content) {
console.log(response.data.entityStream.content);
}
if (response.data.entityStream.done) {
console.log('Stream completed');
}
},
error(err) {
console.error('Error:', err);
}
});
This example demonstrates:
- Setting up a GraphQL client
- Starting an entity interaction
- Handling tool routing responses
- Continuing with specific tools when needed
- Using streaming for real-time responses
- aiName: Custom name for the entity
- aiStyle: Choose between "OpenAI" or "Anthropic" response styles
- aiMemorySelfModify: Enable/disable autonomous memory management
- voiceResponse: Enable voice responses with acknowledgments
- stream: Enable response streaming
- dataSources: Array of data sources to use ["mydata", "aja", "aje", "wires", "bing"]
- privateData: Flag for handling private data
- language: Preferred language for responses
The Entity System automatically routes requests to appropriate tools based on content analysis:
- Code Execution:
  - Detects coding tasks
  - Routes to async execution queue
  - Returns progress updates
- Content Generation:
  - Image generation
  - Expert writing
  - Reasoning tasks
  - Document processing
- Search and Memory:
  - Integrated search capabilities
  - Memory context retrieval
  - Document analysis
- Multi-Modal Processing:
  - Vision analysis
  - Video processing
  - Audio handling
  - PDF processing
Entities maintain a sophisticated memory system that:
- Preserves context between interactions
- Self-modifies based on interactions
- Categorizes information
- Provides relevant context for future interactions
- Context Management:
  - Use consistent contextId for related interactions
  - Limit chat history to recent messages for efficiency
- Tool Selection:
  - Let the entity auto-route to appropriate tools
  - Override routing with a specific generatorPathway when needed
- Memory Usage:
  - Enable aiMemorySelfModify for autonomous memory management
  - Use memory context for more coherent interactions
- Response Handling:
  - Use streaming for real-time interactions
  - Enable voice responses for voice interfaces
  - Handle async operations with appropriate timeouts
Cortex uses Redis as both a storage system and a communication backplane:
- Entity Memory: Stores and searches entity memory contexts using contextId as the key
- Context Persistence: Saves pathway context between executions
- Distributed Deployment: Enables communication between multiple Cortex instances
- Helper App Integration: Facilitates communication with auxiliary services:
  - File Handler: Progress updates and file operation status
  - Autogen: Message queuing and async task management
  - Voice Server: Real-time streaming coordination
  - Whisper Wrapper: Transcription job management
- Pub/Sub Messaging: Supports real-time event distribution across services
- Queue Management: Handles asynchronous task distribution and processing
- Request Caching: When enableCache is true, caches model responses to avoid duplicate API calls
- GraphQL Caching: When enableGraphqlCache is true, caches GraphQL query results
- Cache Encryption: Uses redisEncryptionKey to encrypt sensitive cached data
Redis connection can be configured through environment variables:
# Required
REDIS_URL=redis://your-redis-host:6379
# Optional
REDIS_ENCRYPTION_KEY=your-encryption-key # For encrypted caching
REDIS_PASSWORD=your-redis-password # If authentication is required
REDIS_TLS=true # For TLS/SSL connections
REDIS_CONNECTION_STRING= # Full connection string (alternative to URL)
Cortex implements intelligent cache management:
- Automatic cache invalidation based on TTL
- Model-specific cache keys for optimized hit rates
- Cache size management to prevent memory overflow
- Support for cache clearing through API endpoints
- Memory Storage:
  - Use consistent contextId values for related operations
  - Implement regular memory cleanup for unused contexts
  - Monitor memory usage to prevent Redis memory overflow
- Caching:
  - Enable caching for frequently repeated queries
  - Use encryption for sensitive data
  - Monitor cache hit rates for optimization
- High Availability:
  - Configure Redis persistence for data durability
  - Use Redis clustering for scalability
  - Implement failover mechanisms for reliability
- Communication:
  - Use appropriate channels for different types of messages
  - Implement retry logic for critical operations
  - Monitor queue lengths and processing times
  - Set up proper error handling for pub/sub operations
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for cortex
Similar Open Source Tools
![cortex Screenshot](/screenshots_githubs/aj-archipelago-cortex.jpg)
cortex
Cortex is a tool that simplifies and accelerates the process of creating applications utilizing modern AI models like chatGPT and GPT-4. It provides a structured interface (GraphQL or REST) to a prompt execution environment, enabling complex augmented prompting and abstracting away model connection complexities like input chunking, rate limiting, output formatting, caching, and error handling. Cortex offers a solution to challenges faced when using AI models, providing a simple package for interacting with NL AI models.
![probsem Screenshot](/screenshots_githubs/benlipkin-probsem.jpg)
probsem
ProbSem is a repository that provides a framework to leverage large language models (LLMs) for assigning context-conditional probability distributions over queried strings. It supports OpenAI engines and HuggingFace CausalLM models, and is flexible for research applications in linguistics, cognitive science, program synthesis, and NLP. Users can define prompts, contexts, and queries to derive probability distributions over possible completions, enabling tasks like cloze completion, multiple-choice QA, semantic parsing, and code completion. The repository offers CLI and API interfaces for evaluation, with options to customize models, normalize scores, and adjust temperature for probability distributions.
![magic-cli Screenshot](/screenshots_githubs/guywaldman-magic-cli.jpg)
magic-cli
Magic CLI is a command line utility that leverages Large Language Models (LLMs) to enhance command line efficiency. It is inspired by projects like Amazon Q and GitHub Copilot for CLI. The tool allows users to suggest commands, search across command history, and generate commands for specific tasks using local or remote LLM providers. Magic CLI also provides configuration options for LLM selection and response generation. The project is still in early development, so users should expect breaking changes and bugs.
![allms Screenshot](/screenshots_githubs/allegro-allms.jpg)
allms
allms is a versatile and powerful library designed to streamline the process of querying Large Language Models (LLMs). Developed by Allegro engineers, it simplifies working with LLM applications by providing a user-friendly interface, asynchronous querying, automatic retrying mechanism, error handling, and output parsing. It supports various LLM families hosted on different platforms like OpenAI, Google, Azure, and GCP. The library offers features for configuring endpoint credentials, batch querying with symbolic variables, and forcing structured output format. It also provides documentation, quickstart guides, and instructions for local development, testing, updating documentation, and making new releases.
![slack-bot Screenshot](/screenshots_githubs/innogames-slack-bot.jpg)
slack-bot
The Slack Bot is a tool designed to enhance the workflow of development teams by integrating with Jenkins, GitHub, GitLab, and Jira. It allows for custom commands, macros, crons, and project-specific commands to be implemented easily. Users can interact with the bot through Slack messages, execute commands, and monitor job progress. The bot supports features like starting and monitoring Jenkins jobs, tracking pull requests, querying Jira information, creating buttons for interactions, generating images with DALL-E, playing quiz games, checking weather, defining custom commands, and more. Configuration is managed via YAML files, allowing users to set up credentials for external services, define custom commands, schedule cron jobs, and configure VCS systems like Bitbucket for automated branch lookup in Jenkins triggers.
![vulnerability-analysis Screenshot](/screenshots_githubs/NVIDIA-AI-Blueprints-vulnerability-analysis.jpg)
vulnerability-analysis
The NVIDIA AI Blueprint for Vulnerability Analysis for Container Security showcases accelerated analysis on common vulnerabilities and exposures (CVE) at an enterprise scale, reducing mitigation time from days to seconds. It enables security analysts to determine software package vulnerabilities using large language models (LLMs) and retrieval-augmented generation (RAG). The blueprint is designed for security analysts, IT engineers, and AI practitioners in cybersecurity. It requires NVAIE developer license and API keys for vulnerability databases, search engines, and LLM model services. Hardware requirements include L40 GPU for pipeline operation and optional LLM NIM and Embedding NIM. The workflow involves LLM pipeline for CVE impact analysis, utilizing LLM planner, agent, and summarization nodes. The blueprint uses NVIDIA NIM microservices and Morpheus Cybersecurity AI SDK for vulnerability analysis.
![aiid Screenshot](/screenshots_githubs/responsible-ai-collaborative-aiid.jpg)
aiid
The Artificial Intelligence Incident Database (AIID) is a collection of incidents involving the development and use of artificial intelligence (AI). The database is designed to help researchers, policymakers, and the public understand the potential risks and benefits of AI, and to inform the development of policies and practices to mitigate the risks and promote the benefits of AI. The AIID is a collaborative project involving researchers from the University of California, Berkeley, the University of Washington, and the University of Toronto.
![aiexe Screenshot](/screenshots_githubs/kstost-aiexe.jpg)
aiexe
aiexe is a cutting-edge command-line interface (CLI) and graphical user interface (GUI) tool that integrates powerful AI capabilities directly into your terminal or desktop. It is designed for developers, tech enthusiasts, and anyone interested in AI-powered automation. aiexe provides an easy-to-use yet robust platform for executing complex tasks with just a few commands. Users can harness the power of various AI models from OpenAI, Anthropic, Ollama, Gemini, and GROQ to boost productivity and enhance decision-making processes.
![LLMUnity Screenshot](/screenshots_githubs/undreamai-LLMUnity.jpg)
LLMUnity
LLM for Unity enables seamless integration of Large Language Models (LLMs) within the Unity engine, allowing users to create intelligent characters for immersive player interactions. The tool supports major LLM models, runs locally without internet access, offers fast inference on CPU and GPU, and is easy to set up with a single line of code. It is free for both personal and commercial use, tested on Unity 2021 LTS, 2022 LTS, and 2023. Users can build multiple AI characters efficiently, use remote servers for processing, and customize model settings for text generation.
![tonic_validate Screenshot](/screenshots_githubs/TonicAI-tonic_validate.jpg)
tonic_validate
Tonic Validate is a framework for the evaluation of LLM outputs, such as Retrieval Augmented Generation (RAG) pipelines. Validate makes it easy to evaluate, track, and monitor your LLM and RAG applications. Validate allows you to evaluate your LLM outputs through the use of our provided metrics which measure everything from answer correctness to LLM hallucination. Additionally, Validate has an optional UI to visualize your evaluation results for easy tracking and monitoring.
![paper-qa Screenshot](/screenshots_githubs/Future-House-paper-qa.jpg)
paper-qa
PaperQA is a minimal package for question answering over PDFs or text files, providing high-quality answers with in-text citations. It uses OpenAI embeddings to embed and search documents, following a pipeline of embedding the documents and the query, searching for the top passages, summarizing those passages, using an LLM to re-score and select the most relevant summaries, placing the selected summaries into a prompt, and generating the answer. The tool can be used to answer specific questions about scientific research by leveraging citations and relevant passages from documents.
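As a rough illustration of that pipeline, a minimal usage sketch in the classic PaperQA style might look like the following (the `Docs` API shown reflects earlier releases and may differ in newer versions; the file paths and question are placeholders):

```python
from paperqa import Docs  # classic PaperQA entry point; newer releases may expose a different API

docs = Docs()

# Index a few local papers (placeholder paths); each add() embeds the document.
for path in ["papers/antibody_review.pdf", "papers/bispecifics_2023.pdf"]:
    docs.add(path)

# query() searches for top passages, summarizes and re-scores them with an LLM,
# then generates an answer with in-text citations.
answer = docs.query("What manufacturing challenges are unique to bispecific antibodies?")
print(answer.formatted_answer)
```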
![LongRAG Screenshot](/screenshots_githubs/TIGER-AI-Lab-LongRAG.jpg)
LongRAG
This repository contains the code for LongRAG, a framework that enhances retrieval-augmented generation with long-context LLMs. LongRAG introduces a 'long retriever' and a 'long reader' to improve performance by using a 4K-token retrieval unit, offering insights into combining RAG with long-context LLMs. The repo provides instructions for installation, quick start, corpus preparation, long retriever, and long reader.
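To make the "long retrieval unit" idea concrete, here is a small hypothetical sketch (the helper names and the whitespace tokenizer are assumptions for illustration, not LongRAG's actual API) that greedily packs passages into roughly 4K-token units before retrieval:

```python
from typing import Callable, List

def build_long_units(passages: List[str],
                     count_tokens: Callable[[str], int],
                     max_tokens: int = 4096) -> List[str]:
    """Greedily pack consecutive passages into ~4K-token retrieval units,
    mirroring the 'long retriever' idea (hypothetical helper, not LongRAG's API)."""
    units, current, current_len = [], [], 0
    for passage in passages:
        n = count_tokens(passage)
        if current and current_len + n > max_tokens:
            units.append("\n".join(current))
            current, current_len = [], 0
        current.append(passage)
        current_len += n
    if current:
        units.append("\n".join(current))
    return units

# Example with a crude whitespace "tokenizer" stand-in:
units = build_long_units(["passage one ...", "passage two ..."], lambda s: len(s.split()))
```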
![CJA_Comprehensive_Jailbreak_Assessment Screenshot](/screenshots_githubs/Junjie-Chu-CJA_Comprehensive_Jailbreak_Assessment.jpg)
CJA_Comprehensive_Jailbreak_Assessment
This public repository accompanies the paper 'Comprehensive Assessment of Jailbreak Attacks Against LLMs'. It provides a Python-based labeling method for labeling results and offers the opportunity to submit evaluation results to the leaderboard. The full code will be released after the paper is accepted.
![agent-toolkit Screenshot](/screenshots_githubs/stripe-agent-toolkit.jpg)
agent-toolkit
The Stripe Agent Toolkit enables popular agent frameworks to integrate with Stripe APIs through function calling. It supports Python and TypeScript and is built on top of the Stripe Python and Node SDKs. The toolkit provides tools for LangChain, CrewAI, and Vercel's AI SDK, allowing users to configure actions like creating payment links, invoices, refunds, and more. Users can pass the toolkit as a list of tools to agents for integration with Stripe. Context values can be provided when making requests, such as specifying connected accounts for API calls. The toolkit also supports metered billing for Vercel's AI SDK, enabling billing-event submission based on customer ID and input/output meters.
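As a rough sketch of the configuration pattern described (treat the import path, constructor arguments, and action names here as assumptions inferred from the description rather than a verified API; check the toolkit's README for the exact module layout), wiring the toolkit into a LangChain-style agent might look something like this:

```python
# Assumed import path and constructor keys -- verify against the toolkit's README.
from stripe_agent_toolkit.langchain.toolkit import StripeAgentToolkit

toolkit = StripeAgentToolkit(
    secret_key="sk_test_...",  # your Stripe secret key
    configuration={
        "actions": {
            "payment_links": {"create": True},  # enable only the actions the agent needs
            "invoices": {"create": True},
        },
        # Context values such as a connected account can scope API calls (assumed key name):
        "context": {"account": "acct_..."},
    },
)

# The toolkit exposes its enabled actions as a list of tools to hand to an agent.
tools = toolkit.get_tools()
```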
![mentals-ai Screenshot](/screenshots_githubs/turing-machines-mentals-ai.jpg)
mentals-ai
Mentals AI is a tool for creating and operating agents that feature loops, memory, and various tools, all through straightforward markdown syntax. It lets you concentrate solely on the agent's logic, eliminating the need to write underlying code in Python or any other language, and rethinks the foundational frameworks for future AI applications by allowing agents with recursive decision-making, integrated reasoning frameworks, and control flow expressed in natural language. Key concepts include instructions with prompts and references, working memory for context, short-term memory for storing intermediate results, and control flow from strings to algorithms. The tool provides a set of native tools for message output, user input, file handling, a Python interpreter, Bash commands, and short-term memory. The roadmap includes a web UI, vector database tools, agent experience, and tools for image generation and browsing. The idea behind Mentals AI originated from studies of executive functions in psychoanalysis and aims to integrate 'System 1' (the cognitive executor) with 'System 2' (the central executive) to create more sophisticated agents.
![aire Screenshot](/screenshots_githubs/glhd-aire.jpg)
aire
Aire is a modern Laravel form builder with a focus on expressive and beautiful code. It allows easy configuration of form components using fluent method calls or Blade components. Aire supports customization through config files and custom views, data binding with Eloquent models or arrays, method spoofing, CSRF token injection, server-side and client-side validation, and translations. It is designed to run on Laravel 5.8.28 and higher, with support for PHP 7.1 and higher. Aire is actively maintained and under consideration for additional features like read-only plain text, cross-browser support for custom checkboxes and radio buttons, support for Choices.js or similar libraries, improved file input handling, and better support for content prepending or appending to inputs.
For similar jobs
![promptflow Screenshot](/screenshots_githubs/microsoft-promptflow.jpg)
promptflow
**Prompt flow** is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.
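For flavor, the basic building block of a flow is a Python tool function; a minimal sketch follows (the `@tool` decorator import shown here is from older promptflow releases and may have moved in newer ones):

```python
from promptflow import tool  # newer releases may expose this under promptflow.core

@tool
def greet(user_name: str) -> str:
    # A trivial Python node that a flow's DAG can wire together with LLM nodes and other tools.
    return f"Hello, {user_name}! How can I help you today?"
```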
![deepeval Screenshot](/screenshots_githubs/confident-ai-deepeval.jpg)
deepeval
DeepEval is a simple-to-use, open-source LLM evaluation framework specialized for unit testing LLM outputs. It incorporates various metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., and runs locally on your machine for evaluation. It provides a wide range of ready-to-use evaluation metrics, allows for creating custom metrics, integrates with any CI/CD environment, and enables benchmarking LLMs on popular benchmarks. DeepEval is designed for evaluating RAG and fine-tuning applications, helping users optimize hyperparameters, prevent prompt drifting, and transition from OpenAI to hosting their own Llama2 with confidence.
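A minimal sketch of the pytest-style usage DeepEval is built around (the metric, threshold, and example strings are illustrative, and exact imports may vary by version):

```python
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_answer_relevancy():
    # Score a single input/output pair for answer relevancy and fail the test
    # if the score drops below the threshold.
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        actual_output="You can return them within 30 days for a full refund.",
    )
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```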
![MegaDetector Screenshot](/screenshots_githubs/agentmorris-MegaDetector.jpg)
MegaDetector
MegaDetector is an AI model that identifies animals, people, and vehicles in camera trap images (which also makes it useful for eliminating blank images). The model is trained on several million images from a variety of ecosystems. MegaDetector is just one of many tools that aim to make conservation biologists more efficient with AI. If you want to learn about other ways to use AI to accelerate camera trap workflows, check out the maintainers' overview of the field, affectionately titled "Everything I know about machine learning and camera traps".
![leapfrogai Screenshot](/screenshots_githubs/defenseunicorns-leapfrogai.jpg)
leapfrogai
LeapfrogAI is a self-hosted AI platform designed to be deployed in air-gapped resource-constrained environments. It brings sophisticated AI solutions to these environments by hosting all the necessary components of an AI stack, including vector databases, model backends, API, and UI. LeapfrogAI's API closely matches that of OpenAI, allowing tools built for OpenAI/ChatGPT to function seamlessly with a LeapfrogAI backend. It provides several backends for various use cases, including llama-cpp-python, whisper, text-embeddings, and vllm. LeapfrogAI leverages Chainguard's apko to harden base python images, ensuring the latest supported Python versions are used by the other components of the stack. The LeapfrogAI SDK provides a standard set of protobuffs and python utilities for implementing backends and gRPC. LeapfrogAI offers UI options for common use-cases like chat, summarization, and transcription. It can be deployed and run locally via UDS and Kubernetes, built out using Zarf packages. LeapfrogAI is supported by a community of users and contributors, including Defense Unicorns, Beast Code, Chainguard, Exovera, Hypergiant, Pulze, SOSi, United States Navy, United States Air Force, and United States Space Force.
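Because the API mirrors OpenAI's, existing OpenAI client code can typically be pointed at a LeapfrogAI deployment just by changing the base URL; a minimal sketch follows (the URL, API key, and model name are placeholders for your own deployment):

```python
from openai import OpenAI

# Point the standard OpenAI client at a LeapfrogAI deployment (placeholder URL and key).
client = OpenAI(base_url="https://leapfrogai.example.internal/openai/v1",
                api_key="my-leapfrogai-key")

response = client.chat.completions.create(
    model="vllm",  # whichever backend your deployment exposes
    messages=[{"role": "user", "content": "Summarize this incident report in two sentences."}],
)
print(response.choices[0].message.content)
```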
![llava-docker Screenshot](/screenshots_githubs/ashleykleynhans-llava-docker.jpg)
llava-docker
This Docker image for LLaVA (Large Language and Vision Assistant) provides a convenient way to run LLaVA locally or on RunPod. LLaVA is a powerful AI tool that combines natural language processing and computer vision capabilities. With this Docker image, you can easily access LLaVA's functionalities for various tasks, including image captioning, visual question answering, text summarization, and more. The image comes pre-installed with LLaVA v1.2.0, Torch 2.1.2, xformers 0.0.23.post1, and other necessary dependencies. You can customize the model used by setting the MODEL environment variable. The image also includes a Jupyter Lab environment for interactive development and exploration. Overall, this Docker image offers a comprehensive and user-friendly platform for leveraging LLaVA's capabilities.
![carrot Screenshot](/screenshots_githubs/xx025-carrot.jpg)
carrot
The 'carrot' repository on GitHub provides a list of free and user-friendly ChatGPT mirror sites for easy access. The repository includes sponsored sites offering various GPT models and services. Users can find and share sites, report errors, and access stable and recommended sites for ChatGPT usage. The repository also includes a detailed list of ChatGPT sites, their features, and accessibility options, making it a valuable resource for ChatGPT users seeking free and unlimited GPT services.
![TrustLLM Screenshot](/screenshots_githubs/HowieHwong-TrustLLM.jpg)
TrustLLM
TrustLLM is a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, an evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions. The authors first propose a set of principles for trustworthy LLMs spanning eight dimensions, then establish a benchmark across six of them: truthfulness, safety, fairness, robustness, privacy, and machine ethics. They present a study evaluating 16 mainstream LLMs in TrustLLM using over 30 datasets. The documentation explains how to use the trustllm Python package to assess the trustworthiness of your own LLM more quickly; for more details about TrustLLM, refer to the project website.
![AI-YinMei Screenshot](/screenshots_githubs/worm128-AI-YinMei.jpg)
AI-YinMei
AI-YinMei is an AI virtual anchor (VTuber) development tool (NVIDIA GPU version). It supports fastgpt knowledge-base chat and a complete LLM stack of [fastgpt] + [one-api] + [Xinference]; replying to Bilibili live-stream danmaku (chat) messages and greeting viewers who enter the stream; speech synthesis via Microsoft edge-tts, Bert-VITS2, and GPT-SoVITS; expression control through Vtuber Studio; image generation with stable-diffusion-webui output to an OBS live-stream scene; NSFW classification of generated images (public-NSFW-y-distinguish); DuckDuckGo text and image search (requires VPN access) and Baidu image search (no VPN needed); an AI reply chat box [html plug-in]; AI singing via Auto-Convert-Music; a playlist [html plug-in]; dancing; expression video playback; head-pat and gift-smashing actions; automatically starting to dance while singing, with idle swaying during chat and singing; multi-scene switching, background-music switching, and automatic day/night scene changes; and enabling singing and drawing so that the AI automatically decides when to do each based on the conversation.