jsonrepair

A high-performance Golang library for easily repairing invalid JSON documents. Designed to fix common JSON issues and optimize JSON content generated by language models (LLMs).

Stars: 54

Visit

Jsonrepair is a Python library that provides functionalities to repair and validate JSON files. It helps users to fix common issues in JSON data such as missing commas, incorrect data types, and structural errors. With jsonrepair, users can easily clean up and standardize their JSON files, ensuring they are well-formed and error-free.

README:

Golang JSONRepair Library

Easily repair invalid JSON documents with the Golang JSONRepair Library. This library is a direct port of the popular jsonrepair JavaScript library, designed to address common issues found in JSON data. Leveraging the performance benefits of Go, it maintains compatibility and reliability with the original JavaScript library. It is particularly useful for optimizing JSON content generated by language models (LLMs).

Features

The jsonrepair library can automatically fix the following JSON issues:

Add missing quotes around keys: Ensures all keys are properly quoted.
Add missing escape characters: Adds necessary escape characters where needed.
Add missing commas: Inserts missing commas between elements.
Add missing closing brackets: Closes any unclosed brackets.
Repair truncated JSON: Completes truncated JSON data.
Replace single quotes with double quotes: Converts single quotes to double quotes.
Replace special quote characters: Converts characters like “...” to standard double quotes.
Replace special white space characters: Converts special whitespace characters to regular spaces.
Replace Python constants: Converts None, True, False to null, true, false.
Strip trailing commas: Removes any trailing commas.
Strip comments: Eliminates comments such as /* ... */ and // ....
Strip fenced code blocks: Removes markdown fenced code blocks like ```json and ```.
Strip ellipsis: Removes ellipsis in arrays and objects, e.g., [1, 2, 3, ...].
Strip JSONP notation: Removes JSONP callbacks, e.g., callback({ ... }).
Strip escape characters: Removes escape characters from strings, e.g., {\"stringified\": \"content\"}.
Strip MongoDB data types: Converts types like NumberLong(2) and ISODate("2012-12-19T06:01:17.171Z") to standard JSON.
Concatenate strings: Merges strings split across lines, e.g., "long text" + "more text on next line".
Convert newline-delimited JSON: Encloses newline-delimited JSON in an array to make it valid, for example:
```
{ "id": 1, "name": "John" }
{ "id": 2, "name": "Sarah" }
```

Install

Install the library using go get:

go get github.com/kaptinlin/jsonrepair

Usage

Basic Usage

Use the JSONRepair function to repair a JSON string:

package main

import (
    "fmt"
    "log"

    "github.com/kaptinlin/jsonrepair"
)

func main() {
    // The following is invalid JSON: it consists of JSON contents copied from
    // a JavaScript code base, where the keys are missing double quotes,
    // and strings are using single quotes:
    json := "{name: 'John'}"

    repaired, err := jsonrepair.JSONRepair(json)
    if err != nil {
        log.Fatalf("Failed to repair JSON: %v", err)
    }

    fmt.Println(repaired) // '{"name": "John"}'
}

API

JSONRepair Function

// JSONRepair attempts to repair the given JSON string and returns the repaired version.
// It returns an error if an issue is encountered which could not be solved.
func JSONRepair(text string) (string, error)

How to Contribute

Contributions to the jsonrepair package are welcome. If you'd like to contribute, please follow the contribution guidelines.

License

Released under the MIT license. See the LICENSE file for details.

Acknowledgements

This library is a Go port of the JavaScript library jsonrepair by Jos de Jong. The original logic and behavior have been closely followed to ensure compatibility and reliability. Special thanks to the original author for creating such a useful tool.

For Tasks:

Click tags to check more tools for each tasks

fix data errors validate json clean json files standardize json structure repair json syntax

For Jobs:

data analyst software developer quality assurance tester data engineer web developer

Alternative AI tools for jsonrepair

Similar Open Source Tools

jsonrepair

github

: 54

SQLBot

SQLBot is a versatile tool for executing SQL queries and managing databases. It provides a user-friendly interface for interacting with databases, allowing users to easily query, insert, update, and delete data. SQLBot supports various database systems such as MySQL, PostgreSQL, and SQLite, making it a valuable tool for developers, data analysts, and database administrators. With SQLBot, users can streamline their database management tasks and improve their productivity by quickly accessing and manipulating data without the need for complex SQL commands.

github

: 3.4k

falkordb-browser

FalkorDB Browser is a user-friendly web application for browsing and managing databases. It provides an intuitive interface for users to interact with their databases, allowing them to view, edit, and query data easily. With FalkorDB Browser, users can perform various database operations without the need for complex commands or scripts, making database management more accessible and efficient.

github

: 55

mcp-server-mysql

The MCP Server for MySQL based on NodeJS is a Model Context Protocol server that provides access to MySQL databases. It enables users to inspect database schemas and execute SQL queries. The server offers tools for executing SQL queries, providing comprehensive database information, security features like SQL injection prevention, performance optimizations, monitoring, and debugging capabilities. Users can configure the server using environment variables and advanced options. The server supports multi-DB mode, schema-specific permissions, and includes troubleshooting guidelines for common issues. Contributions are welcome, and the project roadmap includes enhancing query capabilities, security features, performance optimizations, monitoring, and expanding schema information.

github

: 768

yek

Yek is a fast Rust-based tool designed to read text-based files in a repository or directory, chunk them, and serialize them for Large Language Models (LLM) consumption. It utilizes .gitignore rules to skip unwanted files, Git history to infer important files, and additional ignore patterns. Yek splits content into chunks based on token count or byte size, supports processing multiple directories, and can stream content when output is piped. It is configurable via a 'yek.toml' file and prioritizes important files at the end of the output.

github

: 2.3k

ollama4j

Ollama4j is a Java library that serves as a wrapper or binding for the Ollama server. It allows users to communicate with the Ollama server and manage models for various deployment scenarios. The library provides APIs for interacting with Ollama, generating fake data, testing UI interactions, translating messages, and building web UIs. Users can easily integrate Ollama4j into their Java projects to leverage the functionalities offered by the Ollama server.

github

: 438

Website-Crawler

Website-Crawler is a tool designed to extract data from websites in an automated manner. It allows users to scrape information such as text, images, links, and more from web pages. The tool provides functionalities to navigate through websites, handle different types of content, and store extracted data for further analysis. Website-Crawler is useful for tasks like web scraping, data collection, content aggregation, and competitive analysis. It can be customized to extract specific data elements based on user requirements, making it a versatile tool for various web data extraction needs.

github

: 61

cellm

Cellm is an Excel extension that allows users to leverage Large Language Models (LLMs) like ChatGPT within cell formulas. It enables users to extract AI responses to text ranges, making it useful for automating repetitive tasks that involve data processing and analysis. Cellm supports various models from Anthropic, Mistral, OpenAI, and Google, as well as locally hosted models via Llamafiles, Ollama, or vLLM. The tool is designed to simplify the integration of AI capabilities into Excel for tasks such as text classification, data cleaning, content summarization, entity extraction, and more.

github

: 882

mcp-context-forge

MCP Context Forge is a powerful tool for generating context-aware data for machine learning models. It provides functionalities to create diverse datasets with contextual information, enhancing the performance of AI algorithms. The tool supports various data formats and allows users to customize the context generation process easily. With MCP Context Forge, users can efficiently prepare training data for tasks requiring contextual understanding, such as sentiment analysis, recommendation systems, and natural language processing.

github

: 2.6k

pointer

Pointer is a lightweight and efficient tool for analyzing and visualizing data structures in C and C++ programs. It provides a user-friendly interface to track memory allocations, pointer references, and data structures, helping developers to identify memory leaks, pointer errors, and optimize memory usage. With Pointer, users can easily navigate through complex data structures, visualize memory layouts, and debug pointer-related issues in their codebase. The tool offers interactive features such as memory snapshots, pointer tracking, and memory visualization, making it a valuable asset for C and C++ developers working on memory-intensive applications.

github

: 86

databerry

Chaindesk is a no-code platform that allows users to easily set up a semantic search system for personal data without technical knowledge. It supports loading data from various sources such as raw text, web pages, files (Word, Excel, PowerPoint, PDF, Markdown, Plain Text), and upcoming support for web sites, Notion, and Airtable. The platform offers a user-friendly interface for managing datastores, querying data via a secure API endpoint, and auto-generating ChatGPT Plugins for each datastore. Chaindesk utilizes a Vector Database (Qdrant), Openai's text-embedding-ada-002 for embeddings, and has a chunk size of 1024 tokens. The technology stack includes Next.js, Joy UI, LangchainJS, PostgreSQL, Prisma, and Qdrant, inspired by the ChatGPT Retrieval Plugin.

github

: 2.9k

aide

Aide is a code-first API documentation and utility library for Rust, along with other related utility crates for web-servers. It provides tools for creating API documentation and handling JSON request validation. The repository contains multiple crates that offer drop-in replacements for existing libraries, ensuring compatibility with Aide. Contributions are welcome, and the code is dual licensed under MIT and Apache-2.0. If Aide does not meet your requirements, you can explore similar libraries like paperclip, utoipa, and okapi.

github

: 433

mcp-use

MCP-Use is a Python library for analyzing and processing text data using Markov Chains. It provides functionalities for generating text based on input data, calculating transition probabilities, and simulating text sequences. The library is designed to be user-friendly and efficient, making it suitable for natural language processing tasks.

github

: 7.5k

nvim-aider

Nvim-aider is a plugin for Neovim that provides additional functionality and key mappings to enhance the user's editing experience. It offers features such as code navigation, quick access to commonly used commands, and improved text manipulation tools. With Nvim-aider, users can streamline their workflow and increase productivity while working with Neovim.

github

: 86

PaddleOCR

PaddleOCR is an easy-to-use and scalable OCR toolkit based on PaddlePaddle. It provides a series of text detection and recognition models, supporting multiple languages and various scenarios. With PaddleOCR, users can perform accurate and efficient text extraction from images and videos, making it suitable for tasks such as document scanning, text recognition, and information extraction.

github

: 55.4k

vivaria

Vivaria is a web application tool designed for running evaluations and conducting agent elicitation research. Users can interact with Vivaria using a web UI and a command-line interface. It allows users to start task environments based on METR Task Standard definitions, run AI agents, perform agent elicitation research, view API requests and responses, add tags and comments to runs, store results in a PostgreSQL database, sync data to Airtable, test prompts against LLMs, and authenticate using Auth0.

github

: 110

For similar tasks

jsonrepair

github

: 54

json_repair

This simple package can be used to fix an invalid json string. To know all cases in which this package will work, check out the unit test. Inspired by https://github.com/josdejong/jsonrepair Motivation Some LLMs are a bit iffy when it comes to returning well formed JSON data, sometimes they skip a parentheses and sometimes they add some words in it, because that's what an LLM does. Luckily, the mistakes LLMs make are simple enough to be fixed without destroying the content. I searched for a lightweight python package that was able to reliably fix this problem but couldn't find any. So I wrote one How to use from json_repair import repair_json good_json_string = repair_json(bad_json_string) # If the string was super broken this will return an empty string You can use this library to completely replace `json.loads()`: import json_repair decoded_object = json_repair.loads(json_string) or just import json_repair decoded_object = json_repair.repair_json(json_string, return_objects=True) Read json from a file or file descriptor JSON repair provides also a drop-in replacement for `json.load()`: import json_repair try: file_descriptor = open(fname, 'rb') except OSError: ... with file_descriptor: decoded_object = json_repair.load(file_descriptor) and another method to read from a file: import json_repair try: decoded_object = json_repair.from_file(json_file) except OSError: ... except IOError: ... Keep in mind that the library will not catch any IO-related exception and those will need to be managed by you Performance considerations If you find this library too slow because is using `json.loads()` you can skip that by passing `skip_json_loads=True` to `repair_json`. Like: from json_repair import repair_json good_json_string = repair_json(bad_json_string, skip_json_loads=True) I made a choice of not using any fast json library to avoid having any external dependency, so that anybody can use it regardless of their stack. Some rules of thumb to use: - Setting `return_objects=True` will always be faster because the parser returns an object already and it doesn't have serialize that object to JSON - `skip_json_loads` is faster only if you 100% know that the string is not a valid JSON - If you are having issues with escaping pass the string as **raw** string like: `r"string with escaping\"" Adding to requirements Please pin this library only on the major version! We use TDD and strict semantic versioning, there will be frequent updates and no breaking changes in minor and patch versions. To ensure that you only pin the major version of this library in your `requirements.txt`, specify the package name followed by the major version and a wildcard for minor and patch versions. For example: json_repair==0.* In this example, any version that starts with `0.` will be acceptable, allowing for updates on minor and patch versions. How it works This module will parse the JSON file following the BNF definition: ::= | ::= | | ; Where: ; is a valid real number expressed in one of a number of given formats ; is a string of valid characters enclosed in quotes ; is one of the literal strings 'true', 'false', or 'null' (unquoted) ::=