aiocsv

Python: Asynchronous CSV reading/writing

aiocsv is a Python module that provides asynchronous CSV reading and writing. It is designed to be a drop-in replacement for Python's built-in csv module, with the added ability to read and write CSV files asynchronously. This makes it suitable for asyncio applications that need to process large CSV files without blocking the event loop.

Installation

pip install aiocsv. Python 3.8+ is required.

This module contains an extension written in C. Pre-built binaries may not be available for your configuration. You might need a C compiler and Python headers to install aiocsv.

Usage

AsyncReader & AsyncDictReader accept any object that has a read(size: int) coroutine, which should return a string.

AsyncWriter & AsyncDictWriter accept any object that has a write(b: str) coroutine.

Reading is implemented using a custom CSV parser, which should behave exactly like the CPython parser.

Writing is implemented using the synchronous csv.writer and csv.DictWriter objects - the serializers write data to a StringIO, and that buffer is then rewritten to the underlying asynchronous file.
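The write path described above can be sketched with the standard library alone. This is a minimal illustration of the technique, not aiocsv's actual implementation: BufferedAsyncWriter and InMemoryAsyncFile are hypothetical names introduced here, and the in-memory sink stands in for a real asynchronous file.

```python
import asyncio
import csv
import io
from typing import Any, Iterable, List


class InMemoryAsyncFile:
    """Hypothetical async sink: collects everything passed to write()."""

    def __init__(self) -> None:
        self.chunks: List[str] = []

    async def write(self, data: str) -> None:
        self.chunks.append(data)


class BufferedAsyncWriter:
    """Sketch: serialize with csv.writer into a StringIO, then forward the text."""

    def __init__(self, asyncfile: InMemoryAsyncFile, **csv_kwargs: Any) -> None:
        self._file = asyncfile
        self._buffer = io.StringIO(newline="")
        self._writer = csv.writer(self._buffer, **csv_kwargs)

    async def writerow(self, row: Iterable[Any]) -> None:
        self._writer.writerow(row)      # synchronous, in-memory serialization
        data = self._buffer.getvalue()
        self._buffer.seek(0)            # reset the buffer for the next row
        self._buffer.truncate(0)
        await self._file.write(data)    # the only asynchronous step


async def demo() -> str:
    sink = InMemoryAsyncFile()
    writer = BufferedAsyncWriter(sink, dialect="unix")
    await writer.writerow(["name", "age"])
    await writer.writerow(["John", 26])
    return "".join(sink.chunks)


output = asyncio.run(demo())
print(repr(output))
```

Because serialization is delegated to csv.writer, quoting and line terminators follow the chosen dialect; only the final hand-off to the file is awaited.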

Example

Example usage with aiofiles.

import asyncio
import csv

import aiofiles
from aiocsv import AsyncReader, AsyncDictReader, AsyncWriter, AsyncDictWriter

async def main():
    # simple reading
    async with aiofiles.open("some_file.csv", mode="r", encoding="utf-8", newline="") as afp:
        async for row in AsyncReader(afp):
            print(row)  # row is a list

    # dict reading, tab-separated
    async with aiofiles.open("some_other_file.tsv", mode="r", encoding="utf-8", newline="") as afp:
        async for row in AsyncDictReader(afp, delimiter="\t"):
            print(row)  # row is a dict

    # simple writing, "unix"-dialect
    async with aiofiles.open("new_file.csv", mode="w", encoding="utf-8", newline="") as afp:
        writer = AsyncWriter(afp, dialect="unix")
        await writer.writerow(["name", "age"])
        await writer.writerows([
            ["John", 26], ["Sasha", 42], ["Hana", 37]
        ])

    # dict writing, all quoted, "NULL" for missing fields
    async with aiofiles.open("new_file2.csv", mode="w", encoding="utf-8", newline="") as afp:
        writer = AsyncDictWriter(afp, ["name", "age"], restval="NULL", quoting=csv.QUOTE_ALL)
        await writer.writeheader()
        await writer.writerow({"name": "John", "age": 26})
        await writer.writerows([
            {"name": "Sasha", "age": 42},
            {"name": "Hana"}
        ])

asyncio.run(main())

Differences with `csv`

aiocsv strives to be a drop-in replacement for Python's builtin csv module. However, there are 3 notable differences:

Readers accept objects with async read methods, instead of an AsyncIterable over lines from a file.
AsyncDictReader.fieldnames can be None - use await AsyncDictReader.get_fieldnames() instead.
Changes to csv.field_size_limit are not picked up by existing Reader instances. The field size limit is cached on Reader instantiation to avoid expensive function calls on each character of the input.

Other, minor, differences include:

AsyncReader.line_num, AsyncDictReader.line_num and AsyncDictReader.dialect are not settable,
AsyncDictReader.reader is of AsyncReader type,
AsyncDictWriter.writer is of AsyncWriter type,
AsyncDictWriter provides an extra, read-only dialect property.

Reference

aiocsv.AsyncReader

AsyncReader(
    asyncfile: aiocsv.protocols.WithAsyncRead,
    dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
    **csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)

An object that iterates over records in the given asynchronous CSV file. Additional keyword arguments are understood as dialect parameters.

Iterating over this object returns parsed CSV rows (List[str]).

Methods:

__aiter__(self) -> self
async __anext__(self) -> List[str]

Read-only properties:

dialect: The csv.Dialect used when parsing
line_num: The number of lines read from the source file. This coincides with a 1-based index of the line number of the last line of the recently parsed record.
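The meaning of line_num is easiest to see with a record that spans several physical lines. The sketch below uses the synchronous csv.reader as a stand-in, on the assumption (taken from the statement that the custom parser should behave exactly like the CPython parser) that AsyncReader.line_num follows the same counting.

```python
import csv
import io

# One header line, then a record whose quoted field spans two physical lines.
source = io.StringIO('id,comment\n1,"first line\nsecond line"\n2,short\n')

reader = csv.reader(source)
observed = []
for row in reader:
    # line_num counts physical lines consumed so far, not records.
    observed.append((reader.line_num, row))

for line_num, row in observed:
    print(line_num, row)
```

The second record ends on physical line 3, so line_num jumps from 1 to 3 even though only two records have been parsed at that point.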

aiocsv.AsyncDictReader

AsyncDictReader(
    asyncfile: aiocsv.protocols.WithAsyncRead,
    fieldnames: Optional[Sequence[str]] = None,
    restkey: Optional[str] = None,
    restval: Optional[str] = None,
    dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
    **csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)

An object that iterates over records in the given asynchronous CSV file. All arguments work exactly the same way as in csv.DictReader.

Iterating over this object returns parsed CSV rows (Dict[str, str]).

Methods:

__aiter__(self) -> self
async __anext__(self) -> Dict[str, str]
async get_fieldnames(self) -> List[str]

Properties:

fieldnames: field names used when converting rows to dictionaries
⚠️ Unlike csv.DictReader, this property can't read the fieldnames if they are missing - it's not possible to await on the header row in a property getter. Use await reader.get_fieldnames().

reader = csv.DictReader(some_file)
reader.fieldnames  # ["cells", "from", "the", "header"]

areader = aiocsv.AsyncDictReader(same_file_but_async)
areader.fieldnames   # ⚠️ None
await areader.get_fieldnames()  # ["cells", "from", "the", "header"]

restkey: If a row has more cells than the header, all remaining cells are stored under this key in the returned dictionary. Defaults to None.
restval: If a row has fewer cells than the header, then missing keys will use this value. Defaults to None.
reader: Underlying aiocsv.AsyncReader instance

Read-only properties:

dialect: Link to self.reader.dialect - the current csv.Dialect
line_num: The number of lines read from the source file. This coincides with a 1-based index of the line number of the last line of the recently parsed record.
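Since the arguments are documented to work the same way as in csv.DictReader, the effect of restkey and restval can be previewed with the synchronous class. This is only an illustration with made-up data; the key name "overflow" and the placeholder "N/A" are arbitrary choices.

```python
import csv
import io

# Row 1 has two cells more than the header, row 2 has one cell fewer.
source = io.StringIO("name,age\nJohn,26,extra1,extra2\nHana\n")

reader = csv.DictReader(source, restkey="overflow", restval="N/A")
rows = list(reader)

for row in rows:
    print(row)
```

Surplus cells are collected into a list under restkey, while missing cells are filled with restval.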

aiocsv.AsyncWriter

AsyncWriter(
    asyncfile: aiocsv.protocols.WithAsyncWrite,
    dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
    **csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)

An object that writes csv rows to the given asynchronous file. In this object "row" is a sequence of values.

Additional keyword arguments are passed to the underlying csv.writer instance.

Methods:

async writerow(self, row: Iterable[Any]) -> None: Writes one row to the specified file.
async writerows(self, rows: Iterable[Iterable[Any]]) -> None: Writes multiple rows to the specified file.

Read-only properties:

dialect: Link to the underlying csv.writer's dialect attribute

aiocsv.AsyncDictWriter

AsyncDictWriter(
    asyncfile: aiocsv.protocols.WithAsyncWrite,
    fieldnames: Sequence[str],
    restval: Any = "",
    extrasaction: Literal["raise", "ignore"] = "raise",
    dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
    **csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)

An object that writes csv rows to the given asynchronous file. In this object "row" is a mapping from fieldnames to values.

Additional keyword arguments are passed to the underlying csv.DictWriter instance.

Methods:

async writeheader(self) -> None: Writes header row to the specified file.
async writerow(self, row: Mapping[str, Any]) -> None: Writes one row to the specified file.
async writerows(self, rows: Iterable[Mapping[str, Any]]) -> None: Writes multiple rows to the specified file.

Properties:

fieldnames: Sequence of keys to identify the order of values when writing rows to the underlying file
restval: Placeholder value used when a key from fieldnames is missing in a row, defaults to ""
extrasaction: Action to take when a row contains keys that are not present in fieldnames. Defaults to "raise", which raises ValueError on extra keys; may also be set to "ignore" to skip any extra keys
writer: Link to the underlying AsyncWriter

Read-only properties:

dialect: Link to the underlying csv.writer's dialect attribute
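Because these arguments are passed to an underlying csv.DictWriter, the behaviour of restval and extrasaction can be illustrated with the synchronous class writing into a StringIO. The row contents, including the extra "city" key, are made up for the example.

```python
import csv
import io

buffer = io.StringIO(newline="")

strict = csv.DictWriter(buffer, ["name", "age"], restval="NULL", extrasaction="raise")
strict.writeheader()
strict.writerow({"name": "Hana"})  # "age" is missing, so restval is written

try:
    # "city" is not in fieldnames; with extrasaction="raise" nothing is written.
    strict.writerow({"name": "John", "age": 26, "city": "Oslo"})
    raised = False
except ValueError:
    raised = True

lenient = csv.DictWriter(buffer, ["name", "age"], extrasaction="ignore")
lenient.writerow({"name": "John", "age": 26, "city": "Oslo"})  # "city" is dropped

written = buffer.getvalue()
print(raised, repr(written))
```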

aiocsv.protocols.WithAsyncRead

A typing.Protocol describing an asynchronous file, which can be read.

aiocsv.protocols.WithAsyncWrite

A typing.Protocol describing an asynchronous file, which can be written to.
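Based on the Usage section (a read(size: int) coroutine returning a string, and a write(b: str) coroutine), protocols of this shape might look roughly like the sketch below. The names SupportsAsyncRead, SupportsAsyncWrite and InMemoryAsyncText are hypothetical and are not aiocsv's definitions.

```python
import asyncio
from typing import Any, List, Protocol, runtime_checkable


@runtime_checkable
class SupportsAsyncRead(Protocol):
    async def read(self, size: int) -> str: ...


@runtime_checkable
class SupportsAsyncWrite(Protocol):
    async def write(self, b: str) -> Any: ...


class InMemoryAsyncText:
    """Hypothetical in-memory object satisfying both protocols structurally."""

    def __init__(self, initial: str = "") -> None:
        self._data = initial
        self._pos = 0

    async def read(self, size: int) -> str:
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk

    async def write(self, b: str) -> int:
        self._data += b
        return len(b)


async def read_all(source: SupportsAsyncRead, chunk_size: int = 4) -> str:
    # Keep reading fixed-size chunks until the source returns an empty string.
    parts: List[str] = []
    while True:
        chunk = await source.read(chunk_size)
        if not chunk:
            break
        parts.append(chunk)
    return "".join(parts)


stream = InMemoryAsyncText("a,b\n1,2\n")
text = asyncio.run(read_all(stream))
print(repr(text))
```

Protocols are structural: InMemoryAsyncText does not inherit from either protocol, yet it is accepted wherever one is expected. Note that isinstance checks against a runtime_checkable protocol only verify that the method exists, not that it is a coroutine.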

aiocsv.protocols.CsvDialectArg

Type of the dialect argument, as used in the csv module.

aiocsv.protocols.CsvDialectKwargs

Keyword arguments used by csv module to override the dialect settings during reader/writer instantiation.

Development

Contributions are welcome; however, please open an issue beforehand. aiocsv is meant as a replacement for the built-in csv, so any features not present in the latter will be rejected.

Building from source

To create a wheel (and a source tarball), run python -m build.

For local development, use a virtual environment. pip install --editable . will build the C extension and make it available for the current venv. This is required for running the tests. However, due to the mess of Python packaging this will force an optimized build without debugging symbols. If you need to debug the C part of aiocsv and build the library with e.g. debugging symbols, the only sane way is to run python setup.py build --debug and manually copy the shared object/DLL from build/lib*/aiocsv to aiocsv.

Tests

This project uses pytest with pytest-asyncio for testing. Run pytest after installing the library in the manner explained above.

Linting & other tools

This library uses black and isort for formatting and pyright in strict mode for type checking.

For the C part of the library, please use clang-format for formatting and clang-tidy for linting; however, these are not yet integrated in the CI.

Installing required tools

pip install -r requirements.dev.txt will pull all of the development tools mentioned above, however this might not be necessary depending on your setup. For example, if you use VS Code with the Python extension, pyright is already bundled and doesn't need to be installed again.

Recommended VS Code settings

Use Python, Pylance (should be installed automatically alongside Python extension), black and isort Python extensions.

You will need to install all dev dependencies from requirements.dev.txt, except for pyright. Recommended .vscode/settings.json:

{
    "C_Cpp.codeAnalysis.clangTidy.enabled": true,
    "python.testing.pytestArgs": [
        "."
    ],
    "python.testing.unittestEnabled": false,
    "python.testing.pytestEnabled": true,
    "[python]": {
        "editor.formatOnSave": true,
        "editor.codeActionsOnSave": {
            "source.organizeImports": "always"
        }
    },
    "[c]": {
        "editor.formatOnSave": true
    }
}

For the C part of the library, the C/C++ extension is sufficient. Ensure that your system has Python headers installed. Usually a separate package like python3-dev needs to be installed; consult your system's package repositories for the exact name. .vscode/c_cpp_properties.json needs to manually include Python headers under includePath. On my particular system this config file looks like this:

{
    "configurations": [
        {
            "name": "Linux",
            "includePath": [
                "${workspaceFolder}/**",
                "/usr/include/python3.11"
            ],
            "defines": [],
            "compilerPath": "/usr/bin/clang",
            "cStandard": "c17",
            "cppStandard": "c++17",
            "intelliSenseMode": "linux-clang-x64"
        }
    ],
    "version": 4
}
