
aiocsv
Python: Asynchronous CSV reading/writing
Stars: 71

aiocsv is a Python module that provides asynchronous CSV reading and writing. It is designed to be a drop-in replacement for Python's built-in csv module, with the added benefit of being able to read and write CSV files asynchronously. This makes it ideal for applications that need to process large CSV files efficiently.
README:
Asynchronous CSV reading and writing.
pip install aiocsv
Python 3.8+ is required.
This module contains an extension written in C. Pre-built binaries may not be available for your configuration; you might need a C compiler and Python headers to install aiocsv.
AsyncReader & AsyncDictReader accept any object that has a read(size: int) coroutine, which should return a string.
AsyncWriter & AsyncDictWriter accept any object that has a write(b: str) coroutine.
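Any object exposing those coroutines will do - it does not have to come from aiofiles. A minimal sketch of an in-memory file satisfying both protocols (the MemoryAsyncFile name is purely illustrative, not part of aiocsv):

class MemoryAsyncFile:
    """Hypothetical in-memory file usable with aiocsv's readers and writers."""

    def __init__(self, data: str = "") -> None:
        self._data = data
        self._pos = 0

    async def read(self, size: int) -> str:
        # Return up to `size` characters; an empty string signals end-of-file.
        chunk = self._data[self._pos : self._pos + size]
        self._pos += len(chunk)
        return chunk

    async def write(self, b: str) -> None:
        self._data += b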
Reading is implemented using a custom CSV parser, which should behave exactly like the CPython parser.
Writing is implemented using the synchronous csv.writer and csv.DictWriter objects - the serializers write data to a StringIO, and that buffer is then rewritten to the underlying asynchronous file.
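In other words, every write roughly follows this pattern - a simplified sketch of the approach just described, not aiocsv's actual code:

import csv
from io import StringIO

class BufferedWriterSketch:
    def __init__(self, asyncfile) -> None:
        self._buffer = StringIO()
        self._writer = csv.writer(self._buffer)  # synchronous serializer
        self._file = asyncfile                   # object with an async write(b: str)

    async def writerow(self, row) -> None:
        self._writer.writerow(row)                       # serialize into the buffer
        await self._file.write(self._buffer.getvalue())  # rewrite to the async file
        self._buffer.seek(0)
        self._buffer.truncate(0)                         # reset for the next row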
Example usage with aiofiles:

import asyncio
import csv

import aiofiles
from aiocsv import AsyncReader, AsyncDictReader, AsyncWriter, AsyncDictWriter

async def main():
    # simple reading
    async with aiofiles.open("some_file.csv", mode="r", encoding="utf-8", newline="") as afp:
        async for row in AsyncReader(afp):
            print(row)  # row is a list

    # dict reading, tab-separated
    async with aiofiles.open("some_other_file.tsv", mode="r", encoding="utf-8", newline="") as afp:
        async for row in AsyncDictReader(afp, delimiter="\t"):
            print(row)  # row is a dict

    # simple writing, "unix"-dialect
    async with aiofiles.open("new_file.csv", mode="w", encoding="utf-8", newline="") as afp:
        writer = AsyncWriter(afp, dialect="unix")
        await writer.writerow(["name", "age"])
        await writer.writerows([
            ["John", 26], ["Sasha", 42], ["Hana", 37]
        ])

    # dict writing, all quoted, "NULL" for missing fields
    async with aiofiles.open("new_file2.csv", mode="w", encoding="utf-8", newline="") as afp:
        writer = AsyncDictWriter(afp, ["name", "age"], restval="NULL", quoting=csv.QUOTE_ALL)
        await writer.writeheader()
        await writer.writerow({"name": "John", "age": 26})
        await writer.writerows([
            {"name": "Sasha", "age": 42},
            {"name": "Hana"}
        ])

asyncio.run(main())
aiocsv strives to be a drop-in replacement for Python's built-in csv module. However, there are 3 notable differences:
- Readers accept objects with async read methods, instead of an AsyncIterable over lines from a file.
- AsyncDictReader.fieldnames can be None - use await AsyncDictReader.get_fieldnames() instead.
- Changes to csv.field_size_limit are not picked up by existing Reader instances. The field size limit is cached on Reader instantiation to avoid expensive function calls on each character of the input.
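Because of the last point, the field size limit must be set before a reader is created - a short sketch (afp stands for an already-opened asynchronous file):

import csv
from aiocsv import AsyncReader

csv.field_size_limit(1_000_000)  # picked up by readers created afterwards
reader = AsyncReader(afp)        # the current limit is cached here, at instantiation
csv.field_size_limit(128)        # too late: `reader` keeps using 1_000_000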
Other, minor differences include:
- AsyncReader.line_num, AsyncDictReader.line_num and AsyncDictReader.dialect are not settable,
- AsyncDictReader.reader is of AsyncReader type,
- AsyncDictWriter.writer is of AsyncWriter type,
- AsyncDictWriter provides an extra, read-only dialect property.
AsyncReader(
asyncfile: aiocsv.protocols.WithAsyncRead,
dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
**csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)
An object that iterates over records in the given asynchronous CSV file. Additional keyword arguments are understood as dialect parameters.
Iterating over this object returns parsed CSV rows (List[str]).

Methods:
- __aiter__(self) -> self
- async __anext__(self) -> List[str]

Read-only properties:
- dialect (aiocsv.protocols.DialectLike): The dialect used when parsing.
- line_num (int): The number of lines read from the source file. This coincides with a 1-based index of the line number of the last line of the recently parsed record.
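For example, when a quoted field contains embedded newlines, a single record spans several physical lines, so line_num can run ahead of the record count (afp again stands for an open asynchronous file):

reader = AsyncReader(afp)
async for row in reader:
    # prints the number of the last physical line the record occupied,
    # not the index of the record itself
    print(reader.line_num, row)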
AsyncDictReader(
asyncfile: aiocsv.protocols.WithAsyncRead,
fieldnames: Optional[Sequence[str]] = None,
restkey: Optional[str] = None,
restval: Optional[str] = None,
dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
**csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)
An object that iterates over records in the given asynchronous CSV file. All arguments work exactly the same way as in csv.DictReader.
Iterating over this object returns parsed CSV rows (Dict[str, str]).

Methods:
- __aiter__(self) -> self
- async __anext__(self) -> Dict[str, str]
- async get_fieldnames(self) -> List[str]

Properties:
- fieldnames (List[str] | None): field names used when converting rows to dictionaries.
  ⚠️ Unlike csv.DictReader, this property can't read the fieldnames if they are missing - it's not possible to await on the header row in a property getter. Use await reader.get_fieldnames() instead.

  reader = csv.DictReader(some_file)
  reader.fieldnames  # ["cells", "from", "the", "header"]

  areader = aiocsv.AsyncDictReader(same_file_but_async)
  areader.fieldnames  # ⚠️ None
  await areader.get_fieldnames()  # ["cells", "from", "the", "header"]
- restkey (str | None): If a row has more cells than the header, all remaining cells are stored under this key in the returned dictionary. Defaults to None.
- restval (str | None): If a row has fewer cells than the header, the missing keys will use this value. Defaults to None.
- reader: Underlying aiocsv.AsyncReader instance.
Read-only properties:
- dialect (aiocsv.protocols.DialectLike): Link to self.reader.dialect - the current csv.Dialect.
- line_num (int): The number of lines read from the source file. This coincides with a 1-based index of the line number of the last line of the recently parsed record.
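A quick sketch of how restkey and restval play out, assuming the semantics match csv.DictReader (as stated above) and the file behind afp has the header name,age:

areader = AsyncDictReader(afp, restkey="extra", restval="NULL")
# row "John,26,left,over" -> {"name": "John", "age": "26", "extra": ["left", "over"]}
# row "Hana"              -> {"name": "Hana", "age": "NULL"}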
AsyncWriter(
asyncfile: aiocsv.protocols.WithAsyncWrite,
dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
**csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)
An object that writes csv rows to the given asynchronous file. In this object "row" is a sequence of values.
Additional keyword arguments are passed to the underlying csv.writer instance.
Methods:
- async writerow(self, row: Iterable[Any]) -> None: Writes one row to the specified file.
- async writerows(self, rows: Iterable[Iterable[Any]]) -> None: Writes multiple rows to the specified file.

Read-only properties:
- dialect (aiocsv.protocols.DialectLike): Link to the underlying csv.writer's dialect attribute.
AsyncDictWriter(
asyncfile: aiocsv.protocols.WithAsyncWrite,
fieldnames: Sequence[str],
restval: Any = "",
extrasaction: Literal["raise", "ignore"] = "raise",
dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
**csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)
An object that writes csv rows to the given asynchronous file. In this object "row" is a mapping from fieldnames to values.
Additional keyword arguments are passed to the underlying csv.DictWriter instance.
Methods:
- async writeheader(self) -> None: Writes the header row to the specified file.
- async writerow(self, row: Mapping[str, Any]) -> None: Writes one row to the specified file.
- async writerows(self, rows: Iterable[Mapping[str, Any]]) -> None: Writes multiple rows to the specified file.
Properties:
- fieldnames (Sequence[str]): Sequence of keys identifying the order of values when writing rows to the underlying file.
- restval (Any): Placeholder value used when a key from fieldnames is missing in a row. Defaults to "".
- extrasaction (Literal["raise", "ignore"]): Action to take when a row contains keys not present in fieldnames. Defaults to "raise", which causes a ValueError to be raised on extra keys; may also be set to "ignore" to ignore any extra keys.
- writer: Link to the underlying AsyncWriter.
Read-only properties:
- dialect (aiocsv.protocols.DialectLike): Link to the underlying csv.DictWriter's dialect attribute.
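For instance, with a row carrying a stray "city" key (a sketch; afp is assumed to be an open asynchronous file in write mode):

writer = AsyncDictWriter(afp, ["name", "age"], extrasaction="ignore")
await writer.writerow({"name": "John", "age": 26, "city": "Warsaw"})  # "city" is dropped
# with the default extrasaction="raise", the same call would raise ValueError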
aiocsv.protocols.WithAsyncRead
A typing.Protocol describing an asynchronous file which can be read.

aiocsv.protocols.WithAsyncWrite
A typing.Protocol describing an asynchronous file which can be written to.

aiocsv.protocols.DialectLike
Type of an instantiated dialect property. Thank CPython for an incredible mess of having unrelated and disjoint csv.Dialect and _csv.Dialect classes.

Type of the dialect argument, as used in the csv module.

aiocsv.protocols.CsvDialectKwargs
Keyword arguments used by the csv module to override the dialect settings during reader/writer instantiation.
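Based on the descriptions above, the two file protocols presumably look roughly like this (a sketch; the write return annotation is our assumption - consult aiocsv.protocols for the authoritative definitions):

from typing import Protocol

class WithAsyncRead(Protocol):
    async def read(self, size: int) -> str: ...

class WithAsyncWrite(Protocol):
    async def write(self, b: str) -> None: ...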
Contributions are welcome; however, please open an issue beforehand. aiocsv is meant as a replacement for the built-in csv module - any features not present in the latter will be rejected.
To create a wheel (and a source tarball), run python -m build.

For local development, use a virtual environment. pip install --editable . will build the C extension and make it available for the current venv. This is required for running the tests. However, due to the mess of Python packaging, this forces an optimized build without debugging symbols. If you need to debug the C part of aiocsv and build the library with e.g. debugging symbols, the only sane way is to run python setup.py build --debug and manually copy the shared object/DLL from build/lib*/aiocsv to aiocsv.
This project uses pytest with pytest-asyncio for testing. Run pytest after installing the library in the manner explained above.

This library uses black and isort for formatting and pyright in strict mode for type checking. For the C part of the library, please use clang-format for formatting and clang-tidy for linting; however, these are not yet integrated into the CI.
pip install -r requirements.dev.txt will pull all of the development tools mentioned above; however, this might not be necessary depending on your setup. For example, if you use VS Code with the Python extension, pyright is already bundled and doesn't need to be installed again.

Under VS Code, use the Python, Pylance (installed automatically alongside the Python extension), black and isort extensions. You will need to install all dev dependencies from requirements.dev.txt, except for pyright.
Recommended .vscode/settings.json:
{
    "C_Cpp.codeAnalysis.clangTidy.enabled": true,
    "python.testing.pytestArgs": ["."],
    "python.testing.unittestEnabled": false,
    "python.testing.pytestEnabled": true,
    "[python]": {
        "editor.formatOnSave": true,
        "editor.codeActionsOnSave": {
            "source.organizeImports": "always"
        }
    },
    "[c]": {
        "editor.formatOnSave": true
    }
}
For the C part of the library, the C/C++ extension is sufficient. Ensure that your system has Python headers installed. Usually a separate package like python3-dev needs to be installed; consult your system repositories on that. .vscode/c_cpp_properties.json needs to manually include Python headers under includePath. On my particular system this config file looks like this:
{
    "configurations": [
        {
            "name": "Linux",
            "includePath": [
                "${workspaceFolder}/**",
                "/usr/include/python3.11"
            ],
            "defines": [],
            "compilerPath": "/usr/bin/clang",
            "cStandard": "c17",
            "cppStandard": "c++17",
            "intelliSenseMode": "linux-clang-x64"
        }
    ],
    "version": 4
}
Similar Open Source Tools


langchain-decorators
LangChain Decorators is a layer on top of LangChain that provides syntactic sugar for writing custom langchain prompts and chains. It offers a more pythonic way of writing code, multiline prompts without breaking code flow, IDE support for hinting and type checking, leveraging LangChain ecosystem, support for optional parameters, and sharing parameters between prompts. It simplifies streaming, automatic LLM selection, defining custom settings, debugging, and passing memory, callback, stop, etc. It also provides functions provider, dynamic function schemas, binding prompts to objects, defining custom settings, and debugging options. The project aims to enhance the LangChain library by making it easier to use and more efficient for writing custom prompts and chains.

Lumos
Lumos is a Chrome extension powered by a local LLM co-pilot for browsing the web. It allows users to summarize long threads, news articles, and technical documentation. Users can ask questions about reviews and product pages. The tool requires a local Ollama server for LLM inference and embedding database. Lumos supports multimodal models and file attachments for processing text and image content. It also provides options to customize models, hosts, and content parsers. The extension can be easily accessed through keyboard shortcuts and offers tools for automatic invocation based on prompts.

simplemind
Simplemind is an AI library designed to simplify the experience with AI APIs in Python. It provides easy-to-use AI tools with a human-centered design and minimal configuration. Users can tap into powerful AI capabilities through simple interfaces, without needing to be experts. The library supports various APIs from different providers/models and offers features like text completion, streaming text, structured data handling, conversational AI, tool calling, and logging. Simplemind aims to make AI models accessible to all by abstracting away complexity and prioritizing readability and usability.

cria
Cria is a Python library designed for running Large Language Models with minimal configuration. It provides an easy and concise way to interact with LLMs, offering advanced features such as custom models, streams, message history management, and running multiple models in parallel. Cria simplifies the process of using LLMs by providing a straightforward API that requires only a few lines of code to get started. It also handles model installation automatically, making it efficient and user-friendly for various natural language processing tasks.

magentic
Easily integrate Large Language Models into your Python code. Simply use the `@prompt` and `@chatprompt` decorators to create functions that return structured output from the LLM. Mix LLM queries and function calling with regular Python code to create complex logic.

llm-rag-workshop
The LLM RAG Workshop repository provides a workshop on using Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to generate and understand text in a human-like manner. It includes instructions on setting up the environment, indexing Zoomcamp FAQ documents, creating a Q&A system, and using OpenAI for generation based on retrieved information. The repository focuses on enhancing language model responses with retrieved information from external sources, such as document databases or search engines, to improve factual accuracy and relevance of generated text.

partial-json-parser-js
Partial JSON Parser is a lightweight and customizable library for parsing partial JSON strings. It allows users to parse incomplete JSON data and stream it to the user. The library provides options to specify what types of partialness are allowed during parsing, such as strings, objects, arrays, special values, and more. It helps handle malformed JSON and returns the parsed JavaScript value. Partial JSON Parser is implemented purely in JavaScript and offers both commonjs and esm builds.

motorhead
Motorhead is a memory and information retrieval server for LLMs. It provides three simple APIs to assist with memory handling in chat applications using LLMs. The first API, GET /sessions/:id/memory, returns messages up to a maximum window size. The second API, POST /sessions/:id/memory, allows you to send an array of messages to Motorhead for storage. The third API, DELETE /sessions/:id/memory, deletes the session's message list. Motorhead also features incremental summarization, where it processes half of the maximum window size of messages and summarizes them when the maximum is reached. Additionally, it supports searching by text query using vector search. Motorhead is configurable through environment variables, including the maximum window size, whether to enable long-term memory, the model used for incremental summarization, the server port, your OpenAI API key, and the Redis URL.

parsera
Parsera is a lightweight Python library designed for scraping websites using LLMs. It offers simplicity and efficiency by minimizing token usage, enhancing speed, and reducing costs. Users can easily set up and run the tool to extract specific elements from web pages, generating JSON output with relevant data. Additionally, Parsera supports integration with various chat models, such as Azure, expanding its functionality and customization options for web scraping tasks.

simpleAI
SimpleAI is a self-hosted alternative to the not-so-open AI API, focused on replicating main endpoints for LLM such as text completion, chat, edits, and embeddings. It allows quick experimentation with different models, creating benchmarks, and handling specific use cases without relying on external services. Users can integrate and declare models through gRPC, query endpoints using Swagger UI or API, and resolve common issues like CORS with FastAPI middleware. The project is open for contributions and welcomes PRs, issues, documentation, and more.

godot-llm
Godot LLM is a plugin that enables the utilization of large language models (LLM) for generating content in games. It provides functionality for text generation, text embedding, multimodal text generation, and vector database management within the Godot game engine. The plugin supports features like Retrieval Augmented Generation (RAG) and integrates llama.cpp-based functionalities for text generation, embedding, and multimodal capabilities. It offers support for various platforms and allows users to experiment with LLM models in their game development projects.

scaleapi-python-client
The Scale AI Python SDK is a tool that provides a Python interface for interacting with the Scale API. It allows users to easily create tasks, manage projects, upload files, and work with evaluation tasks, training tasks, and Studio assignments. The SDK handles error handling and provides detailed documentation for each method. Users can also manage teammates, project groups, and batches within the Scale Studio environment. The SDK supports various functionalities such as creating tasks, retrieving tasks, canceling tasks, auditing tasks, updating task attributes, managing files, managing team members, and working with evaluation and training tasks.

gen.nvim
gen.nvim is a tool that allows users to generate text using Language Models (LLMs) with customizable prompts. It requires Ollama with models like `llama3`, `mistral`, or `zephyr`, along with Curl for installation. Users can use the `Gen` command to generate text based on predefined or custom prompts. The tool provides key maps for easy invocation and allows for follow-up questions during conversations. Additionally, users can select a model from a list of installed models and customize prompts as needed.

deep-searcher
DeepSearcher is a tool that combines reasoning LLMs and Vector Databases to perform search, evaluation, and reasoning based on private data. It is suitable for enterprise knowledge management, intelligent Q&A systems, and information retrieval scenarios. The tool maximizes the utilization of enterprise internal data while ensuring data security, supports multiple embedding models, and provides support for multiple LLMs for intelligent Q&A and content generation. It also includes features like private data search, vector database management, and document loading with web crawling capabilities under development.
For similar jobs

lollms-webui
LoLLMs WebUI (Lord of Large Language Multimodal Systems: One tool to rule them all) is a user-friendly interface to access and utilize various LLM (Large Language Models) and other AI models for a wide range of tasks. With over 500 AI expert conditionings across diverse domains and more than 2500 fine tuned models over multiple domains, LoLLMs WebUI provides an immediate resource for any problem, from car repair to coding assistance, legal matters, medical diagnosis, entertainment, and more. The easy-to-use UI with light and dark mode options, integration with GitHub repository, support for different personalities, and features like thumb up/down rating, copy, edit, and remove messages, local database storage, search, export, and delete multiple discussions, make LoLLMs WebUI a powerful and versatile tool.

Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (Containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customer’s subscription using the CAPE tool within a matter of few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.

minio
MinIO is a High Performance Object Storage released under GNU Affero General Public License v3.0. It is API compatible with Amazon S3 cloud storage service. Use MinIO to build high performance infrastructure for machine learning, analytics and application data workloads.

mage-ai
Mage is an open-source data pipeline tool for transforming and integrating data. It offers an easy developer experience, engineering best practices built-in, and data as a first-class citizen. Mage makes it easy to build, preview, and launch data pipelines, and provides observability and scaling capabilities. It supports data integrations, streaming pipelines, and dbt integration.

AiTreasureBox
AiTreasureBox is a versatile AI tool that provides a collection of pre-trained models and algorithms for various machine learning tasks. It simplifies the process of implementing AI solutions by offering ready-to-use components that can be easily integrated into projects. With AiTreasureBox, users can quickly prototype and deploy AI applications without the need for extensive knowledge in machine learning or deep learning. The tool covers a wide range of tasks such as image classification, text generation, sentiment analysis, object detection, and more. It is designed to be user-friendly and accessible to both beginners and experienced developers, making AI development more efficient and accessible to a wider audience.

tidb
TiDB is an open-source distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.

airbyte
Airbyte is an open-source data integration platform that makes it easy to move data from any source to any destination. With Airbyte, you can build and manage data pipelines without writing any code. Airbyte provides a library of pre-built connectors that make it easy to connect to popular data sources and destinations. You can also create your own connectors using Airbyte's no-code Connector Builder or low-code CDK. Airbyte is used by data engineers and analysts at companies of all sizes to build and manage their data pipelines.

labelbox-python
Labelbox is a data-centric AI platform for enterprises to develop, optimize, and use AI to solve problems and power new products and services. Enterprises use Labelbox to curate data, generate high-quality human feedback data for computer vision and LLMs, evaluate model performance, and automate tasks by combining AI and human-centric workflows. The academic & research community uses Labelbox for cutting-edge AI research.