aiocsv
Python: Asynchronous CSV reading/writing
Stars: 59
aiocsv is a Python module that provides asynchronous CSV reading and writing. It is designed to be a drop-in replacement for Python's built-in csv module, with the added benefit of reading and writing CSV files asynchronously. This makes it a good fit for asyncio applications that need to process large CSV files.
README:
Asynchronous CSV reading and writing.
Install with pip install aiocsv. Python 3.8+ is required.
This module contains an extension written in C. Pre-built binaries may not be available for your configuration. You might need a C compiler and Python headers to install aiocsv.
AsyncReader & AsyncDictReader accept any object that has a read(size: int) coroutine, which should return a string.
AsyncWriter & AsyncDictWriter accept any object that has a write(b: str) coroutine.
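As a rough illustration of the two requirements above, here is a minimal in-memory object that offers both a read(size) and a write(b) coroutine. The class name InMemoryAsyncText and its getvalue helper are hypothetical and not part of aiocsv; the sketch also assumes that, as with ordinary files, an empty string signals the end of input. It uses only the standard library, so the demo reads the object in chunks by hand instead of going through aiocsv.

```python
import asyncio


class InMemoryAsyncText:
    """Hypothetical in-memory stand-in for an asynchronous text file."""

    def __init__(self, initial: str = "") -> None:
        self._data = initial
        self._pos = 0

    async def read(self, size: int) -> str:
        # Return up to `size` characters; an empty string means end of input.
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk

    async def write(self, b: str) -> None:
        # Append the given text to the in-memory contents.
        self._data += b

    def getvalue(self) -> str:
        return self._data


async def demo() -> list:
    source = InMemoryAsyncText("a,b\r\n1,2\r\n")
    chunks = []
    while True:
        chunk = await source.read(4)
        if not chunk:
            break
        chunks.append(chunk)
    return chunks


chunks = asyncio.run(demo())
```

An object shaped like this should be acceptable wherever the readers or writers expect an asynchronous file, which can be handy in tests that should not touch the disk.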
Reading is implemented using a custom CSV parser, which should behave exactly like the CPython parser.
Writing is implemented using the synchronous csv.writer and csv.DictWriter objects - the serializers write data to a StringIO, and that buffer is then rewritten to the underlying asynchronous file.
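The buffering approach described above can be sketched with the standard library alone. This is not aiocsv's actual code: BufferedAsyncWriter and ListSink are hypothetical names, and the sketch only shows the idea of serializing a row into a StringIO with the synchronous csv.writer and then handing the buffered text to an asynchronous write coroutine.

```python
import asyncio
import csv
import io


class ListSink:
    """Hypothetical async sink that records every chunk written to it."""

    def __init__(self) -> None:
        self.chunks = []

    async def write(self, b: str) -> None:
        self.chunks.append(b)


class BufferedAsyncWriter:
    """Sketch of the buffer-then-flush idea; not aiocsv's implementation."""

    def __init__(self, asyncfile, **csv_kwargs) -> None:
        self._file = asyncfile
        self._buffer = io.StringIO(newline="")
        # The synchronous writer only ever sees the in-memory buffer.
        self._writer = csv.writer(self._buffer, **csv_kwargs)

    async def writerow(self, row) -> None:
        self._writer.writerow(row)
        data = self._buffer.getvalue()
        # Reset the buffer so the next row starts from an empty state.
        self._buffer.seek(0)
        self._buffer.truncate(0)
        await self._file.write(data)


async def demo() -> list:
    sink = ListSink()
    writer = BufferedAsyncWriter(sink, dialect="unix")
    await writer.writerow(["name", "age"])
    await writer.writerow(["John", 26])
    return sink.chunks


chunks = asyncio.run(demo())
```

Because serialization is delegated to csv.writer, quoting and line terminators follow the chosen dialect; with the "unix" dialect every field is quoted and rows end with a line feed.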
Example usage with aiofiles.
import asyncio
import csv

import aiofiles
from aiocsv import AsyncReader, AsyncDictReader, AsyncWriter, AsyncDictWriter


async def main():
    # simple reading
    async with aiofiles.open("some_file.csv", mode="r", encoding="utf-8", newline="") as afp:
        async for row in AsyncReader(afp):
            print(row)  # row is a list

    # dict reading, tab-separated
    async with aiofiles.open("some_other_file.tsv", mode="r", encoding="utf-8", newline="") as afp:
        async for row in AsyncDictReader(afp, delimiter="\t"):
            print(row)  # row is a dict

    # simple writing, "unix"-dialect
    async with aiofiles.open("new_file.csv", mode="w", encoding="utf-8", newline="") as afp:
        writer = AsyncWriter(afp, dialect="unix")
        await writer.writerow(["name", "age"])
        await writer.writerows([
            ["John", 26], ["Sasha", 42], ["Hana", 37]
        ])

    # dict writing, all quoted, "NULL" for missing fields
    async with aiofiles.open("new_file2.csv", mode="w", encoding="utf-8", newline="") as afp:
        writer = AsyncDictWriter(afp, ["name", "age"], restval="NULL", quoting=csv.QUOTE_ALL)
        await writer.writeheader()
        await writer.writerow({"name": "John", "age": 26})
        await writer.writerows([
            {"name": "Sasha", "age": 42},
            {"name": "Hana"}
        ])


asyncio.run(main())
aiocsv strives to be a drop-in replacement for Python's builtin csv module. However, there are 3 notable differences:
- Readers accept objects with async read methods, instead of an AsyncIterable over lines from a file.
- AsyncDictReader.fieldnames can be None; use await AsyncDictReader.get_fieldnames() instead.
- Changes to csv.field_size_limit are not picked up by existing Reader instances. The field size limit is cached on Reader instantiation to avoid expensive function calls on each character of the input.
Other, minor, differences include:
- AsyncReader.line_num, AsyncDictReader.line_num and AsyncDictReader.dialect are not settable,
- AsyncDictReader.reader is of AsyncReader type,
- AsyncDictWriter.writer is of AsyncWriter type,
- AsyncDictWriter provides an extra, read-only dialect property.
AsyncReader(
    asyncfile: aiocsv.protocols.WithAsyncRead,
    dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
    **csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)
An object that iterates over records in the given asynchronous CSV file. Additional keyword arguments are understood as dialect parameters.
Iterating over this object returns parsed CSV rows (List[str]).
Methods:
__aiter__(self) -> self
async __anext__(self) -> List[str]
Read-only properties:
- dialect: The csv.Dialect used when parsing
- line_num: The number of lines read from the source file. This coincides with a 1-based index of the line number of the last line of the recently parsed record.
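To make the line_num definition concrete, the sketch below uses the synchronous csv.reader from the standard library, on the assumption (taken from the statement above that the custom parser should behave exactly like the CPython parser) that AsyncReader.line_num reports the same values. The sample data is made up for illustration.

```python
import csv
import io

# The second record contains a quoted field that spans two physical lines.
text = 'name,note\n"John","line one\nline two"\nSasha,ok\n'

reader = csv.reader(io.StringIO(text, newline=""))
seen = []
for row in reader:
    # line_num counts physical lines consumed so far, not records.
    seen.append((reader.line_num, row))

for line_num, row in seen:
    print(line_num, row)
```

The header ends on line 1, the multi-line record ends on line 3, and the last record ends on line 4, so line_num skips from 1 to 3 even though only one more record was parsed.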
AsyncDictReader(
    asyncfile: aiocsv.protocols.WithAsyncRead,
    fieldnames: Optional[Sequence[str]] = None,
    restkey: Optional[str] = None,
    restval: Optional[str] = None,
    dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
    **csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)
An object that iterates over records in the given asynchronous CSV file. All arguments work exactly the same way as in csv.DictReader.
Iterating over this object returns parsed CSV rows (Dict[str, str]).
Methods:
__aiter__(self) -> self
async __anext__(self) -> Dict[str, str]
async get_fieldnames(self) -> List[str]
Properties:
- fieldnames: field names used when converting rows to dictionaries
  ⚠️ Unlike csv.DictReader, this property can't read the fieldnames if they are missing - it's not possible to await on the header row in a property getter. Use await reader.get_fieldnames().
      reader = csv.DictReader(some_file)
      reader.fieldnames  # ["cells", "from", "the", "header"]

      areader = aiocsv.AsyncDictReader(same_file_but_async)
      areader.fieldnames  # ⚠️ None
      await areader.get_fieldnames()  # ["cells", "from", "the", "header"]
- restkey: If a row has more cells than the header, all remaining cells are stored under this key in the returned dictionary. Defaults to None.
- restval: If a row has fewer cells than the header, then missing keys will use this value. Defaults to None.
- reader: Underlying aiocsv.AsyncReader instance
Read-only properties:
- dialect: Link to self.reader.dialect - the current csv.Dialect
- line_num: The number of lines read from the source file. This coincides with a 1-based index of the line number of the last line of the recently parsed record.
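Since the arguments are documented as working exactly as in csv.DictReader, the effect of restkey and restval can be previewed with the synchronous class from the standard library. The sample data and the "rest" and "NULL" values below are made up for illustration; AsyncDictReader is assumed to produce the same dictionaries.

```python
import csv
import io

# The first data row has two extra cells, the second one is missing "age".
text = "name,age\nJohn,26,extra1,extra2\nHana\n"

reader = csv.DictReader(
    io.StringIO(text, newline=""),
    restkey="rest",   # surplus cells are collected in a list under this key
    restval="NULL",   # missing cells are filled with this value
)
rows = list(reader)

for row in rows:
    print(row)
```

Note that the value stored under restkey is a list of the surplus cells, not a single string.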
AsyncWriter(
    asyncfile: aiocsv.protocols.WithAsyncWrite,
    dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
    **csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)
An object that writes csv rows to the given asynchronous file. In this object "row" is a sequence of values.
Additional keyword arguments are passed to the underlying csv.writer instance.
Methods:
- async writerow(self, row: Iterable[Any]) -> None: Writes one row to the specified file.
- async writerows(self, rows: Iterable[Iterable[Any]]) -> None: Writes multiple rows to the specified file.
Read-only properties:
- dialect: Link to the underlying csv.writer's dialect attribute
AsyncDictWriter(
    asyncfile: aiocsv.protocols.WithAsyncWrite,
    fieldnames: Sequence[str],
    restval: Any = "",
    extrasaction: Literal["raise", "ignore"] = "raise",
    dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
    **csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)
An object that writes csv rows to the given asynchronous file. In this object "row" is a mapping from fieldnames to values.
Additional keyword arguments are passed to the underlying csv.DictWriter instance.
Methods:
- async writeheader(self) -> None: Writes header row to the specified file.
- async writerow(self, row: Mapping[str, Any]) -> None: Writes one row to the specified file.
- async writerows(self, rows: Iterable[Mapping[str, Any]]) -> None: Writes multiple rows to the specified file.
Properties:
- fieldnames: Sequence of keys to identify the order of values when writing rows to the underlying file
- restval: Placeholder value used when a key from fieldnames is missing in a row, defaults to ""
- extrasaction: Action to take when there are keys in a row which are not present in fieldnames. Defaults to "raise", which causes ValueError to be raised on extra keys; may also be set to "ignore" to ignore any extra keys
- writer: Link to the underlying AsyncWriter
Read-only properties:
- dialect: Link to the underlying csv.writer's dialect attribute
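Because the serialization is handled by a synchronous csv.DictWriter, the behaviour of restval and extrasaction can be tried out with the standard library alone. The sketch below is an illustration with made-up data; the lineterminator="\n" setting is only there to keep the output easy to read.

```python
import csv
import io

buffer = io.StringIO(newline="")
writer = csv.DictWriter(
    buffer,
    ["name", "age"],
    restval="NULL",          # used when a row lacks one of the fieldnames
    extrasaction="ignore",   # silently drop keys that are not in fieldnames
    lineterminator="\n",
)
writer.writeheader()
writer.writerow({"name": "Hana"})                              # "age" is missing
writer.writerow({"name": "John", "age": 26, "city": "Oslo"})   # "city" is extra
output = buffer.getvalue()

# With the default extrasaction="raise", an unknown key is an error.
strict = csv.DictWriter(io.StringIO(newline=""), ["name", "age"])
try:
    strict.writerow({"name": "John", "city": "Oslo"})
    raised = False
except ValueError:
    raised = True

print(output)
print("ValueError raised:", raised)
```

The same keyword arguments are accepted by AsyncDictWriter, with the difference that writeheader, writerow and writerows have to be awaited.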
aiocsv.protocols.WithAsyncRead: A typing.Protocol describing an asynchronous file, which can be read.
aiocsv.protocols.WithAsyncWrite: A typing.Protocol describing an asynchronous file, which can be written to.
Type of the dialect argument (str | csv.Dialect | Type[csv.Dialect]), as used in the csv module.
aiocsv.protocols.CsvDialectKwargs: Keyword arguments used by the csv module to override the dialect settings during reader/writer instantiation.
Contributions are welcome, however please open an issue beforehand. aiocsv is meant as a replacement for the built-in csv, so any features not present in the latter will be rejected.
To create a wheel (and a source tarball), run python -m build.
For local development, use a virtual environment. Running pip install --editable . will build the C extension and make it available for the current venv. This is required for running the tests. However, due to the mess of Python packaging, this will force an optimized build without debugging symbols. If you need to debug the C part of aiocsv and build the library with e.g. debugging symbols, the only sane way is to run python setup.py build --debug and manually copy the shared object/DLL from build/lib*/aiocsv to aiocsv.
This project uses pytest with pytest-asyncio for testing. Run pytest after installing the library in the manner explained above.
This library uses black and isort for formatting and pyright in strict mode for type checking.
For the C part of the library, please use clang-format for formatting and clang-tidy for linting; however, these are not yet integrated in the CI.
Running pip install -r requirements.dev.txt will pull all of the development tools mentioned above, however this might not be necessary depending on your setup. For example, if you use VS Code with the Python extension, pyright is already bundled and doesn't need to be installed again.
In VS Code, use the Python, Pylance (should be installed automatically alongside the Python extension), black and isort extensions.
You will need to install all dev dependencies from requirements.dev.txt, except for pyright.
Recommended .vscode/settings.json:
{
    "C_Cpp.codeAnalysis.clangTidy.enabled": true,
    "python.testing.pytestArgs": [
        "."
    ],
    "python.testing.unittestEnabled": false,
    "python.testing.pytestEnabled": true,
    "[python]": {
        "editor.formatOnSave": true,
        "editor.codeActionsOnSave": {
            "source.organizeImports": "always"
        }
    },
    "[c]": {
        "editor.formatOnSave": true
    }
}
For the C part of the library, the C/C++ extension is sufficient.
Ensure that your system has Python headers installed. Usually a separate package like python3-dev needs to be installed; consult your system repositories on that. .vscode/c_cpp_properties.json needs to manually include Python headers under includePath. On my particular system this config file looks like this:
{
    "configurations": [
        {
            "name": "Linux",
            "includePath": [
                "${workspaceFolder}/**",
                "/usr/include/python3.11"
            ],
            "defines": [],
            "compilerPath": "/usr/bin/clang",
            "cStandard": "c17",
            "cppStandard": "c++17",
            "intelliSenseMode": "linux-clang-x64"
        }
    ],
    "version": 4
}
Alternative AI tools for aiocsv
Similar Open Source Tools
Lumos
Lumos is a Chrome extension powered by a local LLM co-pilot for browsing the web. It allows users to summarize long threads, news articles, and technical documentation. Users can ask questions about reviews and product pages. The tool requires a local Ollama server for LLM inference and embedding database. Lumos supports multimodal models and file attachments for processing text and image content. It also provides options to customize models, hosts, and content parsers. The extension can be easily accessed through keyboard shortcuts and offers tools for automatic invocation based on prompts.
motorhead
Motorhead is a memory and information retrieval server for LLMs. It provides three simple APIs to assist with memory handling in chat applications using LLMs. The first API, GET /sessions/:id/memory, returns messages up to a maximum window size. The second API, POST /sessions/:id/memory, allows you to send an array of messages to Motorhead for storage. The third API, DELETE /sessions/:id/memory, deletes the session's message list. Motorhead also features incremental summarization, where it processes half of the maximum window size of messages and summarizes them when the maximum is reached. Additionally, it supports searching by text query using vector search. Motorhead is configurable through environment variables, including the maximum window size, whether to enable long-term memory, the model used for incremental summarization, the server port, your OpenAI API key, and the Redis URL.
llm-rag-workshop
The LLM RAG Workshop repository provides a workshop on using Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to generate and understand text in a human-like manner. It includes instructions on setting up the environment, indexing Zoomcamp FAQ documents, creating a Q&A system, and using OpenAI for generation based on retrieved information. The repository focuses on enhancing language model responses with retrieved information from external sources, such as document databases or search engines, to improve factual accuracy and relevance of generated text.
invariant
Invariant Analyzer is an open-source scanner designed for LLM-based AI agents to find bugs, vulnerabilities, and security threats. It scans agent execution traces to identify issues like looping behavior, data leaks, prompt injections, and unsafe code execution. The tool offers a library of built-in checkers, an expressive policy language, data flow analysis, real-time monitoring, and extensible architecture for custom checkers. It helps developers debug AI agents, scan for security violations, and prevent security issues and data breaches during runtime. The analyzer leverages deep contextual understanding and a purpose-built rule matching engine for security policy enforcement.
bolna
Bolna is an open-source platform for building voice-driven conversational applications using large language models (LLMs). It provides a comprehensive set of tools and integrations to handle various aspects of voice-based interactions, including telephony, transcription, LLM-based conversation handling, and text-to-speech synthesis. Bolna simplifies the process of creating voice agents that can perform tasks such as initiating phone calls, transcribing conversations, generating LLM-powered responses, and synthesizing speech. It supports multiple providers for each component, allowing users to customize their setup based on their specific needs. Bolna is designed to be easy to use, with a straightforward local setup process and well-documented APIs. It is also extensible, enabling users to integrate with other telephony providers or add custom functionality.
unify
The Unify Python Package provides access to the Unify REST API, allowing users to query Large Language Models (LLMs) from any Python 3.7.1+ application. It includes Synchronous and Asynchronous clients with Streaming responses support. Users can easily use any endpoint with a single key, route to the best endpoint for optimal throughput, cost, or latency, and customize prompts to interact with the models. The package also supports dynamic routing to automatically direct requests to the top-performing provider. Additionally, users can enable streaming responses and interact with the models asynchronously for handling multiple user requests simultaneously.
ragtacts
Ragtacts is a Clojure library that allows users to easily interact with Large Language Models (LLMs) such as OpenAI's GPT-4. Users can ask questions to LLMs, create question templates, call Clojure functions in natural language, and utilize vector databases for more accurate answers. Ragtacts also supports RAG (Retrieval-Augmented Generation) method for enhancing LLM output by incorporating external data. Users can use Ragtacts as a CLI tool, API server, or through a RAG Playground for interactive querying.
magentic
Easily integrate Large Language Models into your Python code. Simply use the `@prompt` and `@chatprompt` decorators to create functions that return structured output from the LLM. Mix LLM queries and function calling with regular Python code to create complex logic.
cria
Cria is a Python library designed for running Large Language Models with minimal configuration. It provides an easy and concise way to interact with LLMs, offering advanced features such as custom models, streams, message history management, and running multiple models in parallel. Cria simplifies the process of using LLMs by providing a straightforward API that requires only a few lines of code to get started. It also handles model installation automatically, making it efficient and user-friendly for various natural language processing tasks.
elia
Elia is a powerful terminal user interface designed for interacting with large language models. It allows users to chat with models like Claude 3, ChatGPT, Llama 3, Phi 3, Mistral, and Gemma. Conversations are stored locally in a SQLite database, ensuring privacy. Users can run local models through 'ollama' without data leaving their machine. Elia offers easy installation with pipx and supports various environment variables for different models. It provides a quick start to launch chats and manage local models. Configuration options are available to customize default models, system prompts, and add new models. Users can import conversations from ChatGPT and wipe the database when needed. Elia aims to enhance user experience in interacting with language models through a user-friendly interface.
simpleAI
SimpleAI is a self-hosted alternative to the not-so-open AI API, focused on replicating main endpoints for LLM such as text completion, chat, edits, and embeddings. It allows quick experimentation with different models, creating benchmarks, and handling specific use cases without relying on external services. Users can integrate and declare models through gRPC, query endpoints using Swagger UI or API, and resolve common issues like CORS with FastAPI middleware. The project is open for contributions and welcomes PRs, issues, documentation, and more.
parsera
Parsera is a lightweight Python library designed for scraping websites using LLMs. It offers simplicity and efficiency by minimizing token usage, enhancing speed, and reducing costs. Users can easily set up and run the tool to extract specific elements from web pages, generating JSON output with relevant data. Additionally, Parsera supports integration with various chat models, such as Azure, expanding its functionality and customization options for web scraping tasks.
client-python
The Mistral Python Client is a tool inspired by cohere-python that allows users to interact with the Mistral AI API. It provides functionalities to access and utilize the AI capabilities offered by Mistral. Users can easily install the client using pip and manage dependencies using poetry. The client includes examples demonstrating how to use the API for various tasks, such as chat interactions. To get started, users need to obtain a Mistral API Key and set it as an environment variable. Overall, the Mistral Python Client simplifies the integration of Mistral AI services into Python applications.
gen.nvim
gen.nvim is a tool that allows users to generate text using Language Models (LLMs) with customizable prompts. It requires Ollama with models like `llama3`, `mistral`, or `zephyr`, along with Curl for installation. Users can use the `Gen` command to generate text based on predefined or custom prompts. The tool provides key maps for easy invocation and allows for follow-up questions during conversations. Additionally, users can select a model from a list of installed models and customize prompts as needed.
redis-vl-python
The Python Redis Vector Library (RedisVL) is a tailor-made client for AI applications leveraging Redis. It enhances applications with Redis' speed, flexibility, and reliability, incorporating capabilities like vector-based semantic search, full-text search, and geo-spatial search. The library bridges the gap between the emerging AI-native developer ecosystem and the capabilities of Redis by providing a lightweight, elegant, and intuitive interface. It abstracts the features of Redis into a grammar that is more aligned to the needs of today's AI/ML Engineers or Data Scientists.
For similar jobs
lollms-webui
LoLLMs WebUI (Lord of Large Language Multimodal Systems: One tool to rule them all) is a user-friendly interface to access and utilize various LLM (Large Language Models) and other AI models for a wide range of tasks. With over 500 AI expert conditionings across diverse domains and more than 2500 fine tuned models over multiple domains, LoLLMs WebUI provides an immediate resource for any problem, from car repair to coding assistance, legal matters, medical diagnosis, entertainment, and more. The easy-to-use UI with light and dark mode options, integration with GitHub repository, support for different personalities, and features like thumb up/down rating, copy, edit, and remove messages, local database storage, search, export, and delete multiple discussions, make LoLLMs WebUI a powerful and versatile tool.
Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (containing a demo web application, Power BI reports, Synapse resources, AML Notebooks, etc.) that can be deployed in a customer’s subscription using the CAPE tool within a few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.
minio
MinIO is a High Performance Object Storage released under GNU Affero General Public License v3.0. It is API compatible with Amazon S3 cloud storage service. Use MinIO to build high performance infrastructure for machine learning, analytics and application data workloads.
mage-ai
Mage is an open-source data pipeline tool for transforming and integrating data. It offers an easy developer experience, engineering best practices built-in, and data as a first-class citizen. Mage makes it easy to build, preview, and launch data pipelines, and provides observability and scaling capabilities. It supports data integrations, streaming pipelines, and dbt integration.
AiTreasureBox
AiTreasureBox is a versatile AI tool that provides a collection of pre-trained models and algorithms for various machine learning tasks. It simplifies the process of implementing AI solutions by offering ready-to-use components that can be easily integrated into projects. With AiTreasureBox, users can quickly prototype and deploy AI applications without the need for extensive knowledge in machine learning or deep learning. The tool covers a wide range of tasks such as image classification, text generation, sentiment analysis, object detection, and more. It is designed to be user-friendly and accessible to both beginners and experienced developers, making AI development more efficient and accessible to a wider audience.
tidb
TiDB is an open-source distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.
airbyte
Airbyte is an open-source data integration platform that makes it easy to move data from any source to any destination. With Airbyte, you can build and manage data pipelines without writing any code. Airbyte provides a library of pre-built connectors that make it easy to connect to popular data sources and destinations. You can also create your own connectors using Airbyte's no-code Connector Builder or low-code CDK. Airbyte is used by data engineers and analysts at companies of all sizes to build and manage their data pipelines.
labelbox-python
Labelbox is a data-centric AI platform for enterprises to develop, optimize, and use AI to solve problems and power new products and services. Enterprises use Labelbox to curate data, generate high-quality human feedback data for computer vision and LLMs, evaluate model performance, and automate tasks by combining AI and human-centric workflows. The academic & research community uses Labelbox for cutting-edge AI research.