aicsimageio

Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python

Stars: 198

Visit

AICSImageIO is a Python tool for Image Reading, Metadata Conversion, and Image Writing for Microscopy Images. It supports various file formats like OME-TIFF, TIFF, ND2, DV, CZI, LIF, PNG, GIF, and Bio-Formats. Users can read and write metadata and imaging data, work with different file systems like local paths, HTTP URLs, s3fs, and gcsfs. The tool provides functionalities for full image reading, delayed image reading, mosaic image reading, metadata reading, xarray coordinate plane attachment, cloud IO support, and saving to OME-TIFF. It also offers benchmarking and developer resources.

README:

AICSImageIO

[!WARNING]
AICSImageIO is now in maintenance mode only. Please take a look at its compatible successor bioio (see here for migration guide)

Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Pure Python

Features

Supports reading metadata and imaging data for:
- OME-TIFF
- TIFF
- ND2 -- (pip install aicsimageio[nd2])
- DV -- (pip install aicsimageio[dv])
- CZI -- (pip install aicspylibczi>=3.1.1 fsspec>=2022.8.0)
- LIF -- (pip install readlif>=0.6.4)
- PNG, GIF, etc. -- (pip install aicsimageio[base-imageio])
- Files supported by Bio-Formats -- (pip install aicsimageio bioformats_jar) (Note: requires java and maven, see below for details.)
Supports writing metadata and imaging data for:
- OME-TIFF
- PNG, GIF, etc. -- (pip install aicsimageio[base-imageio])
Supports reading and writing to fsspec supported file systems wherever possible:
- Local paths (i.e. my-file.png)
- HTTP URLs (i.e. https://my-domain.com/my-file.png)
- s3fs (i.e. s3://my-bucket/my-file.png)
- gcsfs (i.e. gcs://my-bucket/my-file.png)
See Cloud IO Support for more details.

Installation

Stable Release: pip install aicsimageio
Development Head: pip install git+https://github.com/AllenCellModeling/aicsimageio.git

AICSImageIO is supported on Windows, Mac, and Ubuntu. For other platforms, you will likely need to build from source.

Extra Format Installation

TIFF and OME-TIFF reading and writing is always available after installing aicsimageio, but extra supported formats can be optionally installed using [...] syntax.

For a single additional supported format (e.g. ND2): pip install aicsimageio[nd2]
For a single additional supported format (e.g. ND2), development head: pip install "aicsimageio[nd2] @ git+https://github.com/AllenCellModeling/aicsimageio.git"
For a single additional supported format (e.g. ND2), specific tag (e.g. v4.0.0.dev6): pip install "aicsimageio[nd2] @ git+https://github.com/AllenCellModeling/[email protected]"
For faster OME-TIFF reading with tile tags: pip install aicsimageio[bfio]
For multiple additional supported formats: pip install aicsimageio[base-imageio,nd2]
For all additional supported (and openly licensed) formats: pip install aicsimageio[all]
Due to the GPL license, LIF support is not included with the [all] extra, and must be installed manually with pip install aicsimageio readlif>=0.6.4
Due to the GPL license, CZI support is not included with the [all] extra, and must be installed manually with pip install aicsimageio aicspylibczi>=3.1.1 fsspec>=2022.8.0
Due to the GPL license, Bio-Formats support is not included with the [all] extra, and must be installed manually with pip install aicsimageio bioformats_jar. Important!! Bio-Formats support also requires a java and mvn executable in the environment. The simplest method is to install bioformats_jar from conda: conda install -c conda-forge bioformats_jar (which will additionally bring openjdk and maven packages).

Documentation

For full package documentation please visit allencellmodeling.github.io/aicsimageio.

Quickstart

Full Image Reading

If your image fits in memory:

from aicsimageio import AICSImage

# Get an AICSImage object
img = AICSImage("my_file.tiff")  # selects the first scene found
img.data  # returns 5D TCZYX numpy array
img.xarray_data  # returns 5D TCZYX xarray data array backed by numpy
img.dims  # returns a Dimensions object
img.dims.order  # returns string "TCZYX"
img.dims.X  # returns size of X dimension
img.shape  # returns tuple of dimension sizes in TCZYX order
img.get_image_data("CZYX", T=0)  # returns 4D CZYX numpy array

# Get the id of the current operating scene
img.current_scene

# Get a list valid scene ids
img.scenes

# Change scene using name
img.set_scene("Image:1")
# Or by scene index
img.set_scene(1)

# Use the same operations on a different scene
# ...

Full Image Reading Notes

The .data and .xarray_data properties will load the whole scene into memory. The .get_image_data function will load the whole scene into memory and then retrieve the specified chunk.

Delayed Image Reading

If your image doesn't fit in memory:

from aicsimageio import AICSImage

# Get an AICSImage object
img = AICSImage("my_file.tiff")  # selects the first scene found
img.dask_data  # returns 5D TCZYX dask array
img.xarray_dask_data  # returns 5D TCZYX xarray data array backed by dask array
img.dims  # returns a Dimensions object
img.dims.order  # returns string "TCZYX"
img.dims.X  # returns size of X dimension
img.shape  # returns tuple of dimension sizes in TCZYX order

# Pull only a specific chunk in-memory
lazy_t0 = img.get_image_dask_data("CZYX", T=0)  # returns out-of-memory 4D dask array
t0 = lazy_t0.compute()  # returns in-memory 4D numpy array

# Get the id of the current operating scene
img.current_scene

# Get a list valid scene ids
img.scenes

# Change scene using name
img.set_scene("Image:1")
# Or by scene index
img.set_scene(1)

# Use the same operations on a different scene
# ...

Delayed Image Reading Notes

The .dask_data and .xarray_dask_data properties and the .get_image_dask_data function will not load any piece of the imaging data into memory until you specifically call .compute on the returned Dask array. In doing so, you will only then load the selected chunk in-memory.

Mosaic Image Reading

Read stitched data or single tiles as a dimension.

Readers that support mosaic tile stitching:

LifReader
CziReader

AICSImage

If the file format reader supports stitching mosaic tiles together, the AICSImage object will default to stitching the tiles back together.

img = AICSImage("very-large-mosaic.lif")
img.dims.order  # T, C, Z, big Y, big X, (S optional)
img.dask_data  # Dask chunks fall on tile boundaries, pull YX chunks out of the image

This behavior can be manually turned off:

img = AICSImage("very-large-mosaic.lif", reconstruct_mosaic=False)
img.dims.order  # M (tile index), T, C, Z, small Y, small X, (S optional)
img.dask_data  # Chunks use normal ZYX

If the reader does not support stitching tiles together the M tile index will be available on the AICSImage object:

img = AICSImage("some-unsupported-mosaic-stitching-format.ext")
img.dims.order  # M (tile index), T, C, Z, small Y, small X, (S optional)
img.dask_data  # Chunks use normal ZYX

Reader

If the file format reader detects mosaic tiles in the image, the Reader object will store the tiles as a dimension.

If tile stitching is implemented, the Reader can also return the stitched image.

reader = LifReader("ver-large-mosaic.lif")
reader.dims.order  # M, T, C, Z, tile size Y, tile size X, (S optional)
reader.dask_data  # normal operations, can use M dimension to select individual tiles
reader.mosaic_dask_data  # returns stitched mosaic - T, C, Z, big Y, big, X, (S optional)

Single Tile Absolute Positioning

There are functions available on both the AICSImage and Reader objects to help with single tile positioning:

img = AICSImage("very-large-mosaic.lif")
img.mosaic_tile_dims  # Returns a Dimensions object with just Y and X dim sizes
img.mosaic_tile_dims.Y  # 512 (for example)

# Get the tile start indices (top left corner of tile)
y_start_index, x_start_index = img.get_mosaic_tile_position(12)

Metadata Reading

from aicsimageio import AICSImage

# Get an AICSImage object
img = AICSImage("my_file.tiff")  # selects the first scene found
img.metadata  # returns the metadata object for this file format (XML, JSON, etc.)
img.channel_names  # returns a list of string channel names found in the metadata
img.physical_pixel_sizes.Z  # returns the Z dimension pixel size as found in the metadata
img.physical_pixel_sizes.Y  # returns the Y dimension pixel size as found in the metadata
img.physical_pixel_sizes.X  # returns the X dimension pixel size as found in the metadata

Xarray Coordinate Plane Attachment

If aicsimageio finds coordinate information for the spatial-temporal dimensions of the image in metadata, you can use xarray for indexing by coordinates.

from aicsimageio import AICSImage

# Get an AICSImage object
img = AICSImage("my_file.ome.tiff")

# Get the first ten seconds (not frames)
first_ten_seconds = img.xarray_data.loc[:10]  # returns an xarray.DataArray

# Get the first ten major units (usually micrometers, not indices) in Z
first_ten_mm_in_z = img.xarray_data.loc[:, :, :10]

# Get the first ten major units (usually micrometers, not indices) in Y
first_ten_mm_in_y = img.xarray_data.loc[:, :, :, :10]

# Get the first ten major units (usually micrometers, not indices) in X
first_ten_mm_in_x = img.xarray_data.loc[:, :, :, :, :10]

See xarray "Indexing and Selecting Data" Documentation for more information.

Cloud IO Support

File-System Specification (fsspec) allows for common object storage services (S3, GCS, etc.) to act like normal filesystems by following the same base specification across them all. AICSImageIO utilizes this standard specification to make it possible to read directly from remote resources when the specification is installed.

from aicsimageio import AICSImage

# Get an AICSImage object
img = AICSImage("http://my-website.com/my_file.tiff")
img = AICSImage("s3://my-bucket/my_file.tiff")
img = AICSImage("gcs://my-bucket/my_file.tiff")

# Or read with specific filesystem creation arguments
img = AICSImage("s3://my-bucket/my_file.tiff", fs_kwargs=dict(anon=True))
img = AICSImage("gcs://my-bucket/my_file.tiff", fs_kwargs=dict(anon=True))

# All other normal operations work just fine

Remote reading requires that the file-system specification implementation for the target backend is installed.

For s3: pip install s3fs
For gs: pip install gcsfs

See the list of known implementations.

Saving to OME-TIFF

The simpliest method to save your image as an OME-TIFF file with key pieces of metadata is to use the save function.

from aicsimageio import AICSImage

AICSImage("my_file.czi").save("my_file.ome.tiff")

Note: By default aicsimageio will generate only a portion of metadata to pass along from the reader to the OME model. This function currently does not do a full metadata translation.

For finer grain customization of the metadata, scenes, or if you want to save an array as an OME-TIFF, the writer class can also be used to customize as needed.

import numpy as np
from aicsimageio.writers import OmeTiffWriter

image = np.random.rand(10, 3, 1024, 2048)
OmeTiffWriter.save(image, "file.ome.tif", dim_order="ZCYX")

See OmeTiffWriter documentation for more details.

Other Writers

In most cases, AICSImage.save is usually a good default but there are other image writers available. For more information, please refer to our writers documentation.

Benchmarks

AICSImageIO is benchmarked using asv. You can find the benchmark results for every commit to main starting at the 4.0 release on our benchmarks page.

Development

See our developer resources for information related to developing the code.

Citation

If you find aicsimageio useful, please cite this repository as:

Eva Maxfield Brown, Dan Toloudis, Jamie Sherman, Madison Swain-Bowden, Talley Lambert, AICSImageIO Contributors (2021). AICSImageIO: Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Pure Python [Computer software]. GitHub. https://github.com/AllenCellModeling/aicsimageio

bibtex:

@misc{aicsimageio,
  author    = {Brown, Eva Maxfield and Toloudis, Dan and Sherman, Jamie and Swain-Bowden, Madison and Lambert, Talley and {AICSImageIO Contributors}},
  title     = {AICSImageIO: Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Pure Python},
  year      = {2021},
  publisher = {GitHub},
  url       = {https://github.com/AllenCellModeling/aicsimageio}
}

Free software: BSD-3-Clause

(The LIF component is licensed under GPLv3 and is not included in this package) (The Bio-Formats component is licensed under GPLv2 and is not included in this package) (The CZI component is licensed under GPLv3 and is not included in this package)

For Tasks:

Click tags to check more tools for each tasks

read microscopy images convert metadata write imaging data work with different file systems save images as ome-tiff

For Jobs:

microscopist image analyst biomedical researcher bioinformatics scientist data scientist

Alternative AI tools for aicsimageio

Similar Open Source Tools

aicsimageio

github

: 198

openedai-speech

OpenedAI Speech is a free, private text-to-speech server compatible with the OpenAI audio/speech API. It offers custom voice cloning and supports various models like tts-1 and tts-1-hd. Users can map their own piper voices and create custom cloned voices. The server provides multilingual support with XTTS voices and allows fixing incorrect sounds with regex. Recent changes include bug fixes, improved error handling, and updates for multilingual support. Installation can be done via Docker or manual setup, with usage instructions provided. Custom voices can be created using Piper or Coqui XTTS v2, with guidelines for preparing audio files. The tool is suitable for tasks like generating speech from text, creating custom voices, and multilingual text-to-speech applications.

github

: 243

AIMNet2

AIMNet2 Calculator is a package that integrates the AIMNet2 neural network potential into simulation workflows, providing fast and reliable energy, force, and property calculations for molecules with diverse elements. It excels at modeling various systems, offers flexible interfaces for popular simulation packages, and supports long-range interactions using DSF or Ewald summation Coulomb models. The tool is designed for accurate and versatile molecular simulations, suitable for large molecules and periodic calculations.

github

: 58

hqq

HQQ is a fast and accurate model quantizer that skips the need for calibration data. It's super simple to implement (just a few lines of code for the optimizer). It can crunch through quantizing the Llama2-70B model in only 4 minutes! 🚀

github

: 770

cursive-py

Cursive is a universal and intuitive framework for interacting with LLMs. It is extensible, allowing users to hook into any part of a completion life cycle. Users can easily describe functions that LLMs can use with any supported model. Cursive aims to bridge capabilities between different models, providing a single interface for users to choose any model. It comes with built-in token usage and costs calculations, automatic retry, and model expanding features. Users can define and describe functions, generate Pydantic BaseModels, hook into completion life cycle, create embeddings, and configure retry and model expanding behavior. Cursive supports various models from OpenAI, Anthropic, OpenRouter, Cohere, and Replicate, with options to pass API keys for authentication.

github

: 170

videodb-python

VideoDB Python SDK allows you to interact with the VideoDB serverless database. Manage videos as intelligent data, not files. It's scalable, cost-efficient & optimized for AI applications and LLM integration. The SDK provides functionalities for uploading videos, viewing videos, streaming specific sections of videos, searching inside a video, searching inside multiple videos in a collection, adding subtitles to a video, generating thumbnails, and more. It also offers features like indexing videos by spoken words, semantic indexing, and future indexing options for scenes, faces, and specific domains like sports. The SDK aims to simplify video management and enhance AI applications with video data.

github

: 65

neural-speed

Neural Speed is an innovative library designed to support the efficient inference of large language models (LLMs) on Intel platforms through the state-of-the-art (SOTA) low-bit quantization powered by Intel Neural Compressor. The work is inspired by llama.cpp and further optimized for Intel platforms with our innovations in NeurIPS' 2023

github

: 327

upgini

Upgini is an intelligent data search engine with a Python library that helps users find and add relevant features to their ML pipeline from various public, community, and premium external data sources. It automates the optimization of connected data sources by generating an optimal set of machine learning features using large language models, GraphNNs, and recurrent neural networks. The tool aims to simplify feature search and enrichment for external data to make it a standard approach in machine learning pipelines. It democratizes access to data sources for the data science community.

github

: 330

datadreamer

DataDreamer is an advanced toolkit designed to facilitate the development of edge AI models by enabling synthetic data generation, knowledge extraction from pre-trained models, and creation of efficient and potent models. It eliminates the need for extensive datasets by generating synthetic datasets, leverages latent knowledge from pre-trained models, and focuses on creating compact models suitable for integration into any device and performance for specialized tasks. The toolkit offers features like prompt generation, image generation, dataset annotation, and tools for training small-scale neural networks for edge deployment. It provides hardware requirements, usage instructions, available models, and limitations to consider while using the library.

github

: 77

bonito

Bonito is an open-source model for conditional task generation, converting unannotated text into task-specific training datasets for instruction tuning. It is a lightweight library built on top of Hugging Face `transformers` and `vllm` libraries. The tool supports various task types such as question answering, paraphrase generation, sentiment analysis, summarization, and more. Users can easily generate synthetic instruction tuning datasets using Bonito for zero-shot task adaptation.

github

: 742

litserve

LitServe is a high-throughput serving engine for deploying AI models at scale. It generates an API endpoint for a model, handles batching, streaming, autoscaling across CPU/GPUs, and more. Built for enterprise scale, it supports every framework like PyTorch, JAX, Tensorflow, and more. LitServe is designed to let users focus on model performance, not the serving boilerplate. It is like PyTorch Lightning for model serving but with broader framework support and scalability.

github

: 53

llama.vim

llama.vim is a plugin that provides local LLM-assisted text completion for Vim users. It offers features such as auto-suggest on cursor movement, manual suggestion toggling, suggestion acceptance with Tab and Shift+Tab, control over text generation time, context configuration, ring context with chunks from open and edited files, and performance stats display. The plugin requires a llama.cpp server instance to be running and supports FIM-compatible models. It aims to be simple, lightweight, and provide high-quality and performant local FIM completions even on consumer-grade hardware.

github

: 1.3k

DeepPavlov

DeepPavlov is an open-source conversational AI library built on PyTorch. It is designed for the development of production-ready chatbots and complex conversational systems, as well as for research in the area of NLP and dialog systems. The library offers a wide range of models for tasks such as Named Entity Recognition, Intent/Sentence Classification, Question Answering, Sentence Similarity/Ranking, Syntactic Parsing, and more. DeepPavlov also provides embeddings like BERT, ELMo, and FastText for various languages, along with AutoML capabilities and integrations with REST API, Socket API, and Amazon AWS.

github

: 6.6k

nano-graphrag

nano-GraphRAG is a simple, easy-to-hack implementation of GraphRAG that provides a smaller, faster, and cleaner version of the official implementation. It is about 800 lines of code, small yet scalable, asynchronous, and fully typed. The tool supports incremental insert, async methods, and various parameters for customization. Users can replace storage components and LLM functions as needed. It also allows for embedding function replacement and comes with pre-defined prompts for entity extraction and community reports. However, some features like covariates and global search implementation differ from the original GraphRAG. Future versions aim to address issues related to data source ID, community description truncation, and add new components.

github

: 2.6k

hume-python-sdk

The Hume AI Python SDK allows users to integrate Hume APIs directly into their Python applications. Users can access complete documentation, quickstart guides, and example notebooks to get started. The SDK is designed to provide support for Hume's expressive communication platform built on scientific research. Users are encouraged to create an account at beta.hume.ai and stay updated on changes through Discord. The SDK may undergo breaking changes to improve tooling and ensure reliable releases in the future.

github

: 79

zml

ZML is a high-performance AI inference stack built for production, using Zig language, MLIR, and Bazel. It allows users to create exciting AI projects, run pre-packaged models like MNIST, TinyLlama, OpenLLama, and Meta Llama, and compile models for accelerator runtimes. Users can also run tests, explore examples, and contribute to the project. ZML is licensed under the Apache 2.0 license.

github

: 2.2k

For similar tasks

aicsimageio

github

: 198

For similar jobs

cellseg_models.pytorch

cellseg-models.pytorch is a Python library built upon PyTorch for 2D cell/nuclei instance segmentation models. It provides multi-task encoder-decoder architectures and post-processing methods for segmenting cell/nuclei instances. The library offers high-level API to define segmentation models, open-source datasets for training, flexibility to modify model components, sliding window inference, multi-GPU inference, benchmarking utilities, regularization techniques, and example notebooks for training and finetuning models with different backbones.

github

: 69

aicsimageio

github

: 198

KG_RAG

KG-RAG (Knowledge Graph-based Retrieval Augmented Generation) is a task agnostic framework that combines the explicit knowledge of a Knowledge Graph (KG) with the implicit knowledge of a Large Language Model (LLM). KG-RAG extracts "prompt-aware context" from a KG, which is defined as the minimal context sufficient enough to respond to the user prompt. This framework empowers a general-purpose LLM by incorporating an optimized domain-specific 'prompt-aware context' from a biomedical KG. KG-RAG is specifically designed for running prompts related to Diseases.

github

: 525

Scientific-LLM-Survey

Scientific Large Language Models (Sci-LLMs) is a repository that collects papers on scientific large language models, focusing on biology and chemistry domains. It includes textual, molecular, protein, and genomic languages, as well as multimodal language. The repository covers various large language models for tasks such as molecule property prediction, interaction prediction, protein sequence representation, protein sequence generation/design, DNA-protein interaction prediction, and RNA prediction. It also provides datasets and benchmarks for evaluating these models. The repository aims to facilitate research and development in the field of scientific language modeling.

github

: 261

biochatter

Generative AI models have shown tremendous usefulness in increasing accessibility and automation of a wide range of tasks. This repository contains the `biochatter` Python package, a generic backend library for the connection of biomedical applications to conversational AI. It aims to provide a common framework for deploying, testing, and evaluating diverse models and auxiliary technologies in the biomedical domain. BioChatter is part of the BioCypher ecosystem, connecting natively to BioCypher knowledge graphs.

github

: 135

ceLLama

ceLLama is a streamlined automation pipeline for cell type annotations using large-language models (LLMs). It operates locally to ensure privacy, provides comprehensive analysis by considering negative genes, offers efficient processing speed, and generates customized reports. Ideal for quick and preliminary cell type checks.

github

: 141

PINNACLE

PINNACLE is a flexible geometric deep learning approach that trains on contextualized protein interaction networks to generate context-aware protein representations. It provides protein representations split across various cell-type contexts from different tissues and organs. The tool can be fine-tuned to study the genomic effects of drugs and nominate promising protein targets and cell-type contexts for further investigation. PINNACLE exemplifies the paradigm of incorporating context-specific effects for studying biological systems, especially the impact of disease and therapeutics.

github

: 70

Taiyi-LLM

Taiyi (太一) is a bilingual large language model fine-tuned for diverse biomedical tasks. It aims to facilitate communication between healthcare professionals and patients, provide medical information, and assist in diagnosis, biomedical knowledge discovery, drug development, and personalized healthcare solutions. The model is based on the Qwen-7B-base model and has been fine-tuned using rich bilingual instruction data. It covers tasks such as question answering, biomedical dialogue, medical report generation, biomedical information extraction, machine translation, title generation, text classification, and text semantic similarity. The project also provides standardized data formats, model training details, model inference guidelines, and overall performance metrics across various BioNLP tasks.

github

: 138