title | tags | authors | affiliations | date | bibliography
---|---|---|---|---|---
HeuDiConv — flexible DICOM conversion into structured directory layouts | | | | 2023-07-06 | paper.bib
To support efficient processing, data must be formatted according to standards that are prevalent in the field and widely supported among actively developed analysis tools. The Brain Imaging Data Structure (BIDS) [@GAC+16] is an open standard designed for computational accessibility, operator legibility, and a wide and easily extendable scope of modalities, and is consequently used by numerous analysis and processing tools as the preferred input format in many fields of neuroscience. HeuDiConv (Heuristic DICOM Converter) enables flexible and efficient conversion of spatially reconstructed neuroimaging data from the DICOM format (quasi-ubiquitous in biomedical image acquisition systems, particularly in clinical settings) to BIDS, as well as other file layouts. HeuDiConv provides a multi-stage operator input workflow (discovery, manual tuning, conversion) in which the manual tuning step is optional, so the entire conversion can be seamlessly integrated into a data processing pipeline. HeuDiConv is written in Python, and supports the DICOM specification for input parsing and the BIDS specification for output construction. The support for these standards is extensive, and HeuDiConv can handle complex organization scenarios that arise for specific data types (e.g., multi-echo sequences, or single-band reference volumes). In addition to generating valid BIDS outputs, HeuDiConv supports custom output layouts. This is achieved via a set of built-in heuristics, either fully functional or provided as examples, expressed as simple Python functions. Those heuristics can be used as templates or as a base for developing custom heuristics, thus providing full flexibility while maintaining user accessibility. HeuDiConv further integrates with DataLad [@datalad], and can automatically prepare hierarchies of DataLad datasets with optional obfuscation of sensitive data and metadata, including obfuscating patient visit timestamps in the git version control system.
As a result, given its extensibility, large modality support, and integration with advanced data management technologies, HeuDiConv has become a mainstay in numerous neuroimaging workflows, and constitutes a powerful and highly adaptable tool of potential interest to large swathes of the neuroimaging community.
Neuroimaging is an empirical research area which relies heavily on efficient data acquisition, harmonization, and processing.
Neuroimaging data sourced from medical imaging equipment, and in particular magnetic resonance imaging (MRI) scanners, can be exported in numerous formats, among which DICOM (Digital Imaging and Communications in Medicine) is most prominent.
DICOM data are often transmitted to PACS (Picture Archiving and Communication Systems) servers for archiving or further processing.
Unlike clinical settings, where data are accessed directly from PACS in the DICOM format, neuroimaging research tools typically require data files in the NIfTI [@nifticlib] format, which stores images directly as 3D or 4D objects and restricts metadata to the most useful attributes.
Tools such as dcm2niix [@Li_2016] can be used to convert DICOM files into NIfTI files, and can extract metadata fields not covered by the NIfTI header into sidecar .json files.
However, the scope of such tools is limited, as it does not extend to organizing multiple NIfTI files for different subjects and possibly scanning sessions within a study.
HeuDiConv was created in 2014 to provide flexible tooling so that labs may rapidly and consistently convert collections of DICOM files into collections of NIfTI files in customizable file system hierarchies. Because manual file renaming and metadata reorganization are tedious and error-prone, automation is preferable, and this is a consistent focus of HeuDiConv.
Since the inception of HeuDiConv in 2014, the BIDS standard [@GAC+16] has been established. The BIDS standard formalizes data file hierarchies and metadata storage in a fashion which, due to its community-driven nature, is both highly optimized and widely understood by analysis tools. Since then, DICOM conversion to NIfTI files contained within a BIDS hierarchy has emerged as the most frequent use case for HeuDiConv.
HeuDiConv implements logic commonly needed across labs (grouping DICOMs, extracting metadata, converting individual sequences, populating standard BIDS files, etc.), while allowing individual groups to customize how files are organized and named according to their own conventions and preferences.
Such decision making is encoded in HeuDiConv heuristics, which are Python modules following a minimal interface specified in the HeuDiConv documentation (https://heudiconv.readthedocs.io/en/latest/heuristics.html).
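As a rough illustration of what such a heuristic module can look like, the sketch below follows the `create_key` / `infotodict` pattern described in the linked documentation. The `SeqInfo` tuple is a hypothetical stand-in that models only three fields of the per-series records, and the matching rules and output templates are assumptions chosen for the example rather than part of any shipped heuristic.

```python
from collections import namedtuple

# Hypothetical stand-in for the per-series records handed to a heuristic;
# only the fields used below are modeled here.
SeqInfo = namedtuple("SeqInfo", ["series_id", "protocol_name", "dim4"])


def create_key(template, outtype=("nii.gz",), annotation_classes=None):
    """Bundle an output path template with the requested output types."""
    if not template:
        raise ValueError("Template must be a non-empty format string")
    return template, outtype, annotation_classes


def infotodict(seqinfo):
    """Map each output key to the series IDs that should be stored under it."""
    # Illustrative templates; {subject} and {item} are filled in at conversion time.
    t1w = create_key("sub-{subject}/anat/sub-{subject}_T1w")
    rest = create_key(
        "sub-{subject}/func/sub-{subject}_task-rest_run-{item:02d}_bold"
    )
    info = {t1w: [], rest: []}
    for s in seqinfo:
        name = s.protocol_name.lower()
        if "t1" in name:
            info[t1w].append(s.series_id)
        elif "rest" in name and s.dim4 > 1:
            # Require a time dimension so single-volume scouts are skipped.
            info[rest].append(s.series_id)
    return info


if __name__ == "__main__":
    demo = [
        SeqInfo("2-anat_T1w", "anat_T1w", 1),
        SeqInfo("5-func_rest", "func_rest", 200),
        SeqInfo("9-localizer", "localizer", 1),
    ]
    for key, series in infotodict(demo).items():
        print(key[0], series)
```

Series that match no rule (the localizer in the demo) are simply left out of the mapping, which is one way a heuristic can exclude scans from conversion.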
HeuDiConv, if instructed to operate in BIDS mode (--bids flag) with a heuristic providing base naming instructions and helpers, organizes the files in the hierarchy defined by the BIDS standard.
It also ensures files are named according to the BIDS specifications, including complex composite recordings such as those associated with multi-echo sequences.
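To make the naming of multi-echo recordings concrete, here is a minimal sketch of how per-echo BIDS file names can be assembled from key-value entities. It is not HeuDiConv's internal code: `bids_name`, `multi_echo_names`, and the reduced `ENTITY_ORDER` tuple are hypothetical helpers covering only a small subset of BIDS entities.

```python
# Assumed subset of BIDS entities, in the order they appear in file names.
ENTITY_ORDER = ("sub", "ses", "task", "acq", "run", "echo")


def bids_name(suffix, **entities):
    """Join key-value entities in the expected order, followed by the suffix."""
    unknown = set(entities) - set(ENTITY_ORDER)
    if unknown:
        raise ValueError(f"Unsupported entities: {sorted(unknown)}")
    parts = [f"{key}-{entities[key]}" for key in ENTITY_ORDER if key in entities]
    return "_".join(parts + [suffix])


def multi_echo_names(n_echoes, suffix="bold", **entities):
    """Produce one file name per echo, distinguished by the echo entity."""
    return [
        bids_name(suffix, echo=index, **entities)
        for index in range(1, n_echoes + 1)
    ]


if __name__ == "__main__":
    for name in multi_echo_names(3, sub="01", task="rest"):
        print(name)
```

The only difference between the generated names is the `echo-<index>` entity, which is what lets downstream tools group the echoes of one acquisition.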
The convertall heuristic is the simplest heuristic: it makes no assumptions about the data, and can be used as a template to develop new heuristics or to establish an initial mapping for manual naming of the sequences in the "manual curation" step.
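A two-pass use of convertall, first for discovery and then for conversion with a tailored heuristic, might be scripted roughly as follows. The sketch only builds the command lines and does not run them. Apart from `--bids`, the option spellings (`--files`, `-o`, `-f`, `-s`, `-c`) reflect our understanding of the HeuDiConv command line and should be checked against `heudiconv --help`; all paths, the subject label, and the `my_heuristic.py` file name are hypothetical.

```python
import shlex


def heudiconv_argv(dicom_dir, outdir, heuristic, subject, convert=False, bids=False):
    """Build (but do not run) a heudiconv command line as a list of arguments."""
    argv = [
        "heudiconv",
        "--files", dicom_dir,   # assumed option for pointing at DICOM input
        "-o", outdir,
        "-f", heuristic,
        "-s", subject,
        # "none" is assumed to skip conversion and only record sequence info.
        "-c", "dcm2niix" if convert else "none",
    ]
    if bids:
        argv.append("--bids")
    return argv


if __name__ == "__main__":
    # Pass 1: discovery with the generic heuristic, no conversion.
    discovery = heudiconv_argv("dicoms/", "out/", "convertall", "01")
    # Pass 2: conversion into a BIDS layout with a study-specific heuristic.
    conversion = heudiconv_argv(
        "dicoms/", "out/", "my_heuristic.py", "01", convert=True, bids=True
    )
    print(shlex.join(discovery))
    print(shlex.join(conversion))
```

Keeping the arguments as a list makes it straightforward to hand them to `subprocess.run` once the options have been verified for the installed version.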
The studyforrest_phase2 heuristic is a small sample heuristic developed for the StudyForrest [@studyforrest] project, and demonstrates custom conversion into a BIDS dataset.
The ReproIn heuristic was initially developed at the Dartmouth Brain Imaging Center (DBIC) to automate data conversion into BIDS for any neuroimaging study performed using the center's facilities. The core principle behind ReproIn is to reduce the operator interaction required to obtain BIDS datasets for acquired data. This is achieved by ensuring that reference MRI sequences on the instrumentation are organized and named in a consistent and flexible way, such that upon usage in any experimental protocol they encode the information required for fully automatic conversion and repositing of the resulting data.
Provided that sequences are specified correctly and there are no operator errors, such as mistyped subject or session IDs, the conversion can be fully automated, and work is ongoing to make such deployments turnkey. Visit the ReproIn project page at http://reproin.repronim.org to discover more.
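The idea of sequence names that carry their own conversion instructions can be sketched with a small parser. The name format assumed here (a leading `datatype-suffix` token, then `key-value` tokens separated by underscores, with anything after a double underscore ignored) is a simplification made for illustration; the authoritative convention is the one documented on the ReproIn project page, and `parse_sequence_name` is a hypothetical helper rather than part of the ReproIn heuristic.

```python
def parse_sequence_name(name):
    """Split a scanner sequence name into datatype, suffix, and entities."""
    # Treat anything after a double underscore as free-form text and drop it.
    core = name.split("__", 1)[0]
    first, *rest = core.split("_")
    datatype, _, suffix = first.partition("-")
    entities = {}
    for token in rest:
        key, sep, value = token.partition("-")
        if not sep or not value:
            raise ValueError(f"Cannot parse {token!r} in {name!r}")
        entities[key] = value
    return {"datatype": datatype, "suffix": suffix or None, "entities": entities}


if __name__ == "__main__":
    for example in ("anat-T1w", "func-bold_task-rest_run-01__retry"):
        print(example, "->", parse_sequence_name(example))
```

Raising on tokens that do not fit the pattern, instead of guessing, mirrors the point made above: automation is only safe when names are specified correctly at the scanner.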
As a citeable resource (RRID:SCR_017427), HeuDiConv already has 6 mentions in papers at the time of writing.
There is a growing number of downloads from PyPI and uses of HeuDiConv (see \autoref{fig:usage}).
Over 40 datasets were converted to BIDS with HeuDiConv at the Dartmouth Brain Imaging Center (DBIC), using the ReproIn heuristic developed there.
HeuDiConv has also been used for PET data conversion [@JZC+21:PET], with the resulting data shared as OpenNeuro ds003382 [@openneuro.ds003382.v1.0.0].
Moreover, the HeuDiConv approach inspired the development of fw-heudiconv (FlywheelTools: Software for HeuDiConv-Style BIDS Curation On Flywheel) [@TCB+21:fw-heudiconv].
HeuDiConv uses specialized tools and libraries:

- datalad [@datalad] (RRID:SCR_003931) enables managing produced datasets as version controlled repositories,
- dcm2niix [@Li_2016] is used for the conversion from DICOM to NIfTI and initial versions of sidecar .json files,
- etelemetry and filelock are used as supplementary utilities,
- neurodocker [@zenodo:neurodocker] (RRID:SCR_017426) is used to produce the Dockerfile from which Docker images are built,
- nipype [@nipype] (RRID:SCR_002502) is used to interface dcm2niix and extra metadata invocations,
- pydicom [@zenodo:pydicom] (RRID:SCR_002573) and dcmstack are used for DICOM analysis and extraction of extra metadata to place in BIDS sidecar files,
- pytest formalizes unit and integration testing.
We would like to extend our gratitude to Matthew Brett, Jörg Stadler, Russell Poldrack, Sin Kim, Dan Lurie, and Henry Braun for notable contributions to the codebase, bug reports, recommendations, and promotion of HeuDiConv.
HeuDiConv development was primarily done under the umbrella of the NIH funded Nipype 1R01EB020740-01, ReproNim 1P41EB019936-01A1 and 2P41EB019936-06A1 (PI: Kennedy).