This repository has been archived by the owner on Jul 17, 2023. It is now read-only.

---
title: 'HeuDiConv — flexible DICOM conversion into structured directory layouts'
tags:
  - Python
  - neuroscience
  - standardization
  - DICOM
  - BIDS
  - open science
  - FOSS
authors:
  - name: Yaroslav O. Halchenko
    orcid: 0000-0003-3456-2493
    affiliation: 1
  - name: Mathias Goncalves
    orcid: 0000-0002-7252-7771
    affiliation: 2
  - name: Satrajit Ghosh
    orcid: 0000-0002-5312-6729
    affiliation: 3
  - name: Pablo Velasco
    orcid: 0000-0002-5749-6049
    affiliation: 4
  - name: Matteo Visconti di Oleggio Castello
    orcid: 0000-0001-7931-5272
    affiliation: 5
  - name: Taylor Salo
    orcid: 0000-0001-9813-3167
    affiliation: 6
  - name: John T. Wodder II
    affiliation: 1
  - name: Michael Hanke
    orcid: 0000-0001-6398-6370
    affiliation: "7, 8"
  - name: Patrick Sadil
    orcid: 0000-0003-4141-1343
    affiliation: 22
  - name: Krzysztof Jacek Gorgolewski
    orcid: 0000-0003-3321-7583
  - name: Horea-Ioan Ioanas
    orcid: 0000-0001-7037-2449
    affiliation: 1
  - name: Chris Rorden
    orcid: 0000-0002-7554-6142
    affiliation: 9
  - name: Timothy J. Hendrickson
    orcid: 0000-0001-6862-6526
    affiliation: "10, 11"
  - name: Michael Dayan
    orcid: 0000-0002-2666-0969
    affiliation: 12
  - name: Sean Dae Houlihan
    orcid: 0000-0001-5003-9278
    affiliation: "1, 13"
  - name: James Kent
    orcid: 0000-0002-4892-2659
    affiliation: 14
  - name: Ted Strauss
    orcid: 0000-0002-1927-666X
    affiliation: 15
  - name: John Lee
    orcid: 0000-0001-5884-4247
    affiliation: 16
  - name: Isaac To
    orcid: 0000-0002-4740-0824
    affiliation: 1
  - name: Christopher J. Markiewicz
    orcid: 0000-0002-6533-164X
    affiliation: 2
  - name: Darren Lukas
    orcid: 0009-0003-6941-0833
    affiliation: 17
  - name: Ellyn Butler
    orcid: 0000-0001-6316-6444
    affiliation: 23
  - name: Todd Thompson
    affiliation: 13
  - name: Maite Termenon
    orcid: 0000-0001-8102-5135
    affiliation: "18, 19"
  - name: David V. Smith
    orcid: 0000-0001-5754-9633
    affiliation: 20
  - name: Austin Macdonald
    orcid: 0000-0002-8124-807X
    affiliation: 1
  - name: David N. Kennedy
    orcid: 0000-0002-9377-0797
    affiliation: 21
affiliations:
  - name: Center for Open Neuroscience, Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
    index: 1
  - name: Department of Psychology, Stanford University, CA, USA
    index: 2
  - name: McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA, USA
    index: 3
  - name: Flywheel Exchange LLC, Minneapolis, MN, USA
    index: 4
  - name: University of California, Berkeley, Berkeley, CA, USA
    index: 5
  - name: Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
    index: 6
  - name: Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany
    index: 7
  - name: Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
    index: 8
  - name: Department of Psychology, University of South Carolina, Columbia, SC, USA
    index: 9
  - name: Masonic Institute for the Developing Brain, University of Minnesota, Minneapolis, MN, USA
    index: 10
  - name: Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA
    index: 11
  - name: Human Neuroscience Platform, Fondation Campus Biotech Geneva, Geneva, Switzerland
    index: 12
  - name: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
    index: 13
  - name: Department of Psychology, University of Texas at Austin, Austin, TX, USA
    index: 14
  - name: McConnell Brain Imaging Centre, McGill University, Montreal, QC, Canada
    index: 15
  - name: Data Science and Sharing Team, National Institute of Mental Health, Bethesda, MD, USA
    index: 16
  - name: Institute for Glycomics, Griffith University, QLD, Australia
    index: 17
  - name: Biomedical Engineering Department, Faculty of Engineering, Mondragon University, Mondragon, Spain
    index: 18
  - name: BCBL, Basque center on Cognition, Brain and Language, San Sebastian, Spain
    index: 19
  - name: Department of Psychology and Neuroscience, Temple University, Philadelphia, PA, USA
    index: 20
  - name: Departments of Psychiatry and Radiology, University of Massachusetts Chan Medical School, Worcester, MA, USA
    index: 21
  - name: Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
    index: 22
  - name: Department of Psychology, Northwestern University, Evanston, IL, USA
    index: 23
date: 2023-07-06
bibliography: paper.bib
---

Summary

To support efficient processing, data must be formatted according to standards that are prevalent in the field and widely supported among actively developed analysis tools. The Brain Imaging Data Structure (BIDS) [@GAC+16] is an open standard designed for computational accessibility, operator legibility, and a wide and easily extendable scope of modalities; it is consequently used by numerous analysis and processing tools as the preferred input format in many fields of neuroscience. HeuDiConv (Heuristic DICOM Converter) enables flexible and efficient conversion of spatially reconstructed neuroimaging data from the DICOM format (quasi-ubiquitous in biomedical image acquisition systems, particularly in clinical settings) to BIDS, as well as other file layouts. HeuDiConv provides a multi-stage operator input workflow (discovery, manual tuning, conversion) in which the manual tuning step is optional, so the entire conversion can be seamlessly integrated into a data processing pipeline. HeuDiConv is written in Python, and supports the DICOM specification for input parsing and the BIDS specification for output construction. The support for these standards is extensive, and HeuDiConv can handle the complex organization scenarios that arise for specific data types (e.g., multi-echo sequences or single-band reference volumes). In addition to generating valid BIDS outputs, HeuDiConv supports custom output layouts. This is achieved via a set of built-in heuristics, some fully functional and some intended as examples, expressed as simple Python functions. These heuristics can be used as templates or as a base for developing custom heuristics, thus providing full flexibility while maintaining user accessibility. HeuDiConv further integrates with DataLad [@datalad], and can automatically prepare hierarchies of DataLad datasets with optional obfuscation of sensitive data and metadata, including obfuscation of patient visit timestamps in the Git version control system.
As a result, given its extensibility, large modality support, and integration with advanced data management technologies, HeuDiConv has become a mainstay in numerous neuroimaging workflows, and constitutes a powerful and highly adaptable tool of potential interest to large swathes of the neuroimaging community.

Statement of Need

Neuroimaging is an empirical research area which relies heavily on efficient data acquisition, harmonization, and processing. Neuroimaging data sourced from medical imaging equipment, and in particular magnetic resonance imaging (MRI) scanners, can be exported in numerous formats, among which DICOM (Digital Imaging and Communications in Medicine) is the most prominent. DICOM data are often transmitted to PACS (Picture Archiving and Communication Systems) servers for archiving or further processing. In clinical settings, data are accessed directly from PACS in the DICOM format. Neuroimaging research tools, by contrast, typically require data files in the NIfTI [@nifticlib] format, which stores images directly as 3D or 4D objects and restricts metadata to the most useful attributes. Tools such as dcm2niix [@Li_2016] can convert DICOM files into NIfTI files, and can extract metadata fields not covered by the NIfTI header into sidecar .json files. However, the scope of such tools is limited: they do not organize multiple NIfTI files for different subjects, and possibly different scanning sessions, within a study.
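As an illustration of the single-series conversion step described above, the following minimal sketch assembles a dcm2niix command line in Python. The helper name `dcm2niix_command` and the directory names are hypothetical; the flags shown (`-b`, `-z`, `-f`, `-o`) are standard dcm2niix options, but the exact set a given lab needs may differ.

```python
import shlex


def dcm2niix_command(dicom_dir, out_dir, filename="%p_%s"):
    """Build (but do not run) a dcm2niix command line.

    -b y  write a .json sidecar next to each NIfTI file
    -z y  gzip-compress the output (.nii.gz)
    -f    output filename template (%p = protocol name, %s = series number)
    -o    output directory
    """
    return [
        "dcm2niix",
        "-b", "y",
        "-z", "y",
        "-f", filename,
        "-o", out_dir,
        dicom_dir,
    ]


# Hypothetical input and output directories, for illustration only.
cmd = dcm2niix_command("dicom/sub-01", "nifti/sub-01")
print(shlex.join(cmd))
# To actually convert, pass the list to subprocess.run(cmd, check=True).
```

Such a call converts the series it finds but leaves naming and placement across subjects and sessions to the user, which is the gap HeuDiConv addresses.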

HeuDiConv was created in 2014 to provide flexible tooling so that labs may rapidly and consistently convert collections of DICOM files into collections of NIfTI files in customizable file system hierarchies. As manual file renaming and metadata reorganization are tedious and error-prone, automation is preferable, and this is a consistent focus of HeuDiConv.

After the inception of HeuDiConv in 2014, the BIDS standard [@GAC+16] was established. The BIDS standard formalizes data file hierarchies and metadata storage in a fashion which, due to its community-driven nature, is both highly optimized and widely understood by analysis tools. Since then, DICOM conversion to NIfTI files contained within a BIDS hierarchy has emerged as the most frequent use case for HeuDiConv.
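To make the idea of a formalized file hierarchy concrete, here is a small sketch of how a BIDS-style relative path can be assembled from a subject label, an optional session label, a data type, and a few entities. The function `bids_path` is a hypothetical helper written for this illustration (it is not part of HeuDiConv), and it covers only a subset of the entities defined by the BIDS specification.

```python
from pathlib import PurePosixPath

# Subset of BIDS entities, in the order they must appear in a filename
# (subject and session are handled separately below).
ENTITY_ORDER = ("task", "acq", "ce", "rec", "dir", "run", "echo")


def bids_path(subject, datatype, suffix, session=None,
              extension=".nii.gz", **entities):
    """Return a BIDS-style relative path for one file."""
    unknown = set(entities) - set(ENTITY_ORDER)
    if unknown:
        raise ValueError(f"unsupported entities: {sorted(unknown)}")
    parts = [f"sub-{subject}"]
    folder = PurePosixPath(f"sub-{subject}")
    if session is not None:
        parts.append(f"ses-{session}")
        folder = folder / f"ses-{session}"
    # Emit entities in specification order, regardless of keyword order.
    parts += [f"{key}-{entities[key]}" for key in ENTITY_ORDER if key in entities]
    parts.append(suffix)
    return folder / datatype / ("_".join(parts) + extension)


print(bids_path("01", "anat", "T1w"))
print(bids_path("01", "func", "bold", session="pre",
                echo="2", run="01", task="rest"))
```

The second call shows a multi-echo functional run: the `echo` entity is placed after `run` even though it was passed first, because the order is fixed by the standard rather than by the caller.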

Overview of HeuDiConv functionality

HeuDiConv has been developed to implement logic commonly used across labs (grouping DICOMs, extracting metadata, converting individual sequences, populating standard BIDS files, etc.) while allowing individual groups to customize how files are organized and named according to their own conventions and preferences. Such decision making is expressed in HeuDiConv heuristics, which are Python modules following a minimal interface described in the HeuDiConv documentation (https://heudiconv.readthedocs.io/en/latest/heuristics.html). When instructed to operate in BIDS mode (--bids flag), HeuDiConv combines a heuristic providing base naming instructions with helpers that organize the files in the hierarchy defined by the BIDS standard. It also ensures that files are named according to the BIDS specification, including complex composite recordings such as those associated with multi-echo sequences.
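A minimal heuristic might look like the sketch below. It follows the `create_key` / `infotodict` pattern described in the HeuDiConv heuristics documentation, but the matching rules (the substrings `mprage` and `rest`, and the volume threshold) are assumptions chosen for illustration, and `Series` is a hypothetical stand-in for the richer per-series records that HeuDiConv passes to a heuristic.

```python
from collections import namedtuple

# Hypothetical stand-in for the per-series records HeuDiConv provides;
# the real records carry many more fields.
Series = namedtuple("Series", "series_id protocol_name dim4")


def create_key(template, outtype=("nii.gz",), annotation_classes=None):
    """Describe one kind of output file by its path template."""
    if not template:
        raise ValueError("Template must be a valid format string")
    return template, outtype, annotation_classes


def infotodict(seqinfo):
    """Map each output template to the series to be converted into it."""
    t1w = create_key("sub-{subject}/anat/sub-{subject}_T1w")
    rest = create_key(
        "sub-{subject}/func/sub-{subject}_task-rest_run-{item:02d}_bold"
    )
    info = {t1w: [], rest: []}
    for s in seqinfo:
        name = s.protocol_name.lower()
        if "mprage" in name:
            info[t1w].append(s.series_id)
        elif "rest" in name and s.dim4 > 10:  # skip short, aborted runs
            info[rest].append(s.series_id)
    return info


# Made-up series list, only to exercise the function outside HeuDiConv.
example = [
    Series("2-t1_mprage", "t1_mprage", 1),
    Series("5-rest_bold", "rest_bold", 200),
    Series("6-rest_bold", "rest_bold", 3),
]
mapping = infotodict(example)
```

Keeping the decision logic in one plain function means a lab can version and review its naming conventions like any other code.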

HeuDiConv automates the keystone conversion step in reproducible data handling, without compromising operator flexibility. The showcased set-up depicts a two-machine infrastructure, with HeuDiConv operating on the same machine as subsequent analysis steps for data in a standardized and shareable representation. For more advanced usage at institutions with dedicated infrastructure, HeuDiConv can operate on an additional third machine, interfacing between the depicted two, and dedicated to data repositing, versioning, and backup.

Exemplar heuristics

Convertall

The convertall heuristic is the simplest heuristic: it makes no assumptions about the acquired sequences. It can be used as a template for developing new heuristics, or to establish an initial mapping for manual naming of the sequences in the "manual curation" step.
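The two-pass use of convertall can be sketched as follows: a discovery pass with no converter, followed by a conversion pass with a custom heuristic. The helper `heudiconv_command` and all paths are hypothetical; the options used (`--files`, `-o`, `-f`, `-s`, `-c`, `--bids`) are HeuDiConv command line options, but consult `heudiconv --help` for the authoritative list for your installed version.

```python
def heudiconv_command(dicom_path, outdir, subject, heuristic, converter,
                      bids=False):
    """Build (but do not run) a heudiconv command line."""
    cmd = [
        "heudiconv",
        "--files", dicom_path,   # DICOM files or a directory containing them
        "-o", outdir,            # where outputs are written
        "-f", heuristic,         # built-in heuristic name or path to a .py file
        "-s", subject,           # subject label
        "-c", converter,         # "none" for discovery, "dcm2niix" to convert
    ]
    if bids:
        cmd.append("--bids")
    return cmd


# Pass 1: discovery only, to inspect which series were acquired.
discovery = heudiconv_command(
    "dicom/sub-01", "out", "01", "convertall", "none")

# Pass 2: conversion with a (hypothetical) custom heuristic file.
conversion = heudiconv_command(
    "dicom/sub-01", "out", "01", "my_heuristic.py", "dcm2niix", bids=True)

print(" ".join(discovery))
print(" ".join(conversion))
```

The first pass performs no conversion, so it is a cheap way to review the acquired series before writing or adjusting the heuristic used in the second pass.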

StudyForrest phase 2

The studyforrest_phase2 heuristic is a small sample heuristic developed for the StudyForrest [@studyforrest] project, and demonstrates custom conversion into a BIDS dataset.

ReproIn

The ReproIn heuristic was initially developed at the Dartmouth Brain Imaging Center (DBIC) to automate data conversion into BIDS for any neuroimaging study performed using the center's facilities. The core principle behind ReproIn is the reduction of operator interaction required to obtain BIDS datasets for acquired data. It is achieved by ensuring that reference MRI sequences on the instrumentation are organized and named in a consistent and flexible way, such that upon usage in any experimental protocol they will encode the information required for fully automatic conversion and repositing of the resulting data.

Provided that sequences are specified correctly and there are no operator errors, such as mistyped subject or session IDs, the conversion can be fully automated, and work is ongoing to make such deployments turnkey. Visit the ReproIn project page at http://reproin.repronim.org to learn more.
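The principle of encoding conversion information in sequence names can be illustrated with a deliberately simplified parser. It assumes names of the form `<datatype>[-<suffix>][_<key>-<value>...][__<free text>]`, such as `func-bold_task-rest_run-01`. The function `parse_sequence_name` is hypothetical and is not the ReproIn implementation, which supports a richer syntax than this sketch handles.

```python
def parse_sequence_name(name):
    """Split a ReproIn-style sequence name into its components.

    Simplified sketch: anything after a double underscore is treated as
    free-form text and ignored; every remaining chunk after the first
    must be a key-value pair joined by a hyphen.
    """
    name = name.split("__", 1)[0]
    head, *chunks = name.split("_")
    datatype, _, suffix = head.partition("-")
    entities = {}
    for chunk in chunks:
        key, sep, value = chunk.partition("-")
        if not sep or not key or not value:
            raise ValueError(f"cannot parse {chunk!r} in {name!r}")
        entities[key] = value
    return {
        "datatype": datatype,
        "suffix": suffix or None,
        "entities": entities,
    }


parsed = parse_sequence_name("func-bold_task-rest_run-01__scanner-note")
print(parsed)
```

Because the name alone yields the data type, suffix, and entities, no per-study lookup table is needed at conversion time, which is what allows the operator interaction to be reduced.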

Adoption and usage

As a citable resource (RRID:SCR_017427), HeuDiConv already has 6 mentions in papers at the time of writing. Downloads from PyPI and recorded uses of HeuDiConv are growing (see \autoref{fig:usage}). Over 40 datasets were converted to BIDS with HeuDiConv at the Dartmouth Brain Imaging Center (DBIC), using the ReproIn heuristic developed there. HeuDiConv has also been used for PET data conversion [@JZC+21:PET], with the resulting data shared as OpenNeuro ds003382 [@openneuro.ds003382.v1.0.0]. Moreover, the HeuDiConv approach inspired the development of fw-heudiconv (FlywheelTools: Software for HeuDiConv-Style BIDS Curation On Flywheel) [@TCB+21:fw-heudiconv].

\label{fig:usage}Downloads experienced an initial sharp rise after the ReproNim HeuDiConv training event at OHBM in mid-2018, and have subsequently followed a positive trend along with usage, which exceeded 1000 sessions per week in the data collection interval. Depicted are weekly download and confirmed session estimates, averaged per month, with a 95% confidence interval. User session estimates for July and August 2022 are linearly extrapolated from the nearest neighbour. Download counts are sourced from PyPI, the Python community repository, whereas user session counts are sourced from Etelemetry, an infrastructure for verifiable research impact, which end-users can disable to protect privacy.

External dependencies

HeuDiConv uses specialized tools and libraries:

  • datalad [@datalad] (RRID: SCR_003931) enables managing produced datasets as version-controlled repositories,
  • dcm2niix [@Li_2016] is used for the conversion from DICOM to NIfTI and for initial versions of sidecar .json files,
  • etelemetry and filelock are used as supplementary utilities,
  • neurodocker [@zenodo:neurodocker] (RRID:SCR_017426) is used to produce the Dockerfile from which Docker images are built,
  • nipype [@nipype] (RRID:SCR_002502) is used to interface with dcm2niix and extra metadata invocations,
  • pydicom [@zenodo:pydicom] (RRID:SCR_002573) and dcmstack are used for DICOM analysis and for extraction of extra metadata to place in BIDS sidecar files,
  • pytest is used for unit and integration testing.

Acknowledgments

We would like to extend our gratitude to Matthew Brett, Jörg Stadler, Russell Poldrack, Sin Kim, Dan Lurie, and Henry Braun for notable contributions to the codebase, bug reports, recommendations, and promotion of HeuDiConv.

HeuDiConv development was primarily done under the umbrella of the NIH-funded projects Nipype 1R01EB020740-01 and ReproNim 1P41EB019936-01A1 and 2P41EB019936-06A1 (PI: Kennedy).

References