Bug: processing non-standard DICOM filenames
Files that don't get sorted, such as X-Rays, can have unusual (non-standard) filenames.
In the Sonador repository, line 25 of apisettings.py defines the following patterns for identifying standard DICOM files. This approach routinely misses files with non-standard filenames.
DCM_EXTENSIONS_DEFAULT = ['*.dcm', '*.DCM', '*.DICOM', '*.dicom', 'IM*']
This omits files such as: 54F6EBAE, 98018080, IN000009. Conversely, a file such as IM-0001-0009.dcm uploads twice because it satisfies both the IM* and *.dcm patterns.
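The double upload could be avoided by testing each file once against the whole pattern list, rather than collecting matches per pattern. The following is a minimal sketch, assuming the patterns are applied as glob-style filename matches (the ticket does not show how Sonador applies them). `find_candidates` is a hypothetical helper, not part of Sonador.

```python
from fnmatch import fnmatchcase
from pathlib import Path

DCM_EXTENSIONS_DEFAULT = ['*.dcm', '*.DCM', '*.DICOM', '*.dicom', 'IM*']


def find_candidates(root, patterns=DCM_EXTENSIONS_DEFAULT):
    """Return each file under root that matches at least one pattern, once."""
    return sorted(
        path
        for path in Path(root).rglob('*')
        # any() yields a single True/False per file, so a name that
        # satisfies several patterns is still listed only once.
        if path.is_file() and any(fnmatchcase(path.name, p) for p in patterns)
    )
```

This only addresses the duplicate; names such as 54F6EBAE or IN000009 still match none of the patterns and would need a different detection method.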
File extensions to exclude can be found in the attached file, which came from a study of how to extract only the important files from a directory (7zip).
Known issues identifying and loading DICOM files:
- PACS system produces a copy of a Series with identical header information, but thumbnails for pixel data
  - Need to locate test data
- DICOM files are not v3 compatible and do not have the 128-byte preamble and "DICM" prefix
  - Files will fail to open using Pydicom's dcmread without the "force" flag
  - Need to locate test data
- Filenames can exceed filename limits when they consist of UIDs
  - Need to locate test data
- Sort is 0-based (e.g. IM0, IM1, ...) versus 1-based (e.g. IM1, IM2, ...)
  - Maintain 0 or 1 indexing depending on the input data and provide a flag to override the configuration setting
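Since several of the issues above stem from relying on filenames, one option is to check file content instead. A DICOM Part 10 file starts with a 128-byte preamble followed by the four bytes `DICM`. The sketch below uses only the standard library; `has_dicm_prefix` is a hypothetical helper name, not an existing Sonador or Pydicom function.

```python
def has_dicm_prefix(path):
    """Return True if the file carries the 'DICM' marker at byte offset 128."""
    with open(path, 'rb') as handle:
        handle.seek(128)  # skip the 128-byte preamble
        # Reading past the end of a short file returns b'', so this is False.
        return handle.read(4) == b'DICM'
```

Files that lack the prefix return False here, so they would still need a fallback, for example attempting Pydicom's `dcmread(path, force=True)` and checking that the expected header attributes are present.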
A good solution will support:
- The use of the DICOMDIR file to help locate series
- An option for splitting studies into individual series in individual folders
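Splitting a study into per-series folders could be planned as a grouping step before any files are moved. This is a rough sketch under stated assumptions: `plan_series_folders` is a hypothetical helper, the caller supplies the series UID for each file, and the UID is used as the folder name.

```python
from collections import defaultdict
from pathlib import Path


def plan_series_folders(files, dest):
    """Map each target series folder to the source files that belong in it.

    files: iterable of (path, series_uid) pairs; how the UID is obtained
    is left to the caller.
    """
    plan = defaultdict(list)
    for path, series_uid in files:
        # One folder per series, named after the UID (an assumption).
        plan[Path(dest) / series_uid].append(Path(path))
    return dict(plan)
```

The series UID could be read with Pydicom, for example `dcmread(path, stop_before_pixels=True, force=True).SeriesInstanceUID`, assuming the header contains that attribute. Given the filename-length issue noted above, a short counter-based folder name may be preferable to the raw UID.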