The HDF5 Format

This project uses HDF5 (Hierarchical Data Format version 5), a general-purpose file format developed by the non-profit HDF Group, because it can store large numerical datasets of various types together with their metadata. The open-source format is supported by libraries for many programming languages on multiple platforms.

UVal’s HDF5 File Format Specification

For each image filename (NAME), the following files are created individually:

NAME.det.h5 (contains 3d detection results + file_meta & volume_meta)
NAME.gt.h5 (contains 3d groundtruth objects + file_meta & volume_meta)
NAME.volcache.h5 (contains volume 2d projection cache images + file_meta & volume_meta)
NAME.voldata.h5 (contains 3d volume_data + file_meta & volume_meta)
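
The naming scheme above can be sketched as a small helper. The function name `uval_file_names` and the example name `bag_0001` are hypothetical and not part of UVal. The sketch also assumes that NAME is used verbatim as the prefix, since the specification does not say whether the image's own extension is stripped.

```python
from pathlib import Path

# Suffixes taken from the file list above.
SUFFIXES = ("det", "gt", "volcache", "voldata")


def uval_file_names(name, directory="."):
    """Map each suffix to the expected HDF5 path for NAME.

    Hypothetical helper: assumes NAME is used verbatim as the file prefix.
    """
    base = Path(directory)
    return {suffix: base / f"{name}.{suffix}.h5" for suffix in SUFFIXES}


paths = uval_file_names("bag_0001")
for suffix, path in paths.items():
    print(suffix, path.name)
```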

Data sections stored in HDF5 files

file_meta:  # Always required
- host_name: "H5T_STRING"  # Host name of the computer that generated the h5 file
- user_name: "H5T_STRING"  # User name of the user who generated the h5 file
- dt_generated: "H5T_STRING"  # ISO 8601 date and time of file creation (with timezone!)
- det_version: "H5T_STRING"  # Version of the detection software used to generate the files
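
As a minimal sketch, the four file_meta values could be collected with the Python standard library as shown below. The function names `current_user` and `build_file_meta` and the version string `"0.0.0"` are placeholders for illustration. How the strings are then written to the HDF5 file (as attributes or as datasets) is not stated in this specification and is left out here.

```python
import getpass
import socket
from datetime import datetime, timezone


def current_user():
    """Return the login name, or "unknown" if it cannot be determined."""
    try:
        return getpass.getuser()
    except Exception:  # e.g. no passwd entry for the current uid in a container
        return "unknown"


def build_file_meta(det_version):
    """Collect the file_meta fields as plain (UTF-8 encodable) strings."""
    # astimezone() converts to the local timezone, so isoformat() includes
    # an explicit UTC offset such as "+02:00".
    now = datetime.now(timezone.utc).astimezone()
    return {
        "host_name": socket.gethostname(),
        "user_name": current_user(),
        "dt_generated": now.isoformat(),
        "det_version": det_version,
    }


meta = build_file_meta("0.0.0")  # placeholder version string
for key, value in meta.items():
    print(f"{key}: {value}")
```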

volume_meta:  # Always available, not optional! (also for dets and gt)
  id: "H5T_STRING"  # e.g. Image filename
  file_md5: "H5T_STRING"  # The checksum of the original ct volume file e.g. 686593fa1f05f610066129b72c62bfdd
  full_shape: INT (3) # Shape of full volume
  is_cropped: INT  # If 1, the data contain only the voxels within the ROI
  roi_start: INT (3) # Starting coordinates of the ROI within the 3d image (x,y,z)
  roi_shape: INT (3) # Matches size of data if "is_cropped" is 1
  cache:  # Optional
    projection_x: UINT8 RGB IMAGE  # (RGB) projection of 3d volume in X direction 
    projection_y: UINT8 RGB IMAGE  # (RGB) projection of 3d volume in Y direction
    projection_z: UINT8 RGB IMAGE  # (RGB) projection of 3d volume in Z direction
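
The volume_meta fields lend themselves to two simple checks, sketched below with the standard library only. `file_md5` computes a checksum in the format of the file_md5 example above, and `roi_is_valid` tests that an ROI fits inside full_shape. Both function names are hypothetical, and the ROI check assumes that roi_start is zero-based and given in voxels relative to the full volume, which this specification does not state explicitly.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def roi_is_valid(full_shape, roi_start, roi_shape):
    """True if the ROI has positive extent and lies inside the full volume.

    Assumption: roi_start is zero-based and relative to the full volume.
    """
    return all(
        start >= 0 and size > 0 and start + size <= full
        for full, start, size in zip(full_shape, roi_start, roi_shape)
    )


print(roi_is_valid((512, 512, 800), (10, 20, 30), (100, 100, 100)))
print(roi_is_valid((512, 512, 800), (500, 0, 0), (100, 100, 100)))
```

The first call prints `True`; the second prints `False` because the ROI would extend past the volume along x.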

volume_data: "H5T_STD_U16LE"  # Optional (to save space, not contained in .det.h5 and .gt.h5)

detections:  # list int-indexed as strings for each member (e.g. "0", "1", ..)
  class_name: "H5T_STRING" # the class name or type of the detection
  roi_start: INT (3) # Starting coordinates of the detection ROI in 3d (x,y,z)
  roi_shape: INT (3) # Same as size of mask if mask is available
  mask: "H5T_STD_U8LE"  # Optional (e.g. only bounding boxes)
  score: DOUBLE (SCALAR "H5T_IEEE_F64LE")
  cache:  # Optional
    projection_x: UINT16 # Sum of the 3D mask (1s and 0s) along the x axis (only the y and z axes remain)
    projection_y: UINT16 # Sum of the 3D mask (1s and 0s) along the y axis (only the x and z axes remain)
    projection_z: UINT16 # Sum of the 3D mask (1s and 0s) along the z axis (only the x and y axes remain)
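
Because the detections are indexed by integers stored as strings, a reader that lists the members by name gets them in lexicographic order, so "10" comes before "2". The sketch below shows one way to restore numeric order. The helper name `ordered_detection_keys` is hypothetical, and the key list stands in for whatever member names a reader obtains from the file.

```python
def ordered_detection_keys(keys):
    """Sort string keys such as "0", "1", "10" by their integer value."""
    return sorted(keys, key=int)


keys = ["0", "1", "10", "11", "2", "3"]  # name order, as strings
print(sorted(keys))                  # lexicographic: "10" before "2"
print(ordered_detection_keys(keys))  # numeric: "2" before "10"
```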

groundtruth:  # dict indexed by label id for each member
  class_name: "H5T_STRING" # the class name or type of the groundtruth 
  subclass_name: "H5T_STRING"
  target_id: "H5T_STRING" # The id or index (not name) of the class
  roi_start: INT (3)
  roi_shape: INT (3) # Same as size of mask if mask is available
  mask: "H5T_STD_U8LE" # Optional (e.g. only bounding boxes)
  cache:  # Optional
    projection_x:  # As for detections
    projection_y:
    projection_z:

Some remarks:

# Definition of X, Y and Z axes in CT images
* "Z-axis" is the belt direction (direction of motion)
* "Y-axis" is vertical, pointing up
* "X-axis" points left when looking in the belt motion direction
 
The character set is always UTF-8.

# Definition of the projection:

2D MASK projection to 1D:
X ---->
0 0 0 0 0 0 0  ^
0 1 0 1 0 0 0  |
0 1 1 1 1 0 0  |
0 0 1 1 0 0 0  Y
0 0 0 1 0 0 0
 
In the projection, the voxel values are added up along the Y axis:
0 2 2 4 1 0 0
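
The 2D example above can be reproduced in a few lines of plain Python. This is only an illustration of the summation: the function names are made up for this sketch, and the mask is written as a list of rows with X running along each row, as in the drawing.

```python
# The 2D mask from the example: each inner list is one row (X runs along a row).
mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
]


def project_along_y(rows):
    """Sum each column, collapsing the Y axis (one value per X position)."""
    return [sum(column) for column in zip(*rows)]


def project_along_x(rows):
    """Sum each row, collapsing the X axis (one value per Y position)."""
    return [sum(row) for row in rows]


projection = project_along_y(mask)
print(projection)             # [0, 2, 2, 4, 1, 0, 0]
print(project_along_x(mask))  # [0, 2, 4, 2, 1]
```

For the 3D masks in the cache sections the same idea applies with one more axis: summing along one axis leaves a 2D image over the other two.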