The Data-split YAML format

This project stores data-split information in YAML because the format is human-readable and preserves the ordering of entries.

Each data-split file contains the following sections:

  • Splits (train, validation, test, …)

  • Class names in each split (firearm, knife, negative, …)

  • Image names used in each split/class (e.g. BAGGAGE_20180509_135459_126887)

    • Groundtruth filenames used in each class (e.g. “%_label_1”)

    • The “%” character in a groundtruth filename stands for the name of the image the groundtruth belongs to

  • Metadata about the data-split:

    • Image counts and ratios

    • Creation date

    • User name and host name under which the data-split was created

    • Database revision

    • Comment
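
The “%” convention described above can be sketched in Python as follows. This is a minimal illustration, not the project's own code: the function name expand_groundtruth is hypothetical, and it assumes the file has already been parsed (for instance with PyYAML's yaml.safe_load), so that an image entry's value is either empty (None), a single string, or a list of strings. Treating an empty value as "no groundtruth" is also an assumption.

```python
def expand_groundtruth(image_name, labels):
    """Return the groundtruth filenames for one image entry.

    `labels` is the value stored under the image name in the parsed
    data-split: None, a single pattern string, or a list of patterns.
    (Hypothetical helper; the accepted shapes are an assumption.)
    """
    if labels is None:
        # Assumed meaning: the image has no groundtruth attached.
        return []
    if isinstance(labels, str):
        # A single groundtruth written as a scalar instead of a list.
        labels = [labels]
    # Substitute the image name for every "%" placeholder.
    return [label.replace("%", image_name) for label in labels]


print(expand_groundtruth("BAGGAGE_20180509_135459_126887",
                         ["%_label_3", "%_label_4"]))
```

Normalising the scalar and list forms in one place keeps the rest of a loader free of type checks.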


Here is an example of a YAML data-split file:

split:
  train:
      firearm:
        BAGGAGE_20180509_135459_126887:
        - "%_label_3"
        - "%_label_4"
        BAGGAGE_20181011_052802_126022: "%_label_4"
        BAGGAGE_20180509_140131_126887: "%_label_2"
        ...
      knife:
        BAGGAGE_20180520_152912_126583: "%_label_1"
        BAGGAGE_20180509_140131_126887: "%_label_2"
        BAGGAGE_20180509_135459_126887:
        - "%_label_3"
        - "%_label_4"
        BAGGAGE_20180520_153224_126583: "%_label_1"
        ...
      negative:
        BAGGAGE_20181011_051822_126022:
        BAGGAGE_20181011_052019_126022:
        BAGGAGE_20181011_052324_126022:
        BAGGAGE_20181011_052802_126022:
        BAGGAGE_20181011_052939_126022:
        ...
  test:
      firearm:
        BAGGAGE_20180509_135848_126887: "%_label_1"
        BAGGAGE_20180520_152151_126583: "%_label_2"
        BAGGAGE_20180520_152912_126583: "%_label_3"
        BAGGAGE_20180520_153415_126583:
        - "%_label_1"
        - "%_label_2"
        - "%_label_3"
        BAGGAGE_20180520_153224_126583: "%_label_1"
        ...
      knife:
        BAGGAGE_20180520_152912_126583: "%_label_3"
        BAGGAGE_20180509_135459_126887:
        - "%_label_3"
        - "%_label_1"
        BAGGAGE_20180520_153224_126583: "%_label_1"
        ...
      negative:
        BAGGAGE_20181011_053353_126022:
        BAGGAGE_20181011_054701_126022:
        BAGGAGE_20181011_054922_126022:
        BAGGAGE_20181011_064747_126022:
        BAGGAGE_20181011_063828_126022:
        ...
  val: {}
metadata:
  image_count:
    train: 20246
    test: 5219
    val: 5380
  ratio:
    train: 0.6563786675312044
    test: 0.1692008429242989
    val: 0.17442048954449668
  date_generated: '2021-07-30T16:43:17.697206'
  host_and_user:
  - user1
  - localhost
  db_git_revision: e0021960bc18636230086008af5bee8f7116ae9e
  comment: 'No Comment'
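
In the example above, each value under ratio appears to be the split's image count divided by the total image count. That relationship is inferred from the numbers, not stated by the format. Under that assumption, a small consistency check could look like the sketch below. The function name ratios_match_counts is hypothetical, and the metadata dictionary is copied by hand from the example rather than loaded from a file.

```python
import math


def ratios_match_counts(metadata, rel_tol=1e-9):
    """Check that every ratio equals count / total (assumed relationship)."""
    counts = metadata["image_count"]
    total = sum(counts.values())
    return all(
        math.isclose(metadata["ratio"][split], count / total, rel_tol=rel_tol)
        for split, count in counts.items()
    )


# Values copied from the metadata section of the example file.
example_metadata = {
    "image_count": {"train": 20246, "test": 5219, "val": 5380},
    "ratio": {
        "train": 0.6563786675312044,
        "test": 0.1692008429242989,
        "val": 0.17442048954449668,
    },
}

print(ratios_match_counts(example_metadata))
```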

Please see the sample dataset YAML file.