The Data-split YAML format
This project stores data-split information in YAML because it is human-readable and preserves the order of entries.
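Each data-split file records the following information:
- Splits (train, val, test, …)
- Class names in each split (firearm, knife, negative, …)
- Image names used in each split and class (e.g. BAGGAGE_20180509_135459_126887)
- Ground-truth filenames for each image (e.g. “%_label_1”)
  - The “%” character in a filename stands for the name of the image the ground truth belongs to, so “%_label_1” listed under BAGGAGE_20180520_152912_126583 refers to BAGGAGE_20180520_152912_126583_label_1. An image may list one filename, a list of filenames, or none at all (as in the negative class).
- Metadata about the data split:
  - Image counts and ratios per split
  - Creation date
  - Name of the user and host that created the file
  - Database revision (Git commit hash)
  - Comment
Splits, classes, images, and ground-truth filenames are nested in that order under the top-level split key; the metadata is stored under the top-level metadata key.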
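To illustrate how a consumer of the file might interpret the image entries and the “%” placeholder, here is a minimal Python sketch. It assumes the split section has already been parsed into nested dictionaries, as a loader such as PyYAML's `yaml.safe_load` would produce; an image key with no value then becomes `None`. The names `resolve_groundtruth` and `iter_entries` are hypothetical and not part of the project.

```python
from typing import Dict, Iterator, List, Optional, Tuple, Union

# A value under an image key is None (no ground truth, e.g. negative images),
# a single ground-truth pattern, or a list of patterns.
GroundTruth = Optional[Union[str, List[str]]]
SplitMap = Dict[str, Dict[str, Dict[str, GroundTruth]]]


def resolve_groundtruth(image_name: str, value: GroundTruth) -> List[str]:
    """Replace '%' in each ground-truth pattern with the owning image name."""
    if value is None:
        return []
    patterns = [value] if isinstance(value, str) else list(value)
    return [pattern.replace("%", image_name) for pattern in patterns]


def iter_entries(split: SplitMap) -> Iterator[Tuple[str, str, str, List[str]]]:
    """Yield (split, class, image, ground-truth filenames) in file order."""
    for split_name, classes in split.items():
        # An empty split such as `val: {}` simply yields nothing.
        for class_name, images in (classes or {}).items():
            for image_name, value in (images or {}).items():
                yield split_name, class_name, image_name, resolve_groundtruth(image_name, value)


# Hypothetical excerpt of a parsed `split` section.
example = {
    "train": {
        "firearm": {
            "BAGGAGE_20180509_135459_126887": ["%_label_3", "%_label_4"],
            "BAGGAGE_20181011_052802_126022": "%_label_4",
        },
        "negative": {"BAGGAGE_20181011_051822_126022": None},
    },
    "val": {},
}

for entry in iter_entries(example):
    print(entry)
```

Normalizing the three value shapes (none, a string, a list) into a list keeps downstream code from special-casing single-label images.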
Here is an example of a data-split YAML file:
split:
  train:
    firearm:
      BAGGAGE_20180509_135459_126887:
        - "%_label_3"
        - "%_label_4"
      BAGGAGE_20181011_052802_126022: "%_label_4"
      BAGGAGE_20180509_140131_126887: "%_label_2"
      ...
    knife:
      BAGGAGE_20180520_152912_126583: "%_label_1"
      BAGGAGE_20180509_140131_126887: "%_label_2"
      BAGGAGE_20180509_135459_126887:
        - "%_label_3"
        - "%_label_4"
      BAGGAGE_20180520_153224_126583: "%_label_1"
      ...
    negative:
      BAGGAGE_20181011_051822_126022:
      BAGGAGE_20181011_052019_126022:
      BAGGAGE_20181011_052324_126022:
      BAGGAGE_20181011_052802_126022:
      BAGGAGE_20181011_052939_126022:
      ...
  test:
    firearm:
      BAGGAGE_20180509_135848_126887: "%_label_1"
      BAGGAGE_20180520_152151_126583: "%_label_2"
      BAGGAGE_20180520_152912_126583: "%_label_3"
      BAGGAGE_20180520_153415_126583:
        - "%_label_1"
        - "%_label_2"
        - "%_label_3"
      BAGGAGE_20180520_153224_126583: "%_label_1"
      ...
    knife:
      BAGGAGE_20180520_152912_126583: "%_label_3"
      BAGGAGE_20180509_135459_126887:
        - "%_label_3"
        - "%_label_1"
      BAGGAGE_20180520_153224_126583: "%_label_1"
      ...
    negative:
      BAGGAGE_20181011_053353_126022:
      BAGGAGE_20181011_054701_126022:
      BAGGAGE_20181011_054922_126022:
      BAGGAGE_20181011_064747_126022:
      BAGGAGE_20181011_063828_126022:
      ...
  val: {}
metadata:
  image_count:
    train: 20246
    test: 5219
    val: 5380
  ratio:
    train: 0.6563786675312044
    test: 0.1692008429242989
    val: 0.17442048954449668
  date_generated: '2021-07-30T16:43:17.697206'
  host_and_user:
    - user1
    - localhost
  db_git_revision: e0021960bc18636230086008af5bee8f7116ae9e
  comment: 'No Comment'
For a complete file, please see the sample dataset YAML file.
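The metadata section can be derived from the split section. The Python sketch below is an illustration under stated assumptions, not the project's own tool: `build_metadata`, `compute_ratios`, `unique_images`, and `current_user` are hypothetical names. It assumes `image_count` counts each unique image name once per split, even if the image appears under several classes, and that `ratio` is each count divided by the total over all splits. With the example's counts this reproduces the listed ratios (for train, 20246 / (20246 + 5219 + 5380) ≈ 0.6564). The Git revision is passed in by the caller; obtaining it with `git rev-parse HEAD` in the database repository is an assumption.

```python
import getpass
import socket
from datetime import datetime
from typing import Any, Dict, Optional, Set


def compute_ratios(counts: Dict[str, int]) -> Dict[str, float]:
    """Divide each split's image count by the total over all splits."""
    total = sum(counts.values())
    return {name: (n / total if total else 0.0) for name, n in counts.items()}


def unique_images(classes: Optional[Dict[str, Any]]) -> Set[str]:
    """Unique image names across all classes of one split.

    Assumption: an image listed under several classes is counted once.
    """
    names: Set[str] = set()
    for images in (classes or {}).values():
        names.update((images or {}).keys())
    return names


def current_user() -> str:
    """Login name, or 'unknown' if getpass.getuser() cannot determine one."""
    try:
        return getpass.getuser()
    except Exception:
        return "unknown"


def build_metadata(split: Dict[str, Any], db_git_revision: str,
                   comment: str = "No Comment") -> Dict[str, Any]:
    """Hypothetical helper: build a dict shaped like the example's metadata section."""
    counts = {name: len(unique_images(classes)) for name, classes in split.items()}
    return {
        "image_count": counts,
        "ratio": compute_ratios(counts),
        "date_generated": datetime.now().isoformat(),
        # The example lists the user first, then the host.
        "host_and_user": [current_user(), socket.gethostname()],
        "db_git_revision": db_git_revision,
        "comment": comment,
    }


# Usage with a toy split mapping (image names shortened for readability).
toy_split = {
    "train": {
        "firearm": {"A": "%_label_1"},
        "knife": {"A": "%_label_2"},
        "negative": {"B": None},
    },
    "test": {"knife": {"C": ["%_label_1", "%_label_2"]}},
    "val": {},
}
metadata = build_metadata(toy_split, db_git_revision="<commit hash>")
print(metadata["image_count"], metadata["ratio"])
```

Because the file format relies on key order, note that PyYAML's `yaml.dump` sorts mapping keys alphabetically by default; passing `sort_keys=False` keeps the insertion order when writing such a mapping back out.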