Creates a dataset for a classification or segmentation task. If an annotation file is present, the annotations are prepared as well. The dataset is created from an imageset.
An imageset is a collection of images to build a data-set from. Imagesets are stored in the imagesets folder.
The imagesets folder contains the following folder structure:
- imagesets/[imageset_type]/[imageset_name]
Inside the [imageset_name] folder are the following files / folders:
- test/: test images (benchmark)
- trainval/: training and validation images for cross validation
- categories.txt: all categories (classes) the imageset contains
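
The layout above can be checked before building a dataset. The following is a minimal sketch, not part of mlcore: the function name `missing_imageset_entries` is hypothetical, and it assumes that all three listed entries are required.

```python
from pathlib import Path

# Entries listed above for an imageset folder.
# Assumption: all of them are required; the real builder may be more lenient.
EXPECTED_DIRS = ("test", "trainval")
EXPECTED_FILES = ("categories.txt",)


def missing_imageset_entries(imageset_dir):
    """Return the expected entries that are absent from imageset_dir."""
    root = Path(imageset_dir)
    # Folders are reported with a trailing slash, files without.
    missing = [f"{name}/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing
```

An empty list means the folder matches the structure described above.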
Datasets are stored in the datasets base folder.
The datasets folder contains the following folder structure:
- datasets/[dataset_type]/[dataset_name]
where [dataset_type] is the same as the corresponding [imageset_type] and [dataset_name] is the same as the corresponding [imageset_name].
Inside the [dataset_name] folder are the following files / folders:
- test/: test set (benchmark)
- train/: training set
- val/: validation set
- categories.txt: all categories (classes) the dataset contains
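
The train/ and val/ folders presumably result from splitting the images in the imageset's trainval/ folder. How mlcore performs the split is not documented here, so the following is only a sketch of one reproducible approach: the function `split_trainval` is hypothetical, and its `split` and `seed` arguments merely mirror the --split and --seed command-line parameters.

```python
import random


def split_trainval(items, split=0.2, seed=None):
    """Return (train, val), with roughly `split` of the items in val.

    Illustration only: this is an assumption about how a seeded split
    can work, not the actual mlcore implementation.
    """
    # Sort first so the result does not depend on directory listing order.
    shuffled = sorted(items)
    # A dedicated RNG keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * split))
    return shuffled[n_val:], shuffled[:n_val]
```

Calling the function twice with the same seed yields the same two sets, which is the property a seed is meant to provide.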
The data-set builder builds a data-set from an image-set. It currently handles classification and segmentation image-sets. The type is taken from the image-set type, which is the parent folder in which the image-set folder is located.
To run the data-set builder from the command line, use the following command:
python -m mlcore.dataset [parameters]
The following parameters are supported:
- [categories]: The path to the categories file (e.g. imagesets/segmentation/car_damage/categories.txt).
- --annotation: The path to the image-set annotation file the data-set is built from (e.g. imagesets/classification/car_damage/annotations.csv for classification, imagesets/segmentation/car_damage/via_region_data.json for segmentation).
- --split: The percentage of the data which belongs to the validation set, defaults to 0.2 (= 20%).
- --seed: A random seed to reproduce splits, defaults to None.
- --category-label-key: The key under which the category name can be found in the annotation file, defaults to category.
- --sample: The percentage of the data which will be copied as a sample set into a separate folder with a "_sample" suffix. If not set, no sample data-set is created.
- --type: The type of the data-set. If not explicitly set, it is inferred from the categories file path.
- --tfrecord: Also create .tfrecord files.
- --join-overlapping-regions: Whether overlapping regions of the same category should be joined.
- --annotation-area-thresh: Keep only annotations with a minimum size (width or height) relative to the image size.
- --output: The path of the dataset folder, defaults to ../datasets.
- --name: The name of the data-set. If not explicitly set, it is inferred from the categories file path.
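
Both --type and --name fall back to the categories file path. A minimal sketch of how such an inference could look, assuming the path follows the imagesets/[imageset_type]/[imageset_name]/categories.txt layout; the helper `infer_dataset_dir` is hypothetical and not part of mlcore:

```python
from pathlib import Path


def infer_dataset_dir(categories_path, output="../datasets",
                      dataset_type=None, name=None):
    """Return the target folder [output]/[dataset_type]/[dataset_name].

    Assumption: categories.txt sits directly inside the imageset folder,
    whose parent folder is named after the imageset type.
    """
    imageset_dir = Path(categories_path).parent
    # Explicit values win; otherwise fall back to the folder names.
    dataset_type = dataset_type or imageset_dir.parent.name
    name = name or imageset_dir.name
    return Path(output) / dataset_type / name
```

For imagesets/segmentation/car_damage/categories.txt and the default output folder, this resolves to ../datasets/segmentation/car_damage.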