Creates a dataset for a classification or segmentation task. The dataset is built from an imageset; if an annotation file is present, the annotations are prepared as well.
Imagesets are collections of images from which datasets are built. They are stored in the imagesets folder.
The imagesets folder contains the following folder structure:
- imagesets/[imageset_type]/[imageset_name]
 
Inside the [imageset_name] folder are the following files and folders:
- test/: test images (benchmark)
- trainval/: training and validation images for cross validation
- categories.txt: all categories (classes) the imageset contains
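A minimal Python sketch for checking that an imageset folder follows this layout. The helper names (missing_imageset_entries, read_categories) are hypothetical and not part of mlcore, and the one-category-per-line format of categories.txt is an assumption. The category names in the usage part are made up for illustration.

```python
import tempfile
from pathlib import Path

# Entries expected inside imagesets/[imageset_type]/[imageset_name].
EXPECTED_DIRS = ("test", "trainval")
EXPECTED_FILES = ("categories.txt",)


def missing_imageset_entries(imageset_dir):
    """Return the names of expected folders/files absent from imageset_dir."""
    root = Path(imageset_dir)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


def read_categories(categories_file):
    """Read categories.txt, ASSUMING one category name per line."""
    lines = Path(categories_file).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Usage: build a throwaway imageset folder step by step and check it.
with tempfile.TemporaryDirectory() as tmp:
    imageset = Path(tmp) / "imagesets" / "classification" / "car_damage"
    (imageset / "test").mkdir(parents=True)
    missing = missing_imageset_entries(imageset)  # trainval/ and categories.txt absent
    (imageset / "trainval").mkdir()
    (imageset / "categories.txt").write_text("scratch\ndent\n", encoding="utf-8")
    missing_after = missing_imageset_entries(imageset)
    categories = read_categories(imageset / "categories.txt")
```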
Datasets are stored in the datasets base folder.
The datasets folder contains the following folder structure:
- datasets/[dataset_type]/[dataset_name]
where [dataset_type] is the same as the corresponding [imageset_type] and [dataset_name] is the same as the corresponding [imageset_name].
Inside the [dataset_name] folder are the following files and folders:
- test/: test set (benchmark)
- train/: training set
- val/: validation set
- categories.txt: all categories (classes) the dataset contains
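The mapping from imageset to dataset folder can be sketched as follows. This is a minimal illustration, not mlcore's implementation: the helper create_dataset_skeleton is hypothetical, it only creates the empty test/, train/ and val/ folders and copies categories.txt, and it assumes the imageset path ends in [imageset_type]/[imageset_name].

```python
import shutil
import tempfile
from pathlib import Path


def create_dataset_skeleton(imageset_dir, datasets_root):
    """Create [datasets_root]/[dataset_type]/[dataset_name] for an imageset.

    ASSUMES imageset_dir is imagesets/[imageset_type]/[imageset_name]; the
    dataset reuses the imageset's type and name.
    """
    imageset_dir = Path(imageset_dir)
    dataset_dir = Path(datasets_root) / imageset_dir.parent.name / imageset_dir.name
    for split in ("test", "train", "val"):
        (dataset_dir / split).mkdir(parents=True, exist_ok=True)
    # Copy the category list so the dataset describes its own classes.
    shutil.copy(imageset_dir / "categories.txt", dataset_dir / "categories.txt")
    return dataset_dir


# Usage with a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    imageset = Path(tmp) / "imagesets" / "segmentation" / "car_damage"
    imageset.mkdir(parents=True)
    (imageset / "categories.txt").write_text("scratch\n", encoding="utf-8")
    dataset = create_dataset_skeleton(imageset, Path(tmp) / "datasets")
    relative = dataset.relative_to(tmp).as_posix()
    subdirs = sorted(p.name for p in dataset.iterdir() if p.is_dir())
    copied = (dataset / "categories.txt").read_text(encoding="utf-8")
```

Filling train/ and val/ from trainval/ is left out here; that step depends on the split options listed below.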
The dataset builder creates a dataset from an imageset. It currently handles classification and segmentation imagesets; the type is taken from the imageset type, i.e. the parent folder in which the imageset folder is located.
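Inferring the type and name from the categories file path (the fallback used when --type or --name is not given) could look like this. The function resolve_type_and_name is hypothetical, and raising ValueError for unsupported types is an assumption about how other types are rejected.

```python
from pathlib import PurePath

SUPPORTED_TYPES = {"classification", "segmentation"}


def resolve_type_and_name(categories_path, dataset_type=None, name=None):
    """Return (dataset_type, name), inferring missing values from the path.

    Expects .../[imageset_type]/[imageset_name]/categories.txt; explicit
    arguments take precedence, mirroring --type and --name.
    """
    imageset_dir = PurePath(categories_path).parent
    dataset_type = dataset_type or imageset_dir.parent.name
    name = name or imageset_dir.name
    if dataset_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported imageset type: {dataset_type!r}")
    return dataset_type, name


print(resolve_type_and_name("imagesets/segmentation/car_damage/categories.txt"))
# → ('segmentation', 'car_damage')
print(resolve_type_and_name("imagesets/classification/car_damage/categories.txt",
                            name="car_damage_v2"))
# → ('classification', 'car_damage_v2')
```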
To run the dataset builder from the command line, use the following command:
python -m mlcore.dataset [parameters]
The following parameters are supported:
- [categories]: The path to the categories file (e.g. imagesets/segmentation/car_damage/categories.txt).
- --annotation: The path to the imageset annotation file the dataset is built from (e.g. imagesets/classification/car_damage/annotations.csv for classification, imagesets/segmentation/car_damage/via_region_data.json for segmentation).
- --split: The fraction of the data that goes to the validation set; defaults to 0.2 (= 20%).
- --seed: A random seed to make splits reproducible; defaults to None.
- --category-label-key: The key under which the category name is found in the annotation file; defaults to category.
- --sample: The fraction of the data copied as a sample dataset into a separate folder with the "_sample" suffix. If not set, no sample dataset is created.
- --type: The type of the dataset; if not set explicitly, it is inferred from the categories file path.
- --tfrecord: Also create .tfrecord files.
- --join-overlapping-regions: Whether overlapping regions of the same category should be joined.
- --annotation-area-thresh: Keep only annotations with a minimum size (width or height) relative to the image size.
- --output: The path of the datasets folder; defaults to ../datasets.
- --name: The name of the dataset; if not set explicitly, it is inferred from the categories file path.
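The --split, --seed and --sample options describe a reproducible random partition of the data, presumably of the trainval images into train/ and val/. A minimal sketch using Python's random.Random, assuming the items are image file names; split_train_val and sample_items are hypothetical helpers, not necessarily how mlcore performs the split.

```python
import random


def split_train_val(items, split=0.2, seed=None):
    """Shuffle items reproducibly and split off a validation fraction.

    `split` is the fraction of items assigned to the validation set; the same
    seed and input always yield the same (train, val) pair.
    """
    if not 0.0 <= split <= 1.0:
        raise ValueError("split must be between 0 and 1")
    shuffled = sorted(items)  # sort first so the result depends only on the seed
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * split)
    return shuffled[n_val:], shuffled[:n_val]


def sample_items(items, fraction, seed=None):
    """Pick a random subset of items, e.g. for a "_sample" dataset."""
    k = round(len(items) * fraction)
    return random.Random(seed).sample(sorted(items), k)


images = [f"img_{i:02d}.jpg" for i in range(10)]
train, val = split_train_val(images, split=0.2, seed=42)
train_again, val_again = split_train_val(images, split=0.2, seed=42)
sample = sample_items(images, fraction=0.5, seed=42)
```

Sorting before shuffling makes the split independent of the order in which files are listed on disk, which matters for reproducibility because directory listing order is not guaranteed.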