Creates a dataset for a classification or segmentation task from an imageset. If an annotation file is present, the annotations are prepared as well.

Imageset

Imagesets are collections of images from which a data-set is built. They are stored in the imagesets folder, which has the following structure:

imagesets/[imageset_type]/[imageset_name]

Inside the [imageset_name] folder are the following files and folders:

test/: test images (benchmark)
trainval/: training and validation images for cross validation
categories.txt: all categories (classes) the imageset contains
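
The layout above can be checked before a build is attempted. The following is a minimal sketch, not part of mlcore: the function name missing_imageset_entries is hypothetical, and it only tests that the three entries listed above exist.

```python
from pathlib import Path

# Entries that the text above lists for an image-set folder.
EXPECTED_DIRS = ("test", "trainval")
EXPECTED_FILES = ("categories.txt",)


def missing_imageset_entries(imageset_dir):
    """Return the expected entries that are absent from imageset_dir.

    Hypothetical helper for illustration only; it is not an mlcore function.
    """
    root = Path(imageset_dir)
    # Folders first, then files, in the order they are documented.
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing
```

An empty list means the folder matches the documented layout; otherwise the list names what is missing.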

Dataset Folders

Datasets are stored in the datasets base folder, which has the following structure:

datasets/[dataset_type]/[dataset_name]

Here [dataset_type] is the same as the corresponding [imageset_type], and [dataset_name] is the same as the corresponding [imageset_name].
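
Because the type and name are shared between the two trees, the dataset folder can be derived from the imageset folder. The sketch below illustrates that mapping under stated assumptions: dataset_dir_for is a hypothetical name, and the base folder is passed in explicitly rather than read from any mlcore setting.

```python
from pathlib import Path


def dataset_dir_for(imageset_dir, datasets_base="datasets"):
    """Map imagesets/[type]/[name] to [datasets_base]/[type]/[name].

    Hypothetical helper for illustration only; it is not an mlcore function.
    """
    parts = Path(imageset_dir).parts
    if len(parts) < 2:
        raise ValueError("expected a path ending in [imageset_type]/[imageset_name]")
    # The last two path components are the type and the name.
    imageset_type, imageset_name = parts[-2:]
    return Path(datasets_base) / imageset_type / imageset_name
```

For example, imagesets/segmentation/car_damage maps to datasets/segmentation/car_damage.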

Inside the [dataset_name] folder are the following files and folders:

test/: test set (benchmark)
train/: training set
val/: validation set
categories.txt: all categories (classes) the dataset contains
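
To illustrate how the trainval images of an imageset might end up in separate train/ and val/ folders, here is a minimal sketch of a reproducible random split. It is an assumption, not the mlcore implementation: the real builder may, for instance, split per category. The function name split_train_val is hypothetical; the validation fraction of 0.2 and the optional seed mirror the defaults documented for the command line.

```python
import random


def split_train_val(items, split=0.2, seed=None):
    """Shuffle items and split off a validation share.

    Hypothetical sketch; returns (train_items, val_items).
    """
    rng = random.Random(seed)
    # Sort first so the result depends only on the seed, not on input order.
    shuffled = sorted(items)
    rng.shuffle(shuffled)
    n_val = int(round(len(shuffled) * split))
    return shuffled[n_val:], shuffled[:n_val]
```

With the same seed, the same files always land in the same set, which is what makes a split reproducible.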

Helper Methods

Build a data-set

Builds a data-set from an image-set. Classification and segmentation image-sets are currently handled. The image-set type is taken from the name of the parent folder in which the image-set folder is located.
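
The inference from the folder structure can be sketched as follows. This is an illustration under stated assumptions, not the library's code: infer_type_and_name is a hypothetical name, and it assumes the categories file sits directly inside the image-set folder, as in imagesets/[imageset_type]/[imageset_name]/categories.txt.

```python
from pathlib import Path

# The two image-set types the text above says are currently handled.
SUPPORTED_TYPES = ("classification", "segmentation")


def infer_type_and_name(categories_path):
    """Infer (imageset_type, imageset_name) from a categories file path.

    Hypothetical helper for illustration only; it is not an mlcore function.
    """
    imageset_dir = Path(categories_path).parent
    imageset_name = imageset_dir.name
    # The type is the parent folder of the image-set folder.
    imageset_type = imageset_dir.parent.name
    if imageset_type not in SUPPORTED_TYPES:
        raise ValueError("unsupported image-set type: %r" % imageset_type)
    return imageset_type, imageset_name
```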

Run from command line

To run the data-set builder from the command line, use the following command: python -m mlcore.dataset [parameters]

The following parameters are supported:

[categories]: The path to the categories file. (e.g.: imagesets/segmentation/car_damage/categories.txt)
--annotation: The path to the image-set annotation file from which the data-set is built. (e.g.: imagesets/classification/car_damage/annotations.csv for classification, imagesets/segmentation/car_damage/via_region_data.json for segmentation)
--split: The fraction of the data that belongs to the validation set, defaults to 0.2 (=20%).
--seed: A random seed to reproduce splits, defaults to None.
--category-label-key: The key under which the category name is found in the annotation file, defaults to category.
--sample: The fraction of the data that is copied as a sample set into a separate folder with the "_sample" suffix. If not set, no sample data-set is created.
--type: The type of the data-set. If not set explicitly, it is inferred from the categories file path.
--tfrecord: Also create .tfrecord files.
--join-overlapping-regions: Whether overlapping regions of the same category should be joined.
--annotation-area-thresh: Keep only annotations with a minimum size (width or height) relative to the image size.
--output: The path of the dataset folder, defaults to ../datasets.
--name: The name of the data-set. If not set explicitly, it is inferred from the categories file path.
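
As an illustration of how these parameters combine, the sketch below assembles an argument list for the builder without running it. Several details are assumptions: build_command is a hypothetical helper, the categories path is assumed to be a positional argument, valued options are assumed to use the "--option value" form, and --tfrecord is assumed to be a switch that takes no value. Only a subset of the documented parameters is covered.

```python
def build_command(categories, annotation=None, split=None, seed=None,
                  sample=None, tfrecord=False):
    """Assemble an argument list for `python -m mlcore.dataset`.

    Hypothetical helper; options left as None are omitted so that the
    builder's documented defaults apply.
    """
    cmd = ["python", "-m", "mlcore.dataset", categories]
    valued_options = [
        ("--annotation", annotation),
        ("--split", split),
        ("--seed", seed),
        ("--sample", sample),
    ]
    for flag, value in valued_options:
        if value is not None:
            cmd += [flag, str(value)]
    if tfrecord:
        # Assumed to be a boolean switch without a value.
        cmd.append("--tfrecord")
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) in an environment where mlcore is installed.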

