Modules
Unagi has three submodules: binarize, dataset and train. Each submodule has a main method. Together they can be used to create a dataset, train the model on the created data, and then binarize images using the saved model weights.
unagi.binarize
The binarize module loads a pretrained model saved during training and runs model prediction, i.e. binarization, on the given images.
-
unagi.binarize.main(input_path: str = './input', output_path: str = './output', weights_path: Optional[str] = None, batchsize: int = 2) → List[numpy.ndarray]
Binarize images from the input directory and write them to the output directory.
Parameters: - input_path (str, optional) – input path for images. (default is input folder in current directory)
- output_path (str, optional) – output path to save images. (default is output folder in current directory)
- weights_path (str or None, optional) – path to the weights file. If None, the default weights are loaded from the package root directory.
- batchsize (int, optional) – batch size to use in model prediction (default is 2)
Returns: list of binary images in np.ndarray format
Return type: List[numpy.ndarray]
Note
All input images should be PNG files, e.g. “sample_1.png”. All output image names end with “_bin”, e.g. “sample_1_bin.png”.
Example
unagi.binarize.main('input_path', 'output_path', batchsize=2)
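The naming convention from the note above can be sketched with a small helper. This is only an illustration: `binarized_name` is a hypothetical function and not part of unagi, and it assumes "_bin" is inserted directly before the file extension.

```python
from pathlib import Path


def binarized_name(input_name: str) -> str:
    """Return the expected output file name for an input image.

    Hypothetical helper (not part of unagi): assumes "_bin" is appended
    to the file stem and the extension is kept unchanged.
    """
    path = Path(input_name)
    # "sample_1.png" has stem "sample_1" and suffix ".png"
    return f"{path.stem}_bin{path.suffix}"


if __name__ == "__main__":
    print(binarized_name("sample_1.png"))
```

A helper like this can be handy for locating the written files after calling unagi.binarize.main.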
unagi.dataset
The dataset module can be used to create the training dataset. It takes a folder containing input images and their corresponding ground-truth images. Input image names should end with _in and ground-truth image names should end with _gt. Input and ground-truth images should have the same file extension.
Tip
Consider saving the images in PNG format rather than JPG. JPG compression is lossy, so saving binary images as JPG introduces some gray-level pixels.
Images are cropped into smaller image parts based on the input size of the U-net model. The output folder will contain two subfolders, in and gt: in contains the input images and gt contains the corresponding ground-truth images.
-
unagi.dataset.main(input_path: str = './input', output_path: str = './output', shuffle: bool = True, size_x: int = 128, size_y: int = 128, step_x: int = 128, step_y: int = 128, processes: int = 2) → None
Create train and ground-truth images suitable for unagi training.
Parameters: - input_path (str, optional) – path to input images (default is os.path.join(“.”, “input”))
- output_path (str, optional) – path to created images (default is os.path.join(“.”, “output”))
- shuffle (bool, optional) – shuffle the newly created images (default is True)
- size_x (int, optional) – width of each image part (default is 128).
- size_y (int, optional) – height of each image part (default is 128).
- step_x (int, optional) – horizontal step between image parts (default is 128).
- step_y (int, optional) – vertical step between image parts (default is 128).
- processes (int, optional) – number of CPU cores to use (default is 2)
Returns: Return type: None
Note
All train image names should end with “_in”, e.g. “1_in.png”. All ground-truth image names should end with “_gt”, e.g. “1_gt.png”. If an image has only a train or only a ground-truth version, the script fails. After the script finishes, the output directory contains two subdirectories: “in” with train images and “gt” with ground-truth images.
Example
unagi.dataset.main(in_img_path, out_imgs_path, size_x=128, size_y=128, step_x=128, step_y=128)
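Because the script fails when an image has only a train or only a ground-truth version, it can help to check the input folder first. The following is a minimal sketch using only the standard library; `find_unpaired` is a hypothetical helper, not part of unagi, and it assumes paired files share the same extension.

```python
from pathlib import Path
from typing import List


def find_unpaired(input_dir: str) -> List[str]:
    """Return names of "_in"/"_gt" files whose counterpart is missing.

    Hypothetical pre-flight check (not part of unagi).
    """
    files = {p.name for p in Path(input_dir).iterdir() if p.is_file()}
    unpaired = []
    for name in sorted(files):
        stem, suffix = Path(name).stem, Path(name).suffix
        if stem.endswith("_in"):
            partner = stem[:-3] + "_gt" + suffix
        elif stem.endswith("_gt"):
            partner = stem[:-3] + "_in" + suffix
        else:
            continue  # file does not follow the naming convention; ignored here
        if partner not in files:
            unpaired.append(name)
    return unpaired
```

An empty result means every "_in" image has a matching "_gt" image and vice versa.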
-
unagi.dataset.process_img(fname_in: str, size_x: int = 128, size_y: int = 128, step_x: int = 128, step_y: int = 128) → None
Read train and ground-truth images, split them and save.
Parameters: - fname_in (str) – input image name
- size_x (int) – width of each image part (default is 128).
- size_y (int) – height of each image part (default is 128).
- step_x (int) – horizontal step between image parts (default is 128).
- step_y (int) – vertical step between image parts (default is 128).
Returns: Return type: None
Example
unagi.dataset.process_img(img_name, 128, 128, 128, 128)
-
unagi.dataset.save_imgs(imgs_in: List[numpy.ndarray], imgs_gt: List[numpy.ndarray], fname_in: str) → None
Save image parts to one folder.
Save all image parts to folder with name ‘(original image name) + _parts’.
Parameters: - imgs_in (List[np.ndarray]) – list of input image arrays
- imgs_gt (List[np.ndarray]) – list of gt image arrays
- fname_in (str) – file name of the original full image
Returns: Return type: None
Example
unagi.dataset.save_imgs(in_img_list, gt_img_list, in_img)
-
unagi.dataset.shuffle_imgs(dname: str) → None
Shuffle input and ground-truth images.
(Relevant if you are combining several datasets into one.)
Parameters: - dname (str) – name of the directory with images to shuffle
Returns: Return type: None
Example
unagi.dataset.shuffle_imgs(images_dir)
-
unagi.dataset.split_img_overlay(img: numpy.ndarray, size_x: int = 128, size_y: int = 128, step_x: int = 128, step_y: int = 128) → Tuple[List[numpy.ndarray], int, int]
Split an image into parts (small images) with possible overlap.
Parameters: - img (np.ndarray) – input image array
- size_x (int, optional) – width of each image part (default is 128).
- size_y (int, optional) – height of each image part (default is 128).
- step_x (int, optional) – horizontal step between image parts (default is 128).
- step_y (int, optional) – vertical step between image parts (default is 128).
Returns: list of numpy arrays, border value along width, border value along height
Return type: Tuple[List[numpy.ndarray], int, int]
Note
Walk through the whole image with a window of size size_x * size_y and step step_x, step_y, and save all parts in a list. If the image dimensions are not multiples of the window size, the image is padded with a border of suitable size. If step_x, step_y are not equal to size_x, size_y, the parts either overlap or have gaps between them.
Example
unagi.dataset.split_img_overlay(img, 128, 128, 128, 128)
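The window arithmetic described in the note can be illustrated for a single axis. This is a sketch under stated assumptions, not unagi's implementation: `window_origins` is a hypothetical helper, and it assumes padding is added only at the far edge so that the last window fits completely.

```python
from typing import List, Tuple


def window_origins(length: int, size: int, step: int) -> Tuple[List[int], int]:
    """Return window start offsets along one axis and the padding required.

    Hypothetical illustration (not unagi's code): windows start at 0, step,
    2 * step, ... until the axis is covered; the padding is whatever the
    last window needs beyond the original length.
    """
    if length <= 0 or size <= 0 or step <= 0:
        raise ValueError("length, size and step must be positive")
    origins = []
    origin = 0
    while origin < length:
        origins.append(origin)
        if origin + size >= length:
            break  # this window already reaches the end of the axis
        origin += step
    padding = max(0, origins[-1] + size - length)
    return origins, padding


if __name__ == "__main__":
    # step == size: adjacent windows; step < size: overlapping windows
    print(window_origins(300, 128, 128))
    print(window_origins(300, 128, 64))
```

Applying the same computation to the width (size_x, step_x) and the height (size_y, step_y) gives the grid of top-left corners for the image parts.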
unagi.train
The train module is used to train the U-net model. The training dataset is split into train, validation and test sets for use in model training. The best-fitting weights are saved for each epoch, and model performance can be visualized on a set of images that are independent of the training set.
The loss function can be selected from the available options, and the training data is augmented on the fly during training to make the model robust to distortions in the data.
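For reference, the default loss named in the parameters of unagi.train.main, dice_coef_loss, is a Dice loss. The sketch below shows the standard Dice formula on flat sequences in plain Python. It is not unagi's implementation: the function names and the `smooth` term (a common way to avoid division by zero on empty masks) are assumptions.

```python
from typing import Sequence


def dice_coefficient(y_true: Sequence[float], y_pred: Sequence[float],
                     smooth: float = 1.0) -> float:
    """Dice coefficient: 2 * |A ∩ B| / (|A| + |B|), with optional smoothing.

    Illustrative only; `smooth` is an assumed smoothing constant.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    return (2.0 * intersection + smooth) / (sum(y_true) + sum(y_pred) + smooth)


def dice_loss(y_true: Sequence[float], y_pred: Sequence[float],
              smooth: float = 1.0) -> float:
    """Dice loss is one minus the Dice coefficient (0 for a perfect match)."""
    return 1.0 - dice_coefficient(y_true, y_pred, smooth)
```

A perfect prediction gives a loss of 0, and a prediction with no overlap gives a loss close to 1 (exactly 1 when `smooth` is 0).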
-
class unagi.train.ParallelDataGenerator(fnames_in: List[str], fnames_gt: List[str], batch_size: int, augmentate: bool)
Bases: tensorflow.python.keras.utils.data_utils.Sequence
Generate images for training/validation/testing (parallel version).
Parameters: - fnames_in (List[str]) – list of input images
- fnames_gt (List[str]) – list of gt images
- batch_size (int) – batch size used when generating (and augmenting) images
- augmentate (bool) – whether to apply augmentation to each batch of images
-
augmentate_batch(imgs_in: List[numpy.ndarray], imgs_gt: List[numpy.ndarray]) → Tuple[List[numpy.ndarray], List[numpy.ndarray]]
Generate an ordered augmented batch of images using Augmentor.
Parameters: - imgs_in (List[numpy.ndarray]) – list of input images as array
- imgs_gt (List[numpy.ndarray]) – list of gt images as arrays
Returns: list of input images after applying augmentation, list of gt images after applying augmentation
Return type: Tuple[List[numpy.ndarray], List[numpy.ndarray]]
-
unagi.train.main(input_path: str = './input', vis: str = './vis', debug: str = './train_logs', loss: Union[Callable[[Any, Any], float], str] = 'dice_coef_loss', epochs: int = 1, batchsize: int = 32, augmentate: bool = True, train_split: int = 80, val_split: int = 10, test_split: int = 10, weights_path: str = './bin_weights.hdf5', num_gpus: int = 1, extraprocesses: int = 0, queuesize: int = 10)
Train U-net with pairs of train and ground-truth images.
Parameters: - input_path (str, optional) – input dir with in and gt sub folders to train (default is os.path.join(“.”, “input”)).
- vis (str, optional) – dir with image to use for train visualization (default is os.path.join(“.”, “vis”)).
- debug (str, optional) – path to save training logs (default is os.path.join(“.”, “train_logs”)).
- loss (str or function, optional) – loss function (default is dice_coef_loss - dice loss).
- epochs (int, optional) – number of epochs to train unagi (default is 1).
- batchsize (int, optional) – batchsize to train unagi (default is 32).
- augmentate (bool, optional) – augment the original images for training unagi (default is True)
- train_split (int, optional) – train dataset split percentage (default is 80).
- val_split (int, optional) – validation dataset split percentage (default is 10).
- test_split (int, optional) – test dataset split percentage (default is 10).
- weights_path (str, optional) – path to save final weights (default is os.path.join(“.”, “bin_weights.hdf5”)).
- num_gpus (int, optional) – number of gpus to use for training unagi (default is 1)
- extraprocesses (int, optional) – number of extraprocesses to use (default is 0).
- queuesize (int, optional) – number of batches to generate in queue while training (default is 10).
Returns: Return type: None
Note
All train images should be in “in” directory. All ground-truth images should be in “gt” directory.
Example
unagi.train.main(input, vis, logs_dir, epochs=2, batchsize=4)
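The train_split, val_split and test_split percentages can be pictured as slicing one list of image names into three parts. The sketch below is only an illustration of that idea: `split_by_percent` is a hypothetical helper, and how unagi orders, shuffles or rounds the split is not specified here.

```python
from typing import List, Sequence, Tuple


def split_by_percent(items: Sequence[str], train_split: int = 80,
                     val_split: int = 10, test_split: int = 10
                     ) -> Tuple[List[str], List[str], List[str]]:
    """Slice items into train/validation/test lists by percentage.

    Hypothetical helper (not part of unagi): rounds the train and
    validation counts down and gives the remainder to the test set.
    """
    if train_split + val_split + test_split != 100:
        raise ValueError("split percentages must sum to 100")
    n_train = len(items) * train_split // 100
    n_val = len(items) * val_split // 100
    train = list(items[:n_train])
    val = list(items[n_train:n_train + n_val])
    test = list(items[n_train + n_val:])
    return train, val, test


if __name__ == "__main__":
    names = [f"{i}_in.png" for i in range(20)]
    train, val, test = split_by_percent(names)
    print(len(train), len(val), len(test))
```

With the default 80/10/10 split, 20 image names yield 16 train, 2 validation and 2 test entries.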