Modules
Unagi has three submodules: binarize, dataset and train. Each submodule has a main function. Used in sequence, they create a dataset, train the model on the created data and then binarize images using the saved model weights.
unagi.binarize
The binarize module loads model weights saved during training and runs model prediction (i.e. binarization) on the given images.
- unagi.binarize.main(input_path: str = './input', output_path: str = './output', weights_path: str | None = None, batchsize: int = 2) → List[ndarray]
Binarize images from input directory and write them to output directory.
- Parameters:
input_path (str, optional) – input path for images. (default is input folder in current directory)
output_path (str, optional) – output path to save images. (default is output folder in current directory)
weights_path (str or None, optional) – path to the weights file. If None, the default weights are loaded from the package root directory.
batchsize (int, optional) – batch size to use for model prediction (default is 2)
- Returns:
list of binary images in np.ndarray format
- Return type:
List[numpy.ndarray]
Note
All input images should be PNG files, e.g. “sample_1.png”. Output image names get the suffix “_bin”, e.g. “sample_1_bin.png”.
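The naming convention in the note above can be illustrated with a short sketch. `plan_outputs` is a hypothetical helper, not part of unagi, and it assumes non-PNG files are simply skipped; what unagi.binarize.main actually does with such files is not documented here.

```python
from pathlib import PurePath
from typing import Dict, Iterable


def plan_outputs(input_names: Iterable[str]) -> Dict[str, str]:
    """Map each PNG input name to the '_bin' output name described in the note.

    Hypothetical helper: skipping non-PNG files is an assumption of this sketch.
    """
    plan: Dict[str, str] = {}
    for name in input_names:
        path = PurePath(name)
        if path.suffix.lower() != ".png":
            continue  # inputs are expected to be PNG files
        # "sample_1.png" -> "sample_1_bin.png"
        plan[name] = f"{path.stem}_bin{path.suffix}"
    return plan


print(plan_outputs(["sample_1.png", "scan.jpg"]))  # {'sample_1.png': 'sample_1_bin.png'}
```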
Example
unagi.binarize.main('input_path', 'output_path', batchsize=2)
unagi.dataset
The dataset module creates the training dataset. It takes a folder containing input images and their corresponding ground-truth images. Input image names should end with _in and ground-truth image names should end with _gt. Each input image and its ground-truth image should have the same file extension.
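The pairing rule above can be sketched as follows. `pair_images` is a hypothetical helper, not unagi's own code; it matches files by the name before the `_in`/`_gt` suffix plus the extension, and raises an error for any image without a partner, mirroring the note on unagi.dataset.main that the script fails in that case.

```python
import os
from typing import Dict, List, Tuple


def pair_images(fnames: List[str]) -> List[Tuple[str, str]]:
    """Match '<name>_in<ext>' files with '<name>_gt<ext>' files.

    Hypothetical helper: raises ValueError when an image has no partner,
    including when the two files use different extensions.
    """
    inputs: Dict[Tuple[str, str], str] = {}
    gts: Dict[Tuple[str, str], str] = {}
    for fname in fnames:
        stem, ext = os.path.splitext(fname)
        if stem.endswith("_in"):
            inputs[(stem[:-3], ext)] = fname
        elif stem.endswith("_gt"):
            gts[(stem[:-3], ext)] = fname
        # any other file is ignored
    unmatched = set(inputs) ^ set(gts)  # keys present on only one side
    if unmatched:
        raise ValueError(f"images without a partner: {sorted(unmatched)}")
    return [(inputs[key], gts[key]) for key in sorted(inputs)]


print(pair_images(["1_in.png", "1_gt.png", "notes.txt"]))  # [('1_in.png', '1_gt.png')]
```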
Tip
Save the images in PNG rather than JPG format. JPG compression is lossy, so a binary image saved as JPG ends up containing some gray-level pixels.
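To make the tip concrete, the sketch below counts pixels that are neither pure black nor pure white and shows one possible repair, re-thresholding, for ground-truth images that were already saved as JPG. Both functions are hypothetical, the threshold of 128 is an assumed value, the pixel row is invented for illustration, and unagi is not documented to perform this step itself.

```python
from typing import List, Sequence


def count_gray_pixels(pixels: Sequence[int]) -> int:
    """Count 8-bit pixel values that are neither black (0) nor white (255)."""
    return sum(1 for value in pixels if value not in (0, 255))


def rebinarize(pixels: Sequence[int], threshold: int = 128) -> List[int]:
    """Snap every pixel back to 0 or 255 using a fixed global threshold (assumed value)."""
    return [255 if value >= threshold else 0 for value in pixels]


# Invented row of a binary image after a lossy JPG round trip: edges pick up gray values.
row = [0, 0, 3, 121, 250, 255, 255, 131, 0]
print(count_gray_pixels(row))              # 4
print(count_gray_pixels(rebinarize(row)))  # 0
```

Saving as PNG in the first place avoids the problem, because PNG compression is lossless.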
Images are cropped into smaller parts matching the input size of the U-net model. The output folder will contain two subfolders, in and gt: in holds the input image parts and gt holds the corresponding ground-truth parts.
- class unagi.dataset.ImageProcessor(size_x: int = 128, size_y: int = 128, step_x: int = 128, step_y: int = 128)
Bases: object
- process_img(fname_in: str) → None
Read the train and ground-truth images, split them into parts and save the parts.
- Parameters:
fname_in (str) – input image name
- Return type:
None
Example
processor = unagi.dataset.ImageProcessor(128, 128, 128, 128)
processor.process_img(img_name)
- save_imgs(imgs_in: List[ndarray], imgs_gt: List[ndarray], fname_in: str) → None
Save image parts to one folder.
All image parts are saved to a folder named after the original image with the suffix ‘_parts’.
- Parameters:
imgs_in (List[np.ndarray]) – list of input image arrays
imgs_gt (List[np.ndarray]) – list of gt image arrays
fname_in (str) – name of the original full image
- Return type:
None
Example
unagi.dataset.ImageProcessor().save_imgs(in_img_list, gt_img_list, in_img_name)
- split_img_overlay(img: ndarray) → Tuple[List[ndarray], int, int]
Split an image into parts (small images), possibly overlapping.
- Parameters:
img (np.ndarray) – input image array
- Returns:
list of image parts as numpy arrays, border value along the width, border value along the height
- Return type:
Tuple[List[numpy.ndarray], int, int]
Note
Walk through the whole image with a window of size size_x × size_y, moving by step_x and step_y, and save all parts in a list. If the image dimensions are not multiples of the window size, the image is padded with a frame of suitable size. If step_x, step_y differ from size_x, size_y, the parts either overlap (step smaller than size) or leave gaps between each other (step larger than size).
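The windowing described in the note can be sketched without any image data by computing only the window positions and the padding. `window_layout` is hypothetical, and its padding rule (pad just enough that the last window ends exactly at the padded border) is an assumption; unagi's split_img_overlay may pad differently, and its returned border values may not equal the padding computed here.

```python
import math
from typing import List, Tuple


def window_layout(width: int, height: int, size_x: int = 128, size_y: int = 128,
                  step_x: int = 128, step_y: int = 128) -> Tuple[List[Tuple[int, int]], int, int]:
    """Return the top-left corner of every window and the padding along width and height.

    Assumption: the image is padded so that the last window ends exactly at the
    padded border.
    """
    def padded_length(length: int, size: int, step: int) -> int:
        if length <= size:
            return size  # a single window, padded up to the window size
        n_steps = math.ceil((length - size) / step)
        return size + n_steps * step

    padded_w = padded_length(width, size_x, step_x)
    padded_h = padded_length(height, size_y, step_y)
    # Row by row, left to right, like a sliding window.
    corners = [(x, y)
               for y in range(0, padded_h - size_y + 1, step_y)
               for x in range(0, padded_w - size_x + 1, step_x)]
    return corners, padded_w - width, padded_h - height


corners, pad_w, pad_h = window_layout(300, 200)
print(len(corners), pad_w, pad_h)  # 6 84 56
```

With step equal to size (the defaults) the padded dimensions are multiples of the window size; passing step_x=64 makes neighbouring windows overlap by 64 pixels, and step_x=192 leaves 64-pixel gaps.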
Example
parts, border_x, border_y = unagi.dataset.ImageProcessor(128, 128, 128, 128).split_img_overlay(img)
- unagi.dataset.main(input_path: str = './input', output_path: str = './output', shuffle: bool = True, size_x: int = 128, size_y: int = 128, step_x: int = 128, step_y: int = 128, processes: int = 4) → None
Create train and ground-truth images suitable for unagi training.
- Parameters:
input_path (str, optional) – path to input images (default is os.path.join(".", "input"))
output_path (str, optional) – path to created images (default is os.path.join(".", "output"))
shuffle (bool, optional) – shuffle the newly created images (default is True)
size_x (int, optional) – width of an image part (default is 128).
size_y (int, optional) – height of an image part (default is 128).
step_x (int, optional) – horizontal step between image parts (default is 128).
step_y (int, optional) – vertical step between image parts (default is 128).
processes (int, optional) – number of CPU cores to use (default is cpu_count())
- Return type:
None
See also
ImageProcessor.process_img, unagi.utils.img_processing_utils.ImageUtils.shuffle_imgs, unagi.utils.img_processing_utils.ImageUtils.mkdir_s
Note
All train image names should end with “_in”, like “1_in.png”, and all ground-truth image names should end with “_gt”, like “1_gt.png”. If an image has only a train or only a ground-truth version, the script fails. When the script finishes, the output directory contains two subdirectories: “in” with train images and “gt” with ground-truth images.
Example
unagi.dataset.main(in_img_path, out_imgs_path, size_x=128, size_y=128, step_x=128, step_y=128)
- unagi.dataset.process_img_wrapper(args: Tuple[ImageProcessor, str]) → None
unagi.train
The train module trains the U-net model. The training data is split into train, validation and test sets. The best-fitting weights are saved for each epoch, and model performance can be visualized on a separate set of images that are independent of the training set.
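The percentage split can be sketched as below, using the default 80/10/10 values of unagi.train.main. `split_dataset` is hypothetical; requiring the percentages to sum to 100, shuffling with a fixed seed and giving any rounding remainder to the test set are all assumptions of this sketch, not documented unagi behaviour.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(names: Sequence[str], train_split: int = 80, val_split: int = 10,
                  test_split: int = 10, seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle names and cut them into train/validation/test lists by percentage."""
    if train_split + val_split + test_split != 100:
        raise ValueError("train_split + val_split + test_split must equal 100")
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = len(shuffled) * train_split // 100
    n_val = len(shuffled) * val_split // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # rounding remainder ends up here
    return train, val, test


train, val, test = split_dataset([f"{i}_in.png" for i in range(25)])
print(len(train), len(val), len(test))  # 20 2 3
```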
The loss function can be selected from the available options, and the training data is augmented on the fly during training to make the model robust to distortions in the data.
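The default loss of unagi.train.main is dice_coef_loss. The sketch below shows the usual soft Dice formulation, 2|A∩B| / (|A| + |B|), on flat lists of mask values in [0, 1]. It is a plain-Python illustration, not unagi's implementation: the smoothing constant of 1.0 and the `1 - dice` form are assumptions (some implementations return `-dice` instead).

```python
from typing import Sequence


def dice_coef(y_true: Sequence[float], y_pred: Sequence[float], smooth: float = 1.0) -> float:
    """Soft Dice coefficient with additive smoothing (smoothing value is assumed)."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    return (2.0 * intersection + smooth) / (sum(y_true) + sum(y_pred) + smooth)


def dice_coef_loss(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Loss that falls to 0 as the predicted mask matches the ground truth."""
    return 1.0 - dice_coef(y_true, y_pred)


print(dice_coef_loss([1, 0, 1, 0], [1, 0, 1, 0]))  # 0.0
```

Unlike per-pixel accuracy, the Dice score ignores pixels that are background in both masks, which suits documents where foreground pixels are a small fraction of the image.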
- class unagi.train.ParallelDataGenerator(fnames_in: List[str], fnames_gt: List[str], batch_size: int, augmentate: bool)
Bases: Sequence
Generate images for training/validation/testing (parallel version).
- Parameters:
fnames_in (List[str]) – list of input images
fnames_gt (List[str]) – list of gt images
batch_size (int) – number of images in each generated batch
augmentate (bool) – whether to apply augmentation to each batch of images
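The base class is presumably keras.utils.Sequence, which asks subclasses to implement `__len__` (batches per epoch) and `__getitem__` (one batch by index). The hypothetical `PairBatcher` below sketches only that batching logic on file names in plain Python; it loads no images and applies no augmentation, so it is not a substitute for ParallelDataGenerator.

```python
import math
from typing import List, Tuple


class PairBatcher:
    """Pure-Python stand-in for the batching part of a Sequence-style generator."""

    def __init__(self, fnames_in: List[str], fnames_gt: List[str], batch_size: int) -> None:
        if len(fnames_in) != len(fnames_gt):
            raise ValueError("every input image needs a ground-truth image")
        self.fnames_in = fnames_in
        self.fnames_gt = fnames_gt
        self.batch_size = batch_size

    def __len__(self) -> int:
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(len(self.fnames_in) / self.batch_size)

    def __getitem__(self, idx: int) -> Tuple[List[str], List[str]]:
        if not 0 <= idx < len(self):
            raise IndexError(idx)  # also ends plain iteration over the batcher
        start = idx * self.batch_size
        stop = start + self.batch_size
        return self.fnames_in[start:stop], self.fnames_gt[start:stop]


batcher = PairBatcher([f"{i}_in.png" for i in range(5)], [f"{i}_gt.png" for i in range(5)], 2)
print(len(batcher), batcher[2])  # 3 (['4_in.png'], ['4_gt.png'])
```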
- augmentate_batch(imgs_in: List[ndarray], imgs_gt: List[ndarray]) → Tuple[List[ndarray], List[ndarray]]
Generate an ordered, augmented batch of images using Augmentor.
- Parameters:
imgs_in (List[numpy.ndarray]) – list of input images as array
imgs_gt (List[numpy.ndarray]) – list of gt images as arrays
- Returns:
list of input images after augmentation, list of gt images after augmentation
- Return type:
Tuple[List[numpy.ndarray], List[numpy.ndarray]]
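The word "ordered" suggests that each input image and its ground truth receive identical transformations, so the pair stays aligned. The sketch below illustrates that idea with plain horizontal and vertical flips on nested lists instead of Augmentor; `augment_pair` and the 50% flip probabilities are hypothetical and do not reproduce unagi's augmentation pipeline.

```python
import random
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values


def flip_horizontal(img: Image) -> Image:
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    return img[::-1]


def augment_pair(img_in: Image, img_gt: Image, rng: random.Random) -> Tuple[Image, Image]:
    """Apply the same randomly chosen flips to an input image and its ground truth."""
    # Draw the random decisions once and reuse them for both images,
    # so the ground truth stays aligned with the input.
    do_h = rng.random() < 0.5
    do_v = rng.random() < 0.5
    out_in, out_gt = img_in, img_gt
    if do_h:
        out_in, out_gt = flip_horizontal(out_in), flip_horizontal(out_gt)
    if do_v:
        out_in, out_gt = flip_vertical(out_in), flip_vertical(out_gt)
    return out_in, out_gt


aug_in, aug_gt = augment_pair([[10, 200], [30, 250]], [[0, 255], [0, 255]], random.Random(0))
```

Drawing the random parameters separately for the two images would be a bug: the ground truth would no longer describe the augmented input.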
- unagi.train.main(input_path: str = './input', vis: str = './vis', debug: str = './train_logs', loss: str = 'dice_coef_loss', epochs: int = 1, batchsize: int = 32, augmentate: bool = True, train_split: int = 80, val_split: int = 10, test_split: int = 10, weights_path: str = './bin_weights.hdf5', num_gpus: int = 1, extraprocesses: int = 0, queuesize: int = 10) → None
Train U-net with pairs of train and ground-truth images.
- Parameters:
input_path (str, optional) – input directory with in and gt subfolders for training (default is os.path.join(".", "input")).
vis (str, optional) – directory with images to use for training visualization (default is os.path.join(".", "vis")).
debug (str, optional) – path to save training logs (default is os.path.join(".", "train_logs")).
loss (str, optional) – loss function (default is dice_coef_loss - dice loss).
epochs (int, optional) – number of epochs to train unagi (default is 1).
batchsize (int, optional) – batchsize to train unagi (default is 32).
augmentate (bool, optional) – augment the original images for training unagi (default is True)
train_split (int, optional) – train dataset split percentage (default is 80).
val_split (int, optional) – validation dataset split percentage (default is 10).
test_split (int, optional) – test dataset split percentage (default is 10).
weights_path (str, optional) – path to save final weights (default is os.path.join(".", "bin_weights.hdf5")).
num_gpus (int, optional) – number of gpus to use for training unagi (default is 1)
extraprocesses (int, optional) – number of extra processes to use (default is 0).
queuesize (int, optional) – number of batches to generate in queue while training (default is 10).
- Return type:
None
Note
All training images should be in the “in” directory and all ground-truth images in the “gt” directory.
Example
unagi.train.main(input_path, vis_dir, logs_dir, epochs=2, batchsize=4)
unagi.cli
Command-line interface for the unagi package. It can be used to create a dataset, train the model and binarize images.
$ unagi --help
usage: unagi [-h] [-v] {dataset,train,binarize} ...
command-line interface for Unagi package
optional arguments:
-h, --help show this help message and exit
-v, --version show package version and exit
available commands:
{dataset,train,binarize}
dataset Create dataset to train unagi model
train Train the unagi model
binarize Use the model weights to binarize images
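The help output above describes a parser with a version flag and three subcommands. The argparse sketch below rebuilds only that top-level shape to show how such an interface is typically wired; `build_parser` is hypothetical, the real unagi.cli may be structured differently, the version string is a placeholder, and no per-command options are added because none are documented here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the top-level shape of the `unagi --help` output (hypothetical)."""
    parser = argparse.ArgumentParser(prog="unagi",
                                     description="command-line interface for Unagi package")
    # Placeholder version string; the real package version is not shown in this document.
    parser.add_argument("-v", "--version", action="version", version="unagi <version>",
                        help="show package version and exit")
    subparsers = parser.add_subparsers(title="available commands", dest="command")
    subparsers.add_parser("dataset", help="Create dataset to train unagi model")
    subparsers.add_parser("train", help="Train the unagi model")
    subparsers.add_parser("binarize", help="Use the model weights to binarize images")
    return parser


args = build_parser().parse_args(["binarize"])
print(args.command)  # binarize
```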