Multi-Scale Training in Neural Networks

4 min read · Jun 28, 2020

Most machine learning algorithms mimic human behaviour, and they get better the more closely they reproduce its finer details.

One good example is the Convolutional Neural Network (CNN), which mimics our eyes. CNNs are very good at detecting objects and many of their features in an image or in video frames. They can also distinguish a wide range of textures and colors. So now it is time to take this to the next level.

If we carefully observe how our eyes work, they can focus light with the help of the ciliary muscles. To get this feature in a camera we have focusable lenses, which can be focused manually or by software. But once the image is captured and fed into our machine learning algorithm, there is very little opportunity to refocus, and this bottlenecks the accuracy of our algorithm. To overcome this problem we use a technique called Multi Scale Training.

How does this work?

In a normal CNN pipeline, a set of images of various sizes is resized to one fixed dimension and fed to the model for training. When preprocessing techniques like blurring are also used, much of the fine detail in the image is lost, which can lead to low accuracy or even misclassification.

So in multi scale training, each image is resized to a range of dimensions (for example, a 64*64 image is resized to 128*128, 200*200, 364*364, …). From each of these resized images, an image of fixed size is cropped. If a single image contains many important features, more than one crop may be taken, each focusing on a required feature. This new set of images, which represents the fine details of our target object, acts as an enhanced dataset and helps ensure that even minute details needed for prediction are not missed. It also helps to increase the number of training samples.
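The resize-then-crop idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `resize_nearest` and `multi_scale_crops` are hypothetical, the crop position is chosen at random, and a plain NumPy nearest-neighbour resize stands in for a real image library so that the sketch runs without extra dependencies.

```python
import numpy as np

def resize_nearest(image, new_h, new_w):
    """Nearest-neighbour resize using index arithmetic (a stand-in for a library resize)."""
    h, w = image.shape[:2]
    rows = np.arange(new_h) * h // new_h  # source row for every output row
    cols = np.arange(new_w) * w // new_w  # source column for every output column
    return image[rows[:, None], cols]

def multi_scale_crops(image, scales, crop_size, rng):
    """Resize `image` to each square scale and cut one random crop_size x crop_size patch."""
    crops = []
    for size in scales:
        if size < crop_size:
            raise ValueError("each scale must be at least as large as the crop")
        resized = resize_nearest(image, size, size)
        top = int(rng.integers(0, size - crop_size + 1))
        left = int(rng.integers(0, size - crop_size + 1))
        crops.append(resized[top:top + crop_size, left:left + crop_size])
    return crops

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # dummy 64x64 colour image
crops = multi_scale_crops(image, scales=(128, 200, 364), crop_size=64, rng=rng)
print([c.shape for c in crops])  # one fixed-size crop per scale
```

Every crop has the same shape, so the crops can be batched together even though each one shows the object at a different magnification.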

Now let us see how this can be done in Python!

Let's import the libraries

import tensorflow as tf
import os
import cv2 as cv
import random
import numpy as np

Now let us load the dataset. Here we shall use a simple cat vs dog dataset.

# PATH is the root folder of the dataset and must be defined beforehand
train_dir = os.path.join(PATH, 'train')
train_cats_dir = os.path.join(train_dir, 'cats')  # directory with our training cat pictures
train_dogs_dir = os.path.join(train_dir, 'dogs')  # directory with our training dog pictures
num_cats_tr = len(os.listdir(train_cats_dir))
num_dogs_tr = len(os.listdir(train_dogs_dir))

Now, in order to compare the accuracy of a model trained with multi scaled data against one trained with normal data, let us prepare both datasets and feed them to networks with the same architecture.

# these are a few sizes to rescale to before cropping
sizeAr = [264, 364, 464, 564]

# make sure the output folders exist
for d in ["multi/cats", "multi/dogs", "normal/cats", "normal/dogs"]:
    os.makedirs(d, exist_ok=True)

# creating multi scaled data: resize to a random size, then take the central 224x224 crop
for i in os.listdir(train_cats_dir)[:20]:
    size = random.choice(sizeAr)
    start = size // 2 - 112
    end = size // 2 + 112
    temp = cv.resize(cv.imread(os.path.join(train_cats_dir, i)), (size, size))
    cv.imwrite("multi/cats/" + i, temp[start:end, start:end])

for i in os.listdir(train_dogs_dir)[:20]:
    size = random.choice(sizeAr)
    start = size // 2 - 112
    end = size // 2 + 112
    temp = cv.resize(cv.imread(os.path.join(train_dogs_dir, i)), (size, size))
    cv.imwrite("multi/dogs/" + i, temp[start:end, start:end])

# simply saving non scaled data, resized straight to 224x224
for i in os.listdir(train_cats_dir)[:20]:
    temp = cv.imread(os.path.join(train_cats_dir, i))
    temp = cv.resize(temp, (224, 224))
    cv.imwrite("normal/cats/" + i, temp)

for i in os.listdir(train_dogs_dir)[:20]:  # dogs, not cats
    temp = cv.imread(os.path.join(train_dogs_dir, i))
    temp = cv.resize(temp, (224, 224))
    cv.imwrite("normal/dogs/" + i, temp)
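To make the crop arithmetic in the loops above concrete, here is a small worked example of the `start` and `end` indices for each candidate size. The helper name `centre_window` is hypothetical. Note that the window is centred on the image only if the image really has side length `size`, i.e. it has been resized to `size` x `size` before slicing.

```python
CROP = 224  # side of the square patch fed to the network
sizes = [264, 364, 464, 564]

def centre_window(size, crop=CROP):
    """Return (start, end) of a centred window of width `crop` along an axis of length `size`."""
    half = crop // 2
    return size // 2 - half, size // 2 + half

windows = {size: centre_window(size) for size in sizes}
for size, (start, end) in windows.items():
    # the window width is always end - start == 224, whatever the size
    print(size, start, end, end - start)
```

The larger the chosen size, the smaller the fraction of the image that the 224-pixel window covers, which is what gives the "zoomed in" views.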

Let us use VGG16 network for classification

from keras.applications.vgg16 import preprocess_input
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

train_generatorMulti = train_datagen.flow_from_directory(
    "path for multi scaled data",
    target_size=(224, 224), # resize all images to 224 x 224
    batch_size=5,
    class_mode='binary')
train_generatorNormal = train_datagen.flow_from_directory(
    "path for non multi scaled data",
    target_size=(224, 224), # resize all images to 224 x 224
    batch_size=5,
    class_mode='binary')

batch_size = 5
epochs = 5
IMG_HEIGHT = 224
IMG_WIDTH = 224

Now let us build a sequential model using VGG16

from keras import layers, models, optimizers
from keras.applications import VGG16

# full VGG16 with its classifier head, loaded only to inspect the architecture
conv_base = VGG16(weights='imagenet',
                  include_top=True,
                  input_shape=(224, 224, 3))
conv_base.summary()

# convolutional base without the head, used as the feature extractor
conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(IMG_WIDTH,IMG_HEIGHT , 3))

model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

# newer Keras versions expect learning_rate instead of lr
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(learning_rate=2e-5),
              metrics=['acc'])
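As a rough sanity check on the head added above, the size of the flattened feature vector and the parameter counts of the two dense layers can be worked out by hand. This is a back-of-the-envelope sketch; it relies on VGG16 having five 2x2 max-pooling stages and 512 channels in its last convolutional block.

```python
input_side = 224
num_pools = 5    # VGG16 halves the spatial size five times
channels = 512   # channels in the last VGG16 convolutional block

feature_side = input_side // (2 ** num_pools)           # spatial side after the conv base
flat_features = feature_side * feature_side * channels  # length of the Flatten output
dense_256_params = flat_features * 256 + 256            # weights + biases of Dense(256)
output_params = 256 * 1 + 1                              # weights + bias of Dense(1)

print(feature_side, flat_features, dense_256_params, output_params)
```

Most of the new trainable parameters sit in the `Dense(256)` layer, which is one reason a dropout layer is placed in front of it.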

Now let us fit the multi scaled data to our model

# fit_generator is deprecated in newer Keras versions; model.fit accepts generators directly
history = model.fit(
    train_generatorMulti,
    steps_per_epoch=5,
    epochs=epochs
)
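It may help to see how `steps_per_epoch` relates to the amount of data. The sketch below assumes the 20 cat and 20 dog images written by the loops earlier; with other dataset sizes the numbers change accordingly.

```python
import math

images_per_class = 20  # assumption: the [:20] slices used when saving the images
num_classes = 2
batch_size = 5
steps_per_epoch = 5

total_images = images_per_class * num_classes
steps_for_full_pass = math.ceil(total_images / batch_size)  # steps needed to see every image once
images_seen_per_epoch = steps_per_epoch * batch_size        # images actually drawn per epoch

print(total_images, steps_for_full_pass, images_seen_per_epoch)
```

Under these assumptions each epoch draws fewer images than the dataset contains, so one "epoch" here is not a full pass over the data.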

The accuracy and loss of the model trained with multi scaling are as follows

[Figure: Accuracy of model trained with multi scaling method]

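Now let us fit the data without multi scaling

# note: unless the model is rebuilt and recompiled first, this run continues
# from the weights already trained on the multi scaled data
history = model.fit(
    train_generatorNormal,
    steps_per_epoch=5,
    epochs=epochs
)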

The accuracy and loss of the model trained without multi scaling are as follows

[Figure: Accuracy of model trained with non-multi scaling method]
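For the two runs to be comparable, both should start from the same weights. One possible way to arrange this is sketched below. The helper `train_from_same_start` is hypothetical and not part of the original experiment; it relies only on the `get_weights`, `set_weights` and `fit` methods that Keras models provide. It does not reset the optimizer state, so rebuilding and recompiling the model between runs remains the cleaner option.

```python
def train_from_same_start(model, datasets, **fit_kwargs):
    """Fit `model` on each dataset in turn, restoring the initial weights before every run."""
    initial_weights = model.get_weights()  # snapshot taken before any training
    histories = {}
    for name, data in datasets.items():
        model.set_weights(initial_weights)  # undo whatever the previous run learned
        histories[name] = model.fit(data, **fit_kwargs)
    return histories

# Possible usage with the objects defined earlier (not executed here):
# histories = train_from_same_start(
#     model,
#     {"multi": train_generatorMulti, "normal": train_generatorNormal},
#     steps_per_epoch=5,
#     epochs=epochs,
# )
```

The returned dictionary keeps one training history per dataset, which makes it easy to plot the two accuracy curves side by side.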

So from this small experiment we can see that the model trained with multi scaled data outperformed the conventional one.

This method is already used in many algorithms, such as SNIPER and YOLO object detection.

Thanks & regards,

Manthan M Kulakarni
