# <center>CS568: Deep Learning</center>  <center>Spring 2020</center>

In this notebook, we walk through the basic building blocks of Keras.

1. **Load dataset** 
    + difference between **flow**, **flow_from_directory** and **flow_from_dataframe**
    
2. **Define model**
    + difference between **sequential** and **functional** models 
    
3. **Compile model**

4. **Fit model**
    + difference between **fit** and **fit_generator** functions
    
5. **Evaluate model** 
    + difference between **evaluate** and **evaluate_generator** functions
    
6. **Make predictions**
    + difference between **predict**, **predict_classes** and **predict_generator** functions

## 1. Load Datasets

Download and extract your dataset into a folder. Make sure the dataset folder has three subfolders: train, val and test.



In [0]:
train_data_dir = '/keras/dataset/dogs-vs-cats/train/'
validation_data_dir = '/keras/dataset/dogs-vs-cats/val/'
test_dir = '/keras/dataset/dogs-vs-cats/test/'

Now, import ImageDataGenerator

In [0]:
from keras.preprocessing.image import ImageDataGenerator

Keras ImageDataGenerator is used to:
+ take a batch of images from disk or memory.
+ apply random transformations to each image in the batch.
+ replace the original batch of images with the new, randomly transformed batch.
+ feed this transformed batch to a deep learning model for training.

For more information about ImageDataGenerator, see this [link](https://keras.io/preprocessing/image/). 

In [0]:
# create an instance of the ImageDataGenerator class
data_generator = ImageDataGenerator(
                        featurewise_center=False, 
                        featurewise_std_normalization=False, 
                        rotation_range=10,
                        width_shift_range=0.1,
                        height_shift_range=0.1,
                        zoom_range=.1,
                        horizontal_flip=True)

**Example of data augmentation**

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [0]:
from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing.image import img_to_array, load_img
import numpy as np


src_path = '/content/drive/My Drive/CS568-DeepLearning-(Spring2020)/dataset/image/hinton.jpg'
des_path = '/content/drive/My Drive/CS568-DeepLearning-(Spring2020)/dataset/augmented_images/'


image = load_img(src_path)
image = img_to_array(image)
image = np.expand_dims(image, axis=0)


# define ImageDataGenerator class
data_augmentation = ImageDataGenerator(
 rotation_range=90,
 zoom_range=0.5,
 width_shift_range=0.2,
 height_shift_range=0.2,
 shear_range=0.15,
 horizontal_flip=True,
 fill_mode="nearest")

# apply ImageDataGenerator to input image
img_generator = data_augmentation.flow(image, batch_size=1, save_to_dir=des_path, save_prefix="image", save_format="jpg")


nb_image = 10
count = 0
for e in img_generator:
    if (count == nb_image):
        break
    count += 1
    
# check augmented images in the des_path folder

Once constructed, an iterator can be created for an image dataset.

Keras ImageDataGenerator class provides three different types of iterators to load the image dataset:

+ **flow()**: an iterator over an image dataset that is already loaded in memory (as arrays); it generates batches of augmented images on each iteration.
+ **flow_from_directory()**: an iterator that loads one batch of images at a time from disk and infers the class labels from the sub-directory names.
+ **flow_from_dataframe()**: an iterator that loads one batch of images at a time from disk, taking the file names and class labels from a pandas DataFrame.

In this notebook, the method **flow_from_directory()** is used with the following layout:

+ the root directory contains **three (3) folders**: **train**, **val** and **test**.
+ the train folder (and likewise the val folder) should contain **k (number of classes)** sub-directories, each containing the images of the respective class.
+ the test folder should contain a single sub-folder, which stores all **test images**.
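
The layout described above can be sketched with the standard library only. This is a minimal illustration, not part of the original notebook: the class names `cats` and `dogs` and the folder name `all_images` are hypothetical, and the tree is created in a temporary directory. The last lines mimic how flow_from_directory() maps sub-folder names to label indices in alphanumeric order.

```python
import os
import tempfile

# Hypothetical class names, used only for illustration.
class_names = ["dogs", "cats"]

# Build the train/val/test layout inside a temporary root folder.
root = tempfile.mkdtemp()
for split in ("train", "val"):
    for name in class_names:
        os.makedirs(os.path.join(root, split, name))
# Unlabelled test images go into one single sub-folder.
os.makedirs(os.path.join(root, "test", "all_images"))


def list_classes(split_dir):
    """Return the sub-directory names of split_dir in sorted order."""
    return sorted(
        d for d in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, d))
    )


# Label indices follow the alphanumeric order of the folder names.
train_classes = list_classes(os.path.join(root, "train"))
class_indices = {name: i for i, name in enumerate(train_classes)}
print(class_indices)
```

After creating a real generator, the same mapping can be read from its `class_indices` attribute.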


In [0]:
# define ImageDataGenerator class 
train_datagen = ImageDataGenerator(
        rescale=1 / 255.0,
        rotation_range=20,
        zoom_range=0.05,
        width_shift_range=0.05,
        height_shift_range=0.05,
        shear_range=0.05,
        horizontal_flip=True,
        fill_mode="nearest")

# define ImageDataGenerator class for testing images
test_datagen = ImageDataGenerator(rescale=1 / 255.0)

# automatically retrieve images and their classes for train and validation sets
batch_size = 4
train_generator = train_datagen.flow_from_directory(
    directory=train_data_dir, # path to the folder that contains the 'k' class sub-folders
    target_size=(100, 100), # every image is resized to this (height, width)
    color_mode="rgb", # "grayscale" or "rgb"
    batch_size=batch_size, # number of images per batch
    class_mode="binary", # "binary" for two classes, "categorical" for k classes (one-hot labels); "input" returns the images themselves as targets (e.g. for autoencoders)
    shuffle=True, # whether to shuffle the order of the images
    seed=42 # random seed for the random augmentations and the shuffling
)

Found 90 images belonging to 2 classes.


In [0]:
# for k number of classes
train_generator = train_datagen.flow_from_directory(
    directory=train_data_dir, 
    target_size=(100, 100), 
    color_mode="rgb", 
    batch_size=batch_size, 
    class_mode="categorical",
    shuffle=True, 
    seed=42 
)

In [0]:
# for specific classes
train_generator = train_datagen.flow_from_directory(
    directory=train_data_dir, 
    target_size=(100, 100), 
    color_mode="rgb", 
    batch_size=batch_size, 
    classes=['class1','class2'],
    class_mode="categorical",
    shuffle=True, 
    seed=42 
)

The method **flow_from_dataframe()** is useful when the images of different classes reside in one folder.

In this case, a separate text file contains the class label of each image. 
We read this text file into a dataframe using pandas, and the generator uses the dataframe to look up the label of each image.
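
A minimal sketch of this workflow is given below. The label file contents, the column names `filename` and `label`, and the helper `make_generator` are assumptions made for illustration; a real dataset will have its own file format. The helper imports Keras lazily so that the dataframe part can be tried without any images on disk.

```python
import io

import pandas as pd

# Hypothetical label file: one "filename label" pair per line.
label_text = """cat.0.jpg cat
cat.1.jpg cat
dog.0.jpg dog
"""

# Read the text file into a dataframe with two named columns.
df = pd.read_csv(io.StringIO(label_text), sep=" ", names=["filename", "label"])
print(df)


def make_generator(dataframe, image_dir, batch_size=4):
    """Build a batch iterator from a dataframe of file names and labels."""
    # Imported here so the dataframe part above runs without Keras installed.
    from keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(rescale=1 / 255.0)
    return datagen.flow_from_dataframe(
        dataframe=dataframe,
        directory=image_dir,    # folder that holds all the images
        x_col="filename",       # column with the image file names
        y_col="label",          # column with the class labels (strings)
        target_size=(100, 100),
        batch_size=batch_size,
        class_mode="binary",
    )
```

Calling `make_generator(df, image_dir)` with the folder that holds the images would then return an iterator that can be used like the directory-based generators above.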

## 2. Define Model

There are two ways to build Keras models: sequential and functional. 

**Sequential Model**

The sequential API lets you create models layer by layer, which is sufficient for most problems. It cannot express models that share layers or that have multiple inputs or outputs.

In [0]:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(2, input_dim=1))
#model.add(Dense())
model.add(Dense(1))

**Functional Model**

The functional API allows creating models where layers connect to more than just the previous and next layers. You can connect layers to any other layer. 

In [0]:
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense

# Define the input
input_layer = Input(shape=(2,))  
# Connecting layers
hidden_layer = Dense(2)(input_layer)  
# Create the model
model = Model(inputs=input_layer, outputs=hidden_layer)
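
The single-input model above could also be written with the sequential API. As a sketch of a case that needs the functional API, the following model takes two inputs and merges them. The input shapes, layer sizes and the name `two_input_model` are arbitrary example choices, not taken from the course material.

```python
from keras.models import Model
from keras.layers import Input, Dense, concatenate

# Two separate inputs (the shapes are arbitrary example values)
input_a = Input(shape=(3,))
input_b = Input(shape=(2,))

# Each input gets its own hidden layer
branch_a = Dense(4, activation="relu")(input_a)
branch_b = Dense(4, activation="relu")(input_b)

# Merge the two branches and map them to a single output
merged = concatenate([branch_a, branch_b])
output = Dense(1, activation="sigmoid")(merged)

# A model with a list of inputs cannot be expressed with Sequential
two_input_model = Model(inputs=[input_a, input_b], outputs=output)
two_input_model.summary()
```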

## 3. Compile Model

This step configures the model for training. Keras builds the computation graph for the model on its backend.

In this step, we define the loss function, the type of optimizer, and the metrics to report.

In [0]:
# compile the model
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

# Now we have a Python object that has model and all its parameters with its initial values.
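
For reference, the two choices made in the compile call above can be written out as formulas. This is the standard textbook form, given here as a sketch; the symbols are introduced for illustration, where $y_i$ is the true label, $\hat{y}_i$ the predicted probability, $N$ the number of samples, $\theta$ the model parameters and $\eta$ the learning rate.

Binary cross-entropy loss:

$$
L(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i \log \hat{y}_i + (1 - y_i)\log\big(1 - \hat{y}_i\big) \Big]
$$

Stochastic gradient descent update, applied once per batch:

$$
\theta \leftarrow \theta - \eta \, \nabla_{\theta} L(\theta)
$$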

## 4. Fit Model

In this step we train the model so that its parameters are tuned to produce the correct outputs for given inputs. Each training step feeds a batch of inputs through the network, computes the loss from the resulting outputs, and uses backpropagation to update the parameters. In other words, this step **fits** the model parameters to the data.

There are two ways to fit a Keras model:
#### 1. **fit()**

In [0]:
model.fit(train_X, train_Y, batch_size=32, epochs=50) # pass complete training data (train_X, train_Y) at once in the fit function. 

This approach works only when the whole dataset fits in memory. Real-world datasets are often too large for that.

#### 2. **fit_generator()**

In [0]:
model.fit_generator(
        train_generator, # generator (or Sequence) that yields (inputs, targets) batches
        steps_per_epoch = None, # (integer) number of batches to draw from the generator before an epoch is declared finished
        epochs = 10, # (integer) number of passes over the training data
        verbose = 1, # 0 = silent, 1 = animated progress bar, 2 = one line per epoch
        callbacks = None, # list of callbacks to apply during training, e.g. ModelCheckpoint
        validation_data = None, # validation generator or (inputs, targets) tuple
        validation_steps = None, # (integer) number of validation batches, usually validation_samples // batch_size
        class_weight = None, # dictionary mapping class indices to weights, e.g. {0: 1, 1: 50}
        max_queue_size = 10, # maximum size of the generator queue (10 is the Keras default)
        workers = 1, # number of workers that fill the queue
        use_multiprocessing = False, # if True, use process-based instead of thread-based workers
        shuffle = True, # shuffle the order of the batches each epoch (only used with Sequence inputs)
        initial_epoch = 0) # epoch at which to start training (used to continue a previous run)

# steps_per_epoch tells Keras when one epoch is complete. It is usually set to the total number of
# training samples divided by the batch size, i.e. the number of batches in one pass over the data.

In **fit_generator()**, the training data is not passed directly; it comes from a generator instead. For each batch the generator yields, fit_generator() performs backpropagation and updates the weights of the model. In newer Keras and TensorFlow 2 releases, fit() also accepts generators and fit_generator() is deprecated.
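
To make the generator contract concrete, here is a minimal pure-Python sketch. The names `batch_generator`, `toy_X`, `toy_Y` and `toy_batch_size` are hypothetical, and the data are plain lists rather than image arrays. It shows a generator that yields (inputs, targets) batches indefinitely, and how the number of steps per epoch follows from the dataset size. Rounding up keeps the final partial batch; integer division (`//`) would drop it.

```python
import math


def batch_generator(inputs, targets, batch_size):
    """Yield (inputs, targets) batches forever, one batch per training step."""
    n = len(inputs)
    while True:  # the generator is expected to loop over the data indefinitely
        for start in range(0, n, batch_size):
            end = start + batch_size
            yield inputs[start:end], targets[start:end]


# Toy data: 10 samples, purely illustrative
toy_X = list(range(10))
toy_Y = [x % 2 for x in toy_X]
toy_batch_size = 4

# Number of batches that make up one epoch (the last batch may be smaller)
steps_per_epoch = math.ceil(len(toy_X) / toy_batch_size)

gen = batch_generator(toy_X, toy_Y, toy_batch_size)
first_epoch = [next(gen) for _ in range(steps_per_epoch)]
print(steps_per_epoch)
print([len(x) for x, _ in first_epoch])
```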

## 5. Evaluate Model

Use the **evaluate()** and **evaluate_generator()** functions to measure the performance of the network on the test dataset.

With the single accuracy metric compiled above, these functions return a list with two values: the first is the loss of the model and the second is its accuracy on the test dataset.

In [0]:
# evaluate the keras model
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))

_, accuracy = model.evaluate_generator(test_generator, steps=None, callbacks=None, max_queue_size=10, workers=1, use_multiprocessing=False, verbose=0)
print('Accuracy: %.2f' % (accuracy*100))
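
To see what the two returned values mean, they can be computed by hand on a toy example. This is a sketch using NumPy only; the labels and predicted probabilities are made up, and the clipping constant `eps` is an assumption chosen to avoid taking the logarithm of zero.

```python
import numpy as np

# Hypothetical labels and sigmoid outputs for four test samples
y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_prob = np.array([0.9, 0.2, 0.6, 0.4])

# Binary cross-entropy, averaged over the samples (clipped to avoid log(0))
eps = 1e-7
p = np.clip(y_prob, eps, 1.0 - eps)
loss = float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

# Accuracy: fraction of samples whose thresholded prediction matches the label
accuracy = float(np.mean((y_prob > 0.5) == (y_true == 1.0)))

print("loss=%.4f accuracy=%.2f" % (loss, accuracy))
```

Here three of the four thresholded predictions match their labels, so the accuracy is 0.75, while the loss also reflects how confident each prediction was.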

## 6. Make predictions 

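Make predictions on the test dataset with the functions **predict()**, **predict_classes()** or **predict_generator()**.

- **predict()** returns the raw model output: class probabilities for classification (for example, 0.6 for class1 and 0.4 for class2) or the predicted value for regression.
- **predict_classes()** returns the index of the predicted class directly (for example, the index of class1). It is available on Sequential models only.
- **predict_generator()** works like predict() but reads its input batches from a generator.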
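
The relation between probabilities and class indices can be sketched with NumPy alone. The probability arrays below are made-up stand-ins for the output of predict(). This conversion is also the usual replacement in newer TensorFlow releases, where predict_classes() has been removed.

```python
import numpy as np

# Hypothetical softmax outputs of predict() for 3 samples and 2 classes
probabilities = np.array([[0.6, 0.4],
                          [0.1, 0.9],
                          [0.5, 0.5]])

# Multi-class output: pick the index of the largest probability per sample
class_ids = np.argmax(probabilities, axis=-1)

# Binary model with a single sigmoid output: threshold the probability at 0.5
sigmoid_out = np.array([[0.8], [0.3]])
binary_ids = (sigmoid_out > 0.5).astype("int32")

print(class_ids)
print(binary_ids.ravel())
```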

In [0]:
# make probability predictions with the model
predictions = model.predict(X) # probabilities for classification, predicted values for regression
for i in range(len(X)):
    print("X=%s, Predicted=%s" % (X[i], predictions[i]))

# make the class predictions with the model
predictions = model.predict_classes(X) # class indices, for classification problems only
for i in range(len(X)):
    print("X=%s, Predicted=%s" % (X[i], predictions[i]))

# How to save a Keras model?

Save the model using the **save()** function. This function creates a single HDF5 file which contains:

+ the architecture of the model, so that the model can be re-created.
+ the weights of the model.
+ the training configuration (loss, optimizer).
+ the state of the optimizer, so that training can resume exactly where you left off.

In [0]:
model.save("model.h5")

# How to load a Keras model?

In [0]:
from keras.models import load_model

model = load_model('model.h5')

# How to save weights after some epochs in Keras?

Deep learning models can take hours, days or even weeks to train. If the run is stopped unexpectedly, you can lose a lot of work. A common solution is to save the weights after every epoch or every few epochs.

In Keras, the ModelCheckpoint callback class saves the model weights at the end of the chosen epochs.

In [0]:
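from keras.callbacks import ModelCheckpoint

# example file name pattern (an assumption); {epoch:02d} is replaced by the epoch number
filepath = 'weights-{epoch:02d}.h5'

checkpoint = ModelCheckpoint(filepath, # path to save the model file
                             monitor='val_acc', # quantity to monitor; newer Keras versions log it as 'val_accuracy'
                             verbose=1, # print a message whenever a checkpoint is saved
                             save_best_only=False, # if True, keep only the checkpoint with the best monitored value
                             save_weights_only=True, # if True, save weights only (model.save_weights(filepath)); otherwise save the full model (model.save(filepath))
                             mode='auto', # {auto, min, max}: whether a lower or a higher monitored value is better
                             period=10) # number of epochs between two checkpoints
callbacks_list = [checkpoint]
# now pass callbacks=callbacks_list in step 4 (fit model)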
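
The checkpoint path may contain named formatting fields, which Keras fills in with the epoch number and the logged metric values. The plain-Python sketch below imitates that substitution. The template, the log values and the variable names are hypothetical, and the field names must match metrics that are actually logged during training.

```python
# Hypothetical file name template with an epoch field and a metric field
filepath_template = "weights-{epoch:02d}-{val_acc:.2f}.h5"

# Example log values such as might be reported at the end of an epoch
logs = {"loss": 0.41, "val_acc": 0.9123}
epoch = 4  # zero-based epoch index used internally

# The template is filled with the one-based epoch number and the logs
saved_name = filepath_template.format(epoch=epoch + 1, **logs)
print(saved_name)
```

With such a template each checkpoint gets its own file name, so earlier checkpoints are not overwritten.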

# Split validation data automatically using Keras ImageDataGenerator

Use the **validation_split** parameter, a floating-point number between 0 and 1 that specifies the proportion of the training data to set aside as the validation set. 

**validation_split** takes all the input data and splits it between the train and validation sets.
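
In fit(), the validation samples are taken from the end of the arrays, before any shuffling. The following pure-Python sketch imitates that behaviour on a toy list. The helper name `split_last_fraction` and the data are hypothetical.

```python
def split_last_fraction(samples, validation_split):
    """Split samples into (train, validation), taking validation from the end."""
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]


toy_samples = list(range(10))
train_part, val_part = split_last_fraction(toy_samples, validation_split=0.2)
print(train_part)
print(val_part)
```

Because the split is positional, data that is sorted by class should be shuffled before it is passed to fit(); otherwise the validation set may contain only the last class.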


In [0]:
model.fit(train_X, train_Y, batch_size=16, epochs=10, verbose=2, validation_split=0.2) 

**validation_data** takes an explicit (val_X, val_Y) tuple of held-out data. 

In [0]:
model.fit(train_X, train_Y, validation_data=(val_X, val_Y), epochs=10, batch_size=10)

How to split data in case of **ImageDataGenerator**?

In [0]:
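# example values (assumptions); adjust them to your dataset
img_height, img_width = 100, 100
nb_epochs = 10

train_datagen = ImageDataGenerator(rescale=1./255,
    validation_split=0.2) # set validation split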

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='binary',
    subset='training') # set as training data

validation_generator = train_datagen.flow_from_directory(
    train_data_dir, # same directory as training data
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='binary',
    subset='validation') # set as validation data

model.fit_generator(
    train_generator,
    steps_per_epoch = train_generator.samples // batch_size,
    validation_data = validation_generator, 
    validation_steps = validation_generator.samples // batch_size,
    epochs = nb_epochs)