Step-by-step CNN implementation in Keras for beginners: an image classification problem

Aditi Mukerjee
10 min read · Feb 16, 2021

A convolutional neural network (CNN, or ConvNet) is a class of deep neural networks most commonly applied to analyzing visual imagery. These networks can be quite large, ranging from a few million to a billion parameters. Below I have an image of the architecture of the VGG16 model.

The architecture of VGG16 model (Credits: https://www.researchgate.net/figure/Architecture-of-VGG16_fig1_327060416)

Here I am going to implement a convolutional neural network (CNN) from scratch in Keras. This implementation is used to identify whether a person has pneumonia or not.

You can download the dataset from the link below.

https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia

Once you have downloaded the images, you can proceed with the steps below. Once the dataset is loaded in Google Colab or a Jupyter notebook, we can move on to loading the libraries needed to implement the CNN. I will be using the Sequential class, as I am creating a sequential model. A sequential model means that all the layers of the model are arranged in sequence.

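import os
import numpy as np
import tensorflow as tf  # the model and callbacks below are built through tf.keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, Flatten
from tensorflow.keras.preprocessing.image import ImageDataGenerator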

EXPLORING THE DATASET

Here is a chest X-ray image of a person who does not have pneumonia.

Here are a couple of chest X-ray images of a person who has pneumonia.

OKAY! SO WERE YOU ABLE TO SPOT DIFFERENCES?

Well, I am no expert in reading X-rays and I can't tell what the differences are. However, were you able to tell the difference between the normal image and the pneumonia ones?

If not, then how do we go about it?

There are two ways to tell whether these X-ray images are of a pneumonia patient or not:

1. Go to medical school, spend a lot of money, invest a lot of time, and then come back years later to read them. This seems quite hard, though.

2. Look at all the images, search for patterns in both classes, learn those differences, and then come back to read them.

If guessing which image is which is such a time-consuming task, then how does the computer do it?

Guessing (https://www.shutterstock.com/search/kids+guess)

Computers learn and see in a way that is not so different from ours. They will, however, need to look at and analyze thousands upon thousands of images before they can generalize and say that a yellow umbrella falls under the same category as a black umbrella. This is because what they see are not pictures, but numerical representations of the pixels describing these pictures. So while we see ‘things’ in pictures, a computer sees this:

Computer sees the image as this

A Convolutional Neural Network is a type of neural network that uses a mathematical matrix operation called convolution to process data from images.

STEPS INVOLVED IN CNN

  • A convolution combines two matrices, the image and a filter, and yields a third, usually smaller matrix.
  • The network takes an input image and uses a filter (or kernel) to create a feature map describing the image.
  • In the convolution operation, we take a filter (usually a 2x2, 3x3, or 5x5 matrix) and slide it over the image matrix. The corresponding numbers in both matrices are multiplied and added to yield a single number describing that region of the input. This process is repeated all over the image.
  • We pass different filters over our inputs, take all the resulting feature maps, and put them together as the final output of the convolutional layer.
  • We then pass the output of this layer through a non-linear activation function. The most commonly used one is ReLU.
  • The next step further reduces the dimensionality of the data, which lowers the computation required to train the model. This is achieved with a pooling layer. The most commonly used one is max pooling, which takes the maximum value in the window created by a filter. This significantly reduces training time while preserving the most significant information.

Typical CNN model
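
To make the steps above concrete, here is a minimal NumPy sketch of one convolution, a ReLU, and a max-pooling pass. The 6x6 image, the 3x3 edge filter, and the helper names (convolve2d, relu, max_pool) are all made up for illustration; Keras does this far more efficiently and learns the filter values during training.

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            window = image[r:r + kh, c:c + kw]
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map

def relu(x):
    """Replace every negative value with zero."""
    return np.maximum(x, 0)

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    out_h, out_w = feature_map.shape[0] // size, feature_map.shape[1] // size
    trimmed = feature_map[:out_h * size, :out_w * size]
    return trimmed.reshape(out_h, size, out_w, size).max(axis=(1, 3))

# Hypothetical 6x6 "image": dark left half (0), bright right half (1)
image = np.array([[0, 0, 0, 1, 1, 1]] * 6, dtype=float)
# Hypothetical 3x3 filter that responds to dark-to-bright vertical edges
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

feature_map = convolve2d(image, kernel)  # 4x4 map, large where the edge is
activated = relu(feature_map)            # negatives (none here) become 0
pooled = max_pool(activated)             # 2x2 map of window maxima

print(feature_map)
print(pooled)
```

The feature map is largest in the columns where the filter sits on the dark-to-bright boundary, and pooling keeps that strong response while shrinking the map from 4x4 to 2x2.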

SOME TERMS

KERNEL: In a Convolutional neural network, the kernel is nothing but a filter that is used to extract the features from the images. The kernel is a matrix that moves over the input data, performs the dot product with the sub-region of input data, and gets the output as the matrix of dot products.

MAX POOLING: Max pooling is a pooling operation that selects the maximum element from the region of the feature map covered by the filter. Thus, the output after max pooling layer would be a feature map containing the most prominent features of the previous feature map.

Max Pooling

STRIDE: Stride just means the amount a filter moves during a convolution operation. So, a stride of 1 means that the filter will slide 1 pixel after each convolution operation as shown in this animation.
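
Here is a small sketch of what the stride does to the filter positions along one axis, assuming no padding is applied. The function names are hypothetical and only meant to illustrate the arithmetic.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Number of filter positions along one axis when no padding is used."""
    return (input_size - kernel_size) // stride + 1

def filter_positions(input_size, kernel_size, stride=1):
    """Starting pixel index of the filter at each step along one axis."""
    return list(range(0, input_size - kernel_size + 1, stride))

print(filter_positions(7, 3, stride=1))  # [0, 1, 2, 3, 4]
print(filter_positions(7, 3, stride=2))  # [0, 2, 4]
print(conv_output_size(255, 3))          # 253
```

A larger stride means fewer positions, so the feature map gets smaller.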

DATA AUGMENTATION

Here I have imported ImageDataGenerator from keras.preprocessing. The objective of ImageDataGenerator is to import data with labels easily into the model. It is a very useful class, as it has many options to rescale, rotate, zoom, flip, etc. The most useful thing about this class is that it doesn’t affect the data stored on the disk. It alters the data on the go while passing it to the model.

As this was one of my first attempts at data augmentation, I re-used the code from the Keras documentation.

# train_dir and valid_dir must point to the dataset's train and validation folders
train_data_generator = ImageDataGenerator(rescale=1./255, zoom_range=0.2, vertical_flip=True)  # same as in the Keras documentation
train_generator = train_data_generator.flow_from_directory(
    directory=train_dir, batch_size=100,  # chose a size at random
    target_size=(255, 255),  # assumed a target size
    shuffle=True, class_mode='binary')
val_data_generator = ImageDataGenerator(rescale=1./255, zoom_range=0.2, vertical_flip=True)
val_generator = val_data_generator.flow_from_directory(
    directory=valid_dir, batch_size=100, target_size=(255, 255),
    shuffle=True, class_mode='binary')

The ImageDataGenerator will automatically label all the data inside the train and validation folders, using the subfolder each image sits in as its class. In this way the data is ready to be passed to the neural network.
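
As a rough illustration of how that labelling works: flow_from_directory treats each subfolder as one class and numbers the classes in alphabetical order (the real mapping is available as train_generator.class_indices). The sketch below mimics only that naming rule in plain Python. The helper infer_class_indices is hypothetical, and the folder names NORMAL and PNEUMONIA are an assumption about how the downloaded dataset is laid out.

```python
import os
import tempfile

def infer_class_indices(directory):
    """Map each subfolder name to an integer label, in alphabetical order."""
    class_names = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(class_names)}

# Build a throwaway folder tree that mimics the assumed dataset layout
with tempfile.TemporaryDirectory() as train_dir_demo:
    for class_name in ("PNEUMONIA", "NORMAL"):
        os.makedirs(os.path.join(train_dir_demo, class_name))
    class_indices = infer_class_indices(train_dir_demo)

print(class_indices)  # {'NORMAL': 0, 'PNEUMONIA': 1}
```

With class_mode='binary', this would mean a label of 1 stands for pneumonia and 0 for normal.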

DEFINING THE MODEL

As it was my first time building a CNN model by myself, I built a very basic model.

  1. Initially, there are three convolutional layers, each followed by a max-pooling layer to pick the important features from the convolution output. A dropout layer follows the second and third pooling layers.
  2. A Flatten layer flattens the n-dimensional output into 1-D so that it can be passed into the Dense layers.
  3. Two fully connected Dense layers follow: the first has 128 neurons and the second has 1 neuron, which gives the binary result.

cnn_model_drop = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), input_shape=(255, 255, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),  # First Convolution and Pooling Layers
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),  # Second Convolution and Pooling Layers
    tf.keras.layers.Dropout(0.2),  # First dropout
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),  # Third Convolution and Pooling Layers
    tf.keras.layers.Dropout(0.2),  # Second dropout
])

Here I have started with initializing the model by specifying that the model is a sequential model. After initializing the model I add

→ 1 x convolution layer of 32 channels with a 3x3 kernel and ‘relu’ activation

→ 1 x maxpool layer of 2x2 pool size

→ 1 x convolution layer of 32 channels with a 3x3 kernel and ‘relu’ activation

→ 1 x maxpool layer of 2x2 pool size

→ 1 x dropout layer with a rate of 0.2

→ 1 x convolution layer of 32 channels with a 3x3 kernel and ‘relu’ activation

→ 1 x maxpool layer of 2x2 pool size

→ 1 x dropout layer with a rate of 0.2

I also add ReLU (Rectified Linear Unit) activation to each convolutional layer so that negative values are not passed to the next layer.

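# These layers are appended to the end of the model defined above
cnn_model_drop.add(tf.keras.layers.Flatten())  # Flatten the Layers and Add Fully Connected Layers
cnn_model_drop.add(tf.keras.layers.Dropout(0.2))  # Third dropout
cnn_model_drop.add(tf.keras.layers.Dense(128, activation='relu'))
cnn_model_drop.add(tf.keras.layers.Dense(1, activation='sigmoid'))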

After creating all the convolution layers, I pass the data to the dense layers. For that, I flatten the output of the convolutions and add

→ 1 x Dropout layer with a rate of 0.2

→ 1 x Dense layer of 128 units

→ 1 x Dense Sigmoid layer of 1 unit

I use ReLU activation for the dense layer of 128 units so that I stop forwarding negative values through the network. I use a 1-unit dense layer at the end with sigmoid activation, as I have 2 classes to predict: whether the person has pneumonia or not.
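
To see why a single sigmoid unit is enough for two classes, here is a small sketch in plain Python. It squashes a raw score into a probability of the positive class and scores it with binary cross-entropy, the loss typically paired with a sigmoid output. The input values are made up for illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, p):
    """Loss for one example: y_true is 0 or 1, p is the predicted probability of class 1."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

p = sigmoid(2.0)                                  # about 0.88
confident_right = binary_crossentropy(1, p)       # about 0.13
confident_wrong = binary_crossentropy(0, p)       # about 2.13
undecided = binary_crossentropy(1, sigmoid(0.0))  # ln 2, about 0.693

print(p, confident_right, confident_wrong, undecided)
```

Note that a model which outputs 0.5 for every image scores a loss of about 0.693, so that value is a useful baseline when reading training logs.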

After the sigmoid layer is added, the model is finally prepared. Now I need to compile the model.

from tensorflow.keras.optimizers import Adam

model = cnn_model_drop  # the model defined above
opt = Adam(learning_rate=0.001)  # start with a small learning rate
model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])

Here I use the Adam optimizer to minimize the loss while training the model. Adam adapts the step size for each weight using running averages of the gradient and of its square, which in practice often helps training move past flat regions and poor local minima, although it does not guarantee reaching the global minimum. We also specify the learning rate of the optimizer; in this case it is set to 0.001. If the training metrics bounce a lot from epoch to epoch, we need to decrease the learning rate so that training can settle into a minimum.
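
For the curious, this is roughly what one Adam update looks like, written in plain Python for a single weight. It is a sketch of the standard update rule on a toy loss (x - 3)^2, not the Keras implementation; the function adam_minimize and the learning rate of 0.1 are chosen only for this demo.

```python
import math

def adam_minimize(grad_fn, x0, learning_rate=0.001, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, steps=500):
    """Minimize a one-dimensional function given its gradient, using Adam updates."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g      # running average of the gradient
        v = beta2 * v + (1 - beta2) * g * g  # running average of the squared gradient
        m_hat = m / (1 - beta1 ** t)         # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return x

# Toy loss f(x) = (x - 3)^2 with gradient 2 * (x - 3); the minimum is at x = 3
x_final = adam_minimize(lambda x: 2 * (x - 3), x0=0.0, learning_rate=0.1)
print(x_final)
```

On this simple bowl-shaped loss the weight ends up very close to 3. A real network has millions of weights and a far bumpier loss surface, so the same rule gives no such guarantee there.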

MODEL SUMMARY

You can check the summary of the model which I created by using the code below.

model.summary()

The output of this will be the summary of the model which I just created.

Here we have about 3.7 million parameters to train.
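
You can sanity-check that number by hand. The sketch below counts the weights layer by layer, assuming the Keras defaults for this model (no padding, a convolution stride of 1, and pooling windows that do not overlap). Pooling, dropout, and flatten layers have no trainable parameters. The helper names are hypothetical.

```python
def conv_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units

size, channels = 255, 3  # input images are 255x255 with 3 colour channels
total = 0
for filters in (32, 32, 32):
    total += conv_params(3, channels, filters)
    size = size - 3 + 1  # 3x3 convolution, no padding
    size = size // 2     # 2x2 max pooling
    channels = filters

flat = size * size * channels  # 30 * 30 * 32 = 28800 values after Flatten
total += dense_params(flat, 128)
total += dense_params(128, 1)

print(total)  # 3706049
```

Almost all of the parameters sit in the first dense layer, because every one of the 28,800 flattened values connects to each of its 128 neurons.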

DEFINING THE CALLBACKS

After the creation of the model, I will use the ModelCheckpoint and EarlyStopping classes from Keras. I will create an object of each and pass them as callbacks to fit_generator.

ModelCheckpoint helps us to save the model by monitoring a specific metric. In this case, I am monitoring validation accuracy by passing val_accuracy to ModelCheckpoint. The model's weights will only be saved to disk if the validation accuracy in the current epoch is greater than the best value seen so far.

EarlyStopping helps us to stop the training early if there is no improvement in the metric I have set it to monitor. In this case, I am monitoring validation accuracy by passing val_accuracy to EarlyStopping. I have set patience to 5, which means that the model will stop training if it doesn’t see any rise in validation accuracy for 5 epochs.
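
The patience rule can be sketched in a few lines of plain Python. This is a simplified stand-in for what EarlyStopping does, not its actual code, and the list of validation accuracies is invented for the example.

```python
def early_stopping_epoch(val_accuracies, patience=5):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, accuracy in enumerate(val_accuracies):
        if accuracy > best:
            best = accuracy
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Hypothetical validation accuracies: the best value (0.65) is first reached at epoch 2
history_demo = [0.60, 0.62, 0.65, 0.64, 0.65, 0.63, 0.64, 0.65, 0.70]
stop_epoch = early_stopping_epoch(history_demo, patience=5)
print(stop_epoch)  # 7
```

Training stops at epoch 7, after five epochs in a row that fail to beat 0.65, so the jump to 0.70 at epoch 8 is never reached. That is the trade-off of a small patience value.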

I have also defined a third callback: a function that halves the learning rate every 10 epochs. This lets us move toward the minimum faster in the beginning and converge on it more carefully once we get close.

model_name = "best_model.h5"

# CALLBACK - 1 - early_stop
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=5)

# CALLBACK - 2 - monitor
monitor = tf.keras.callbacks.ModelCheckpoint(model_name, monitor='val_accuracy', save_best_only=True, save_weights_only=True)

# CALLBACK - 3 - lr_schedule
def scheduler(epoch, lr):
    if epoch % 10 == 0:  # every 10 epochs, reduce the learning rate by a factor of 2
        lr = lr / 2
    return lr

lr_schedule = tf.keras.callbacks.LearningRateScheduler(scheduler)  # defining the callback
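
To see what the scheduler does over time, here is a small simulation in plain Python. It assumes the callback hands the current learning rate back to the function at the start of every epoch, starting from the 0.001 set on the optimizer; the 25-epoch run is arbitrary.

```python
def scheduler(epoch, lr):
    if epoch % 10 == 0:  # true for epochs 0, 10, 20, ...
        lr = lr / 2
    return lr

lr = 0.001  # the optimizer's starting learning rate
rates = []
for epoch in range(25):
    lr = scheduler(epoch, lr)
    rates.append(lr)

for epoch in (0, 9, 10, 19, 20):
    print(epoch, rates[epoch])
```

One thing worth noticing: epoch 0 also satisfies epoch % 10 == 0, so the rate is already halved to 0.0005 before the first epoch runs. If you would rather start at the full 0.001, a condition such as epoch > 0 and epoch % 10 == 0 would do that.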

TRAINING THE MODEL

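I am using model.fit_generator as I am using ImageDataGenerator to pass data to the model (in recent versions of Keras, model.fit accepts generators directly and fit_generator is deprecated). I will pass the train and validation data to fit_generator. In fit_generator, steps_per_epoch sets how many batches of training data are drawn in each epoch, and validation_steps does the same for the validation data; the batch size itself was set when creating the generators. You can tweak these based on your system specifications.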
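
If you are unsure what to pass for steps_per_epoch, a common choice is the number of batches needed to see every image once. Here is a sketch of that arithmetic. The image count is a hypothetical placeholder, so check the number that flow_from_directory reports for your copy of the data; the batch size of 100 matches the generators above.

```python
import math

def steps_for(num_images, batch_size):
    """Batches needed to cover every image once (the last batch may be smaller)."""
    return math.ceil(num_images / batch_size)

num_train_images = 5216  # hypothetical count, replace with your own
batch_size = 100         # same value as passed to flow_from_directory

steps_per_epoch = steps_for(num_train_images, batch_size)
print(steps_per_epoch)  # 53
```

The Keras generators report the same quantity through len(train_generator), so you can also use that instead of computing it by hand.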

After executing the above line the model will start to train and you will start to see the training/validation accuracy and loss.

Once you have trained the model, you can visualize the training/validation accuracy and loss. As you may have noticed, I am passing the output of model.fit_generator to the hist variable. All the training/validation accuracy and loss values are stored in hist, and I will visualize them from there.

import matplotlib.pyplot as plt
# With metrics=['accuracy'] in tf.keras, the history keys are 'accuracy' and 'val_accuracy'
plt.plot(hist.history["accuracy"])
plt.plot(hist.history['val_accuracy'])
plt.plot(hist.history['loss'])
plt.plot(hist.history['val_loss'])
plt.title("model accuracy and loss")
plt.ylabel("Accuracy / Loss")
plt.xlabel("Epoch")
plt.legend(["Accuracy","Validation Accuracy","loss","Validation Loss"])
plt.show()

Here I will visualize training/validation accuracy and loss using matplotlib.

TESTING THE MODEL AND EXTRACTING THE METRICS

Now is the moment of truth:

Let's test the model and see its performance.

Here we have an accuracy of 63% and a loss of 0.69.

This is a complete implementation of a CNN in Keras using ImageDataGenerator. We can make this model work for any number of classes by replacing the last sigmoid dense layer with a softmax dense layer that has one unit per class, and by switching the loss to categorical_crossentropy and the generators' class_mode to 'categorical'.
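
For more than two classes, the final layer typically uses softmax, which turns one raw score per class into probabilities that add up to 1. Here is a minimal plain-Python sketch with made-up scores for three hypothetical classes.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])            # made-up scores for 3 classes
predicted_class = probs.index(max(probs))   # index of the most likely class

print(probs, predicted_class)
```

The class with the highest score gets the highest probability, and the predicted class is simply the index of that largest value.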

Github repo link: https://github.com/aditimukerjee/Deep-Learning---Supervised-Learning---Pneumonia

If you have any questions or comments or need any further clarifications please don’t hesitate to contact me at aditimukerjee33@gmail.com or reach me at 403–671–7296. If you are interested in collaborating on any project, feel free to reach out to me without any hesitation.

If you enjoyed this story, please click the 👏 button and share to help others find it! Feel free to leave a comment below.

Have fun building CNN models !!

Aditi Mukerjee

Engineer. Data Analyst. Machine Learning enthusiast