Using Logistic Regression with Neural Networks for Cat vs. Non-Cat Image Classification
In this blog, we will cover the concepts of logistic regression with a neural network mindset, apply forward and backward propagation, and then put them into practice to build an image recognition system: in this case, a cat classifier. The classifier takes an image as input and predicts whether the image contains a cat, with roughly 70% test accuracy. The tool used is a Jupyter Notebook, and the code is written in Python.
Let’s see a bit of theory about Neural networks and Logistic regression.
Logistic Regression
Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, logistic regression is a predictive analysis.
How Do Computers Interpret Images?
For a computer, an image is just an array of values. Typically it’s a 3-dimensional (RGB) matrix of pixel values.
For example, a 6×6 RGB image is represented as a 6×6×3 array, where each pixel has a specific value of red, green, and blue that together represent the color of that pixel. Neural networks process images using matrices of weights called filters (features) that detect specific attributes such as vertical edges, horizontal edges, etc. Moreover, as the image progresses through each layer, the filters are able to recognize more complex attributes.
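To make this concrete, here is a small illustration of my own (not part of the original notebook) of how such an image looks in NumPy:
import numpy as np
# A hypothetical 6x6 RGB image: one (red, green, blue) triple per pixel
image = np.random.randint(0, 256, size=(6, 6, 3))
print(image.shape)   # (6, 6, 3)
print(image[0, 0])   # the RGB values of the top-left pixel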
General Architecture
Mathematical expressions to be used, for one training example x⁽ⁱ⁾:
z⁽ⁱ⁾ = wᵀx⁽ⁱ⁾ + b
ŷ⁽ⁱ⁾ = a⁽ⁱ⁾ = sigmoid(z⁽ⁱ⁾)
L(a⁽ⁱ⁾, y⁽ⁱ⁾) = −y⁽ⁱ⁾ log(a⁽ⁱ⁾) − (1 − y⁽ⁱ⁾) log(1 − a⁽ⁱ⁾)
Here L represents the loss function for a single example. The cost J is then computed by averaging the loss over all m training examples:
J = (1/m) Σᵢ L(a⁽ⁱ⁾, y⁽ⁱ⁾)
The main steps for building a Neural Network are:
- Define the model structure (such as number of input features)
- Initialize the model’s parameters
- Loop:
  - Calculate current loss (forward propagation)
  - Calculate current gradient (backward propagation)
  - Update parameters (gradient descent)
Step 1: Creating a new Notebook
Importing the libraries:
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
%matplotlib inline
Step 2: Loading the dataset
The dataset, along with the entire code, is available on my GitHub: https://github.com/aditimukerjee/Cat-and-Non-cat-for-logistic-regression
train_set_x_orig is a NumPy array of shape (m_train, num_px, num_px, 3). Each image is of shape (num_px, num_px, 3), where 3 is for the three channels (RGB) and num_px is both the height and the width of a training image. Thus, each image is square (height = num_px and width = num_px).
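The loading code itself is not shown here; a minimal sketch of a load_dataset helper, assuming the HDF5 files (train_catvnoncat.h5 and test_catvnoncat.h5) and the dataset keys used in the repository linked above, would look like this:
def load_dataset():
    # Assumed file locations; adjust the paths to wherever you saved the dataset
    train_dataset = h5py.File("datasets/train_catvnoncat.h5", "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])   # image data
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])   # labels (1 = cat, 0 = non-cat)
    test_dataset = h5py.File("datasets/test_catvnoncat.h5", "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])
    # Reshape the labels into row vectors of shape (1, m)
    train_set_y = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y, test_set_x_orig, test_set_y

train_set_x_orig, train_set_y, test_set_x_orig, test_set_y = load_dataset()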
Step 3: Analyzing the dataset
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]
print("Number of training examples: m_train = " + str(m_train))
print("Number of testing examples: m_test = " + str(m_test))
print("Height/Width of each image: num_px = " + str(num_px))
print("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print("train_set_x shape: " + str(train_set_x_orig.shape))
print("train_set_y shape: " + str(train_set_y.shape))
print("test_set_x shape: " + str(test_set_x_orig.shape))
print("test_set_y shape: " + str(test_set_y.shape))
Number of training examples: m_train = 209
Number of testing examples: m_test = 50
Height/Width of each image: num_px = 64
Each image is of size: (64, 64, 3)
train_set_x shape: (209, 64, 64, 3)
train_set_y shape: (1, 209)
test_set_x shape: (50, 64, 64, 3)
test_set_y shape: (1, 50)
Step 4: Reshaping the dataset
We now need to reshape images of shape (num_px, num_px, 3) into a NumPy array of shape (num_px ∗ num_px ∗ 3, 1). After this, our training (and test) dataset is a NumPy array where each column represents a flattened image. There should be m_train (respectively m_test) columns.
Flattening an array means converting a multidimensional array into a 1D array.
A trick when you want to flatten a matrix X of shape (a,b,c,d) to a matrix X_flatten of shape (b ∗ c ∗ d, a) is to use:
X_flatten = X.reshape(X.shape[0], -1).T
Note that X.reshape(-1, X.shape[0]) would produce an array of the same shape (b ∗ c ∗ d, a), but it would not keep the pixels of each image together in a single column, so the reshape-then-transpose version above is the one to use. Here -1 stands for the unspecified element of the new shape: since X is originally of shape (m_train, num_px, num_px, 3) and X.shape[0] (i.e., m_train) is already fixed, the remaining element must be num_px ∗ num_px ∗ 3 for the reshape to work. Instead of spelling out num_px ∗ num_px ∗ 3, we let NumPy infer it by specifying -1.
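As a quick sanity check on a hypothetical toy array (not part of the original notebook):
# Two "images" of shape (3, 3, 3) stacked into an array of shape (2, 3, 3, 3)
X = np.arange(2 * 3 * 3 * 3).reshape(2, 3, 3, 3)
X_flatten = X.reshape(X.shape[0], -1).T
print(X_flatten.shape)   # (27, 2): one flattened image per column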
Applying this to our dataset:
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
print("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print("train_set_y shape: " + str(train_set_y.shape))
print("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print("test_set_y shape: " + str(test_set_y.shape))
print("sanity check after reshaping: " + str(train_set_x_flatten[0:5,0]))
train_set_x_flatten shape: (12288, 209)
train_set_y shape: (1, 209)
test_set_x_flatten shape: (12288, 50)
test_set_y shape: (1, 50)
sanity check after reshaping: [17 31 56 22 33]
To represent color images, the red, green, and blue channels (RGB) must be specified for each pixel, so a pixel value is actually a vector of three numbers ranging from 0 to 255. A common preprocessing step is to standardize the dataset; for picture data, the simplest approach is to divide every value by 255, the maximum value of a pixel channel:
train_set_x = train_set_x_flatten/255
test_set_x = test_set_x_flatten/255
Step 5: Sigmoid function
Writing down the sigmoid function, a prerequisite for logistic regression: sigmoid(z) = 1 / (1 + e⁻ᶻ).
def sigmoid(z):
    s = 1/(1 + np.exp(-z))
    return s
print("sigmoid([0, 2]) = " + str(sigmoid(np.array([0,2]))))
Output: sigmoid([0, 2]) = [ 0.5 0.88079708]
Step 6: Initializing parameters
We initialize w as a vector of zeros. The initialize_with_zeros function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
def initialize_with_zeros(dim):
    w = np.zeros((dim, 1))
    b = 0
    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))
    return w, b
dim = 2
w, b = initialize_with_zeros(dim)
print("w = " + str(w))
print("b = " + str(b))
Output: w = [[0.] [0.]], b = 0
Step 7: Forward and Backward Propagation
Implementing a function propagate() that computes the cost function and its gradient.
def propagate(w, b, X, Y):
    m = X.shape[1]
    # FORWARD PROPAGATION (FROM X TO COST)
    A = sigmoid(np.dot(w.T, X) + b)                          # compute activation
    cost = -1./m * np.sum(Y*np.log(A) + (1-Y)*np.log(1-A))   # compute cost
    # BACKWARD PROPAGATION (TO FIND GRAD)
    dw = 1./m * np.dot(X, (A-Y).T)
    db = 1./m * np.sum(A-Y)
    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    grads = {"dw": dw,
             "db": db}
    return grads, cost
w, b, X, Y = np.array([[1.],[2.]]), 2., np.array([[1.,2.,-1.],[3.,4.,-3.2]]), np.array([[1,0,1]])
grads, cost = propagate(w, b, X, Y)
print("dw = " + str(grads["dw"]))
print("db = " + str(grads["db"]))
print("cost = " + str(cost))
Output: dw = [[0.99845601] [2.39507239]], db = 0.00145557813678, cost = 5.801545319394553
Step 8: Optimization
The goal is to learn w and b by minimizing the cost function J. For a parameter θ, the update rule is θ=θ−α dθ, where α is the learning rate.
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    costs = []
    for i in range(num_iterations):
        # Cost and gradient calculation
        grads, cost = propagate(w, b, X, Y)
        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]
        # Gradient-descent update rule
        w = w - learning_rate * dw
        b = b - learning_rate * db
        # Record the costs
        if i % 100 == 0:
            costs.append(cost)
        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
    params = {"w": w,
              "b": b}
    grads = {"dw": dw,
             "db": db}
    return params, grads, costs
params, grads, costs = optimize(w, b, X, Y, num_iterations=100, learning_rate=0.009, print_cost=False)
print("w = " + str(params["w"]))
print("b = " + str(params["b"]))
print("dw = " + str(grads["dw"]))
print("db = " + str(grads["db"]))
w = [[0.19033591]
[0.12259159]]
b = 1.9253598300845747
dw = [[0.67752042]
[1.41625495]]
db = 0.21919450454067652
Step 9: Prediction
There are two steps to computing predictions:
- Calculate Ŷ = A = σ(wᵀX + b)
- Convert the entries of A into 0 (if activation <= 0.5) or 1 (if activation > 0.5) and store the predictions in a vector Y_prediction. If you wish, you can use an if/else statement in a for loop (though there is also a way to vectorize this; see the one-liner after the code below).
def predict(w, b, X):
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)
    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        # Convert probabilities A[0,i] to actual predictions Y_prediction[0,i]
        if A[0, i] > 0.5:
            Y_prediction[0, i] = 1
        else:
            Y_prediction[0, i] = 0
    assert(Y_prediction.shape == (1, m))
    return Y_prediction
w = np.array([[0.1124579],[0.23106775]])
b = -0.3
X = np.array([[1.,-1.1,-3.2],[1.2,2.,0.1]])
print("predictions = " + str(predict(w, b, X)))
predictions = [[1. 1. 0.]]
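As noted above, the thresholding loop can also be vectorized. A one-line equivalent (my own variant, not part of the original notebook) that would replace the for loop inside predict():
# Vectorized thresholding: 1 where the activation exceeds 0.5, else 0
Y_prediction = (A > 0.5).astype(float)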
Step 10: Merging all functions into a model
Now we merge all the functions developed so far into a single model.
def model(train_x, train_y_orig, test_x, test_y_orig, num_iterations=3000, learning_rate=0.6, print_cost=False):
    w = np.zeros((train_x.shape[0], 1))
    b = 0
    parameters, grads, costs = optimize(w, b, train_x, train_y_orig, num_iterations, learning_rate, print_cost)
    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]
    # Predict test/train set examples
    Y_prediction_test = predict(w, b, test_x)
    Y_prediction_train = predict(w, b, train_x)
    # Print train/test errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - train_y_orig)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - test_y_orig)) * 100))
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}
    return d
d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 5000, learning_rate = 0.8, print_cost = True)
train accuracy: 99.99876382512535 %
test accuracy: 72.01052229722973 %
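Since optimize() records the cost every 100 iterations in d["costs"], it is worth plotting the learning curve to check that gradient descent converged. This is a small sketch of my own, not part of the original code:
costs = np.squeeze(d["costs"])
plt.plot(costs)
plt.ylabel("cost")
plt.xlabel("iterations (per hundreds)")
plt.title("Learning rate = " + str(d["learning_rate"]))
plt.show()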
Step 11: Testing the model with your own images
Create an image directory in the notebook, upload your images there, and then assign the image file name to the variable image in the following code.
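The testing code itself is not reproduced in full here; a minimal sketch, assuming an images/ directory, an RGB image file, and the PIL Image class imported earlier, would be:
image = "my_image.jpg"   # hypothetical file name; replace with your own image
fname = "images/" + image
# Load the image, resize it to num_px x num_px, and flatten it the same way as the training data
img = np.array(Image.open(fname).resize((num_px, num_px)))
my_image = img.reshape((1, num_px * num_px * 3)).T / 255.
my_predicted_image = predict(d["w"], d["b"], my_image)
plt.imshow(img)
print("The model predicts a \"" + ("cat" if np.squeeze(my_predicted_image) == 1 else "non-cat") + "\" picture.")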
The results on six of my own images were:
- A cat image: the prediction is correct.
- A non-cat image: the prediction is incorrect.
- A non-cat image: the prediction is correct.
- A non-cat image: the prediction is correct.
- A non-cat image: the prediction is correct.
- A cat image: the prediction is correct.
Conclusion
The accuracy on the training set is 99.99% and the accuracy on the testing set is 72%. Out of the six images used for prediction, one was predicted incorrectly, which is consistent with the still-modest test accuracy; the model can be further improved.
The dataset used in this model is available on my GitHub, along with my code, which is free for public use. If you have any questions or comments, or need any further clarification, please don't hesitate to contact me at aditimukerjee33@gmail.com or reach me at 403-671-7296. If you are interested in collaborating on any project, feel free to reach out without any hesitation.