Introduction

In this post, we’re going to use a simple natural language processing (NLP) technique known as bag-of-words to classify messages as ham or spam. Using bag-of-words and NLP feature engineering, we’ll get hands-on experience with SPAM/HAM classification: first on a single SMS message, then on a larger set of SMS messages, and finally on email.
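To make the bag-of-words idea concrete, here is a minimal sketch in plain Python. It is not the code from the post: the tokenizer, the function names, and the two sample messages are assumptions made for illustration. Each message becomes a vector of word counts over a shared vocabulary, and word order is ignored.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(messages):
    """Return (vocabulary, count vectors) for a list of messages."""
    # The vocabulary is every distinct token seen in any message, sorted.
    vocab = sorted({tok for msg in messages for tok in tokenize(msg)})
    index = {tok: i for i, tok in enumerate(vocab)}

    vectors = []
    for msg in messages:
        counts = Counter(tokenize(msg))
        vec = [0] * len(vocab)
        for tok, n in counts.items():
            vec[index[tok]] = n
        vectors.append(vec)
    return vocab, vectors


# Hypothetical sample messages, invented for this sketch.
messages = [
    "Free prize, claim your free prize now",
    "Are we still meeting for lunch",
]
vocab, vectors = bag_of_words(messages)
```

The resulting count vectors are what a classifier would be trained on. In practice a library vectorizer would likely replace this hand-rolled version.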

SPAM/HAM email (photo credits: https://www.lucypark.kr/courses/2015-dm/svm.html)

Spam emails or messages belong to the broad category of unsolicited messages received by a user. Spam takes up storage space and bandwidth, amplifies the threat of malware such as trojans, and in general exploits a user’s connection to social networks.

Various techniques are employed to…


A convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. Such a network can be quite large, ranging from a few million to a billion parameters. Below is an image of the architecture of the VGG16 model.

The architecture of VGG16 model (Credits: https://www.researchgate.net/figure/Architecture-of-VGG16_fig1_327060416)
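To give a sense of how large such a network is, the sketch below counts the trainable parameters of the standard VGG16 configuration. It assumes 3x3 convolutions, a 224x224 RGB input, and 1000 output classes. This is the reference architecture, not the pneumonia model built in this post, and the helper names are made up for illustration.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k convolution layer."""
    return (k * k * c_in + 1) * c_out


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return (n_in + 1) * n_out


# (input channels, output channels) for the 13 convolution layers.
VGG16_CONV = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]

# Five 2x2 poolings reduce 224x224 to 7x7 before the dense layers.
VGG16_DENSE = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]

conv_total = sum(conv_params(c_in, c_out) for c_in, c_out in VGG16_CONV)
dense_total = sum(dense_params(n_in, n_out) for n_in, n_out in VGG16_DENSE)
total = conv_total + dense_total

print(f"conv: {conv_total:,}  dense: {dense_total:,}  total: {total:,}")
```

Under these assumptions the total comes to 138,357,544 parameters, most of them in the fully connected layers.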

Here I am going to implement a convolutional neural network (CNN) from scratch in Keras. The implementation is used to identify whether a person has pneumonia or not.

You can download the dataset from the link below.

https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia

Once you have downloaded the images then you…


In Canada, particularly in Ontario, COVID-19 cases are on the rise. Ontario’s Premier Doug Ford declared a second state of emergency and issued a stay-at-home order for the province, which started on Jan 14, 2021.

Photo credits: http://gbfht.ca/covid-how-to-send-to-centre/

The latest headline states that Ontario reported a single-day increase in new COVID-19 cases, recording more than 3,400 infections.

Here I am visualizing Ontario COVID-19 data found here:

DATASET

The data is quite extensive, but it is missing a lot of values. Two abbreviations used in the data are LTC (long-term care) and HCW (health care workers)…


In this blog, we will cover the concepts of using logistic regression along with neural networks, apply forward and backward propagation, and then put them into practice to build an image recognition system, i.e. a cat classifier in this case. The cat classifier takes an image as input and predicts whether the image contains a cat or not, with 70% accuracy. The tool used is Jupyter Notebook, and the code is written in Python.
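As a rough illustration of forward and backward propagation for logistic regression, here is a minimal sketch in plain Python. It is not the notebook code from the post: the function names, the learning rate, and the tiny two-feature dataset are assumptions, and a real cat classifier would feed in flattened image pixels instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def propagate(w, b, X, y):
    """One forward and backward pass; returns (cost, dw, db)."""
    m = len(X)
    # Forward: activation a = sigmoid(w . x + b) for every example.
    a = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]
    # Cross-entropy cost averaged over the m examples.
    cost = -sum(
        yi * math.log(ai) + (1 - yi) * math.log(1 - ai)
        for ai, yi in zip(a, y)
    ) / m
    # Backward: gradients of the cost with respect to w and b.
    dz = [ai - yi for ai, yi in zip(a, y)]
    dw = [sum(d * x[j] for d, x in zip(dz, X)) / m for j in range(len(w))]
    db = sum(dz) / m
    return cost, dw, db


def train(X, y, lr=0.5, steps=2000):
    """Plain gradient descent starting from zero weights."""
    w, b, costs = [0.0] * len(X[0]), 0.0, []
    for _ in range(steps):
        cost, dw, db = propagate(w, b, X, y)
        w = [wj - lr * dwj for wj, dwj in zip(w, dw)]
        b -= lr * db
        costs.append(cost)
    return w, b, costs


def predict(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) > 0.5 else 0


# Hypothetical toy data: two features per example, label 1 = "cat".
X = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
y = [0, 0, 1, 1]
w, b, costs = train(X, y)
predictions = [predict(w, b, x) for x in X]
```

The cost starts at log(2), since every prediction is 0.5 with zero weights, and it falls as gradient descent moves the decision boundary between the two classes.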

Let’s see a bit of theory about neural networks and logistic regression.

Logistic Regression

Logistic regression is…


Introduction

Here I built an ML pipeline using Microsoft Azure ML Studio.

Dataset

I used the Adult Census dataset available within Microsoft Azure ML Studio to build predictive models and evaluate their accuracy. This dataset is from the UCI Machine Learning Repository.

Data selection and cleaning

  • The Adult Census dataset was imported into the dashboard.
  • To account for the missing data, all missing values were replaced with 0 using the Clean Missing Data module.
  • Next, using the Select Columns in Dataset module, irrelevant and redundant columns were excluded from the data to reduce clutter during analysis.
  • Once the final set of features is…
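The cleaning and column-selection steps in the list above are done with drag-and-drop modules in Azure ML Studio. As a rough sketch of what they do, the same two operations can be written in plain Python. The helper names, the sample rows, and the choice of which columns to keep are all assumptions made for illustration.

```python
def clean_missing(rows, fill=0):
    """Replace missing values (None or empty string) with `fill`."""
    return [
        {key: (fill if value is None or value == "" else value)
         for key, value in row.items()}
        for row in rows
    ]


def select_columns(rows, keep):
    """Keep only the listed columns, dropping everything else."""
    return [{key: row[key] for key in keep} for row in rows]


# Hypothetical rows in the style of the Adult Census data.
rows = [
    {"age": 39, "workclass": "State-gov", "fnlwgt": 77516, "income": "<=50K"},
    {"age": 50, "workclass": None, "fnlwgt": 83311, "income": "<=50K"},
    {"age": None, "workclass": "Private", "fnlwgt": 215646, "income": ">50K"},
]

cleaned = clean_missing(rows)
# Dropping "fnlwgt" here is only an example of excluding a column.
selected = select_columns(cleaned, ["age", "workclass", "income"])
```

Filling every gap with 0 is simple but blunt: a categorical column such as `workclass` ends up with a 0 among its labels, so a different replacement strategy may suit some columns better.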

Introduction

In this blog, I discuss my machine learning capstone project, in which I apply algorithms for both classification and regression to identify, for each task, the most reliable algorithm: the one that shows the highest performance on both training and test data and can be considered for future datasets.

The original data file and the Jupyter notebook files have been made public on my GitHub here.

Dataset

The data file “GHG_Emission.csv” has been retrieved from the Alberta Energy Regulator (AER) website, where the locations of the wells have been changed, and some key properties are generated synthetically or…


Introduction

In this blog, I explored a dataset that tracks alcohol consumption, mainly in European and Australian nations.

The original Tableau file and PDF for the “Alcohol Consumption by Country” data have been made public on my GitHub here.

For this blog, I have used the Makeover Monday dataset: Alcohol Consumption by Country.

The dataset contains ranks of countries and liters of pure alcohol consumed per capita.

Alcohol consumption (Link: https://www.verywellmind.com/the-link-between-stress-and-alcohol-67239)

This alcohol consumption data is for 2019. It represents per capita alcohol consumption among people over 15 years of age.

Data Visualization

Here is the Tableau dashboard…


Introduction

In this blog, I explored the Next Gen Stats tracking dataset for NFL running plays.

The code for the “NFL plays” data has been made public on my GitHub here.

Data Set

For this blog, I have used the Kaggle dataset: NFL Big Data Bowl.

The dataset contains 65k rows and 48 columns of NFL data. Each row in the file corresponds to a single player’s involvement in a single play. The dataset was intentionally joined (i.e. denormalized) to make the API simple. …
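Because the data is denormalized, play-level values are repeated on every player row of the same play. The sketch below shows one way to regroup such rows by play. The column names (`PlayId`, `NflId`, `Yards`) and the sample rows are assumptions made for illustration, not taken from the post.

```python
from collections import defaultdict

# Hypothetical denormalized rows: one row per player per play, with the
# play-level value "Yards" repeated on each row.
rows = [
    {"PlayId": 1, "NflId": 10, "Yards": 4},
    {"PlayId": 1, "NflId": 11, "Yards": 4},
    {"PlayId": 2, "NflId": 10, "Yards": -1},
    {"PlayId": 2, "NflId": 12, "Yards": -1},
]


def group_by_play(rows):
    """Map each play id to the list of player ids involved in it."""
    players = defaultdict(list)
    for row in rows:
        players[row["PlayId"]].append(row["NflId"])
    return dict(players)


def yards_per_play(rows):
    """Collapse the repeated play-level value to one entry per play."""
    return {row["PlayId"]: row["Yards"] for row in rows}


players_by_play = group_by_play(rows)
yards = yards_per_play(rows)
# Averaging over plays, not rows, avoids counting a play once per player.
mean_yards = sum(yards.values()) / len(yards)
```

Collapsing to one record per play before computing play-level statistics keeps each play from being weighted by the number of player rows it has.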


Introduction

In this blog, I explored the credit default data using Microsoft Power BI.

The Power BI dashboard for the “Credit Card Default” data has been made public on my GitHub here.

Data

The dataset used for the data analysis in Power BI is from the UCI Machine Learning Repository. It covers customers’ credit card default payments in 2005. However, the last column used in the dataset is a made-up column.

Power BI Dashboard

Here is the dashboard I created.

Conclusions

  • Females defaulted more than males. Specifically, single females defaulted more than married females.
  • Undergraduate…

Exploring Wine Reviews using Pandas

Introduction

There are a lot of wine enthusiasts all over the world.

In this blog, I explored the wine review data using Pandas.

The code for the “Wine Review” data has been made public on my GitHub here.

I am a Professional Engineer (P.Eng) and a Project Management Professional (PMP) with a strong engineering background in the thermal energy, oil and gas processing, and water and wastewater treatment industries, and over 8 years of experience working in engineering consultancies.

I am a self-motivated lifelong learner and critical thinker who is passionate about data, with skills in programming and statistical analysis, my…

Aditi Mukerjee

Engineer. Data Analyst. Machine Learning enthusiast
