In this post, we’ll use a simple natural language processing (NLP) technique known as bag-of-words to classify messages as ham or spam. Using bag-of-words and related NLP feature engineering, we’ll get hands-on experience with spam/ham classification on a single SMS message, a larger set of SMS messages, and email.
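As a rough illustration of the bag-of-words idea, the sketch below turns each message into a vector of word counts over a shared vocabulary. It uses only the Python standard library; the function names (`tokenize`, `build_vocabulary`, `bag_of_words`) and the toy messages are assumptions made for this example and are not taken from the post's code or dataset.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(messages):
    """Return a sorted list of every distinct token seen across the messages."""
    return sorted({token for message in messages for token in tokenize(message)})


def bag_of_words(message, vocabulary):
    """Count how often each vocabulary word occurs in a single message."""
    counts = Counter(tokenize(message))
    return [counts[word] for word in vocabulary]


# Toy messages invented for illustration; not from the post's dataset.
messages = [
    "Win a free prize now",
    "Are we still meeting for lunch",
    "Free entry win win win",
]

vocabulary = build_vocabulary(messages)
vectors = [bag_of_words(message, vocabulary) for message in messages]

for message, vector in zip(messages, vectors):
    print(message, "->", vector)
```

Word order is discarded in this representation; only the counts survive. A classifier would then be trained on these count vectors together with their ham/spam labels.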
Spam emails or messages are unsolicited messages received by a user. Spam takes up storage space and bandwidth, increases exposure to malware such as trojans, and in general exploits a user’s connection to social networks.
Various techniques are employed to…
A convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. These networks can be quite large, ranging from a few million to a billion parameters. Below is an image of the architecture of the VGG16 model.
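To make the parameter range concrete, here is a small sketch that counts the trainable parameters of VGG16 layer by layer. The layer configuration is an assumption: it is the standard published VGG16 setup (3x3 convolutions, a 224x224 RGB input, and a 1000-class output), not something taken from this post, and the helper names are hypothetical.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weights plus biases of one 2D convolution layer."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weights plus biases of one fully connected layer."""
    return in_units * out_units + out_units


# Assumed standard VGG16 configuration: output channels of the 13 conv layers
# and output units of the 3 fully connected layers.
VGG16_CONV_CHANNELS = [64, 64, 128, 128, 256, 256, 256,
                       512, 512, 512, 512, 512, 512]
VGG16_DENSE_UNITS = [4096, 4096, 1000]


def vgg16_parameter_count():
    total = 0
    in_channels = 3  # RGB input
    for out_channels in VGG16_CONV_CHANNELS:
        total += conv_params(3, in_channels, out_channels)
        in_channels = out_channels

    # Five 2x2 max-pool stages reduce 224x224 to 7x7, with 512 channels.
    in_units = 7 * 7 * 512
    for out_units in VGG16_DENSE_UNITS:
        total += dense_params(in_units, out_units)
        in_units = out_units
    return total


print(f"{vgg16_parameter_count():,}")  # prints 138,357,544
```

Under these assumptions, most of the parameters sit in the first fully connected layer rather than in the convolutions.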
Here I am going to implement a convolutional neural network (CNN) from scratch in Keras. The implementation is used to identify whether a person has pneumonia or not.
You can download the dataset from the link below.
Once you have downloaded the images then you…
In Canada, particularly in Ontario, COVID-19 cases are on the rise. Ontario’s Premier Doug Ford declared a second state of emergency and issued a stay-at-home order for the province, which started on Jan 14, 2021.
The latest headline states that Ontario reported a single-day increase in new COVID-19 cases, recording more than 3,400 infections.
Here I am visualizing Ontario COVID-19 data found here:
The data is quite extensive, but it is missing a lot of values. A couple of abbreviations used in the data are LTC (long-term care) and HCW (health care workers)…
In this blog, we will cover the concepts of using logistic regression with neural networks, applying forward and backward propagation, and then putting them into practice to build an image recognition system, i.e., a cat classifier in this case. The cat classifier takes an image as input and predicts whether it contains a cat with 70% accuracy. The work is done in Jupyter Notebook, and the code is written in Python.
Let’s see a bit of theory about neural networks and logistic regression.
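As a minimal sketch of that theory, logistic regression can be viewed as a single neuron: the forward pass computes a sigmoid activation and the cross-entropy cost, and the backward pass computes the gradients used for gradient descent. The code below uses only the standard library. The function names, the hyperparameters, and the tiny two-feature dataset are assumptions made for illustration; the post itself works on image data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def propagate(w, b, X, y):
    """One forward and backward pass over all examples; returns cost and gradients."""
    m = len(X)
    # Forward propagation: linear score, sigmoid activation, cross-entropy cost.
    activations = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]
    cost = -sum(
        yi * math.log(a) + (1 - yi) * math.log(1 - a)
        for a, yi in zip(activations, y)
    ) / m
    # Backward propagation: gradients of the cost with respect to w and b.
    dw = [
        sum((a - yi) * x[j] for a, yi, x in zip(activations, y, X)) / m
        for j in range(len(w))
    ]
    db = sum(a - yi for a, yi in zip(activations, y)) / m
    return cost, dw, db


def train(X, y, learning_rate=0.5, iterations=1000):
    """Plain gradient descent starting from zero weights."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(iterations):
        _, dw, db = propagate(w, b, X, y)
        w = [wj - learning_rate * dwj for wj, dwj in zip(w, dw)]
        b -= learning_rate * db
    return w, b


def predict(w, b, x):
    """Return 1 if the predicted probability exceeds 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) > 0.5 else 0


# Toy, linearly separable data invented for illustration.
X = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
y = [0, 0, 1, 1]

w, b = train(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

For images, each example would be a flattened vector of pixel values rather than two numbers, but the forward and backward steps keep the same form.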
Here I built an ML pipeline using Microsoft Azure ML Studio.
I used the Adult Census dataset within Microsoft Azure ML Studio for prediction and accuracy evaluation. This dataset is from the UCI Machine Learning Repository.
Data selection and cleaning
In this blog, I discuss my machine learning capstone project, where I apply algorithms for both classification and regression to identify the most reliable algorithm for each task: the one that shows the highest performance on both training and test data and that can be considered for future datasets.
The original data file as well as the Jupyter notebook files have been made public on my GitHub here.
The data file “GHG_Emission.csv” was retrieved from the Alberta Energy Regulator (AER) website; the locations of the wells have been changed, and some key properties are generated synthetically or…
In this blog, I explored a dataset tracking alcohol consumption in mainly European and Australian nations.
The original Tableau file and PDF for the “Alcohol Consumption by Country” data have been made public on my GitHub here.
For this blog, I have used the Makeover Monday dataset: Alcohol Consumption by Country.
The dataset contains ranks of countries and liters of pure alcohol consumed per capita.
This alcohol consumption data is for 2019. It represents per capita alcohol consumption among people over 15 years of age.
Here is the Tableau dashboard…
In this blog, I explored the Next Gen Stats tracking data for NFL running plays.
The code for the “NFL plays” data has been made public on my GitHub here.
For this blog, I have used the Kaggle dataset: NFL Big Data Bowl.
The dataset contains 65k rows and 48 columns of NFL data. Each row in the file corresponds to a single player’s involvement in a single play. The dataset was intentionally joined (i.e. denormalized) to make the API simple. …
In this blog, I explored the credit default data using Microsoft Power BI.
The Power BI dashboard for the “Credit Card Default” data has been made public on my GitHub here.
The dataset used for the data analysis in Power BI is from the UCI Machine Learning Repository. The dataset covers customer credit card default payments back in 2005. However, the last column used in the dataset is a made-up column.
Power BI Dashboard
Here is the dashboard I created.
Exploring Wine Reviews using Pandas
There are a lot of wine enthusiasts all over the world.
In this blog, I explored the wine review data using Pandas.
The code for the “Wine Review” data has been made public on my GitHub here.
I am a Professional Engineer (P.Eng) and a Project Management Professional (PMP) with a strong engineering background in Thermal Energy, Oil and Gas Processing, and the Water and Wastewater Treatment Industry, and over 8 years of experience working in engineering consultancies.
I am a self-motivated lifelong learner and critical thinker who is passionate about data, with skills in programming and statistical analysis, my…
Engineer. Data Analyst. Machine Learning enthusiast