GeoPython 2020

Applied Machine Learning in Python using scikit-learn, mlxtend and pandas
2020-09-21, 09:15–10:45, Room 2

The question that haunts every aspiring data scientist is: where should I begin? Is traditional machine learning still relevant for solving business problems today? In this tutorial we will address these questions and take a deep dive into applying some of the most widely used traditional machine learning algorithms to real-life use cases using scikit-learn, mlxtend and pandas.


“A baby learns to crawl, walk and then run. We are in the crawling stage when it comes to applying machine learning.”

Since the advent of deep learning algorithms about a decade ago, the field of data science and machine learning has seen renewed zeal and enthusiasm. Today, every firm is eager to hire a data scientist who can derive value from its data, but the key question remains: where should I begin? Various industry leaders are deploying deep learning models; should I do the same? Is traditional machine learning still relevant for solving my business problem?

In this tutorial we will address these questions and take a deep dive into applying some of the most widely used traditional machine learning algorithms to real-life use cases. We will use the open source libraries scikit-learn, pandas and mlxtend for this purpose.

The key steps we will employ to tackle each problem are:
1. Understanding the algorithm
2. Importing the data
3. Data wrangling using pandas
4. Machine learning model development using scikit-learn/mlxtend
5. Model performance evaluation
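
The steps above can be sketched roughly as follows. This is only an illustrative outline, not the tutorial's actual notebook: the data is synthetic, and the column names (area, age, price) and coefficients are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Step 2: "import" the data. A synthetic table stands in for a real file
# (in practice this would typically come from pd.read_csv or similar).
rng = np.random.default_rng(0)
n = 200
area = rng.uniform(40, 200, n)   # hypothetical feature: floor area
age = rng.uniform(0, 50, n)      # hypothetical feature: building age
price = 3.0 * area - 1.5 * age + 20 + rng.normal(0, 5, n)
df = pd.DataFrame({"area": area, "age": age, "price": price})
df.loc[df.index % 25 == 0, "age"] = np.nan  # inject a few missing values

# Step 3: data wrangling with pandas (here: drop incomplete rows).
clean = df.dropna()

# Step 4: model development with scikit-learn (ordinary least squares).
X = clean[["area", "age"]]
y = clean["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# Step 5: model performance evaluation on the held-out rows.
r2 = r2_score(y_test, model.predict(X_test))
print(f"R^2 on test set: {r2:.3f}")
```

Holding out a test split before fitting keeps the evaluation in step 5 from being flattered by data the model has already seen.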

Each exercise will use a Jupyter notebook-based learning environment.

The workshop session (90 mins) will be divided as follows:
1. Introduction to Machine Learning - 5 mins
2. Why traditional machine learning is still relevant! - 5 mins
3. Exercise #1: Real Estate Valuation using Regression Algorithm (OLS) - 15 mins
4. Exercise #2: Market Basket Analysis using Association Rule Learning Algorithm (Apriori) - 15 mins
5. Exercise #3: Credit Risk Analysis using Instance-based Algorithm (kNN) - 15 mins
6. Exercise #4: Macroeconomic Analysis of Countries using Clustering Algorithm (k-Means) - 15 mins
7. Exercise #5: Credit Risk Analysis using Decision Tree Algorithm (CART) - 15 mins
8. Closing Remarks and Q&A - 5 mins
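
Exercises #3 and #5 address the same kind of problem with two different algorithms, so a side-by-side comparison is a natural way to think about them. The sketch below is an assumption-laden illustration, not the workshop material: it uses a synthetic binary classification dataset from make_classification in place of real credit data, and the hyperparameters (n_neighbors=5, max_depth=4) are arbitrary choices. scikit-learn's DecisionTreeClassifier is used here as the CART-style tree.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a credit risk dataset (label 1 = hypothetical "default").
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    # kNN is distance-based, so features are standardized first.
    "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    # Trees split on thresholds and do not need feature scaling.
    "CART": DecisionTreeClassifier(max_depth=4, random_state=0),
}

# Fit each model on the training split and score it on the test split.
scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Accuracy is used only for brevity; on imbalanced credit data, metrics such as precision, recall or ROC AUC would usually be more informative.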

Video introduction for tutorial - Link
Tutorial Slides - Link

Prerequisites

  • Technical: Basic Python Programming
  • Software: Python 3.6+

Please install the following Python packages:
pip install scikit-learn pandas mlxtend matplotlib
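
After installing, a quick check along these lines can confirm that the packages are importable before the workshop. It is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical. Note that scikit-learn is imported as sklearn.

```python
import importlib.util


def missing_packages(import_names):
    """Return the import names that cannot be found in this environment."""
    return [
        name for name in import_names
        if importlib.util.find_spec(name) is None
    ]


# Import names for the required packages ("sklearn" is scikit-learn).
required = ["sklearn", "pandas", "mlxtend", "matplotlib"]
missing = missing_packages(required)
print("Missing packages:", missing if missing else "none")
```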