Python HOW: Scikit-learn Optimal Pipeline and Best Practices

By the end of this article you’ll be a master Sklearn plumber. You’ll know how to pipe in numerical and categorical attributes without having to use Pandas get_dummies or Sklearn FeatureUnion.
1. Install/Update
At the time of writing, 0.21.2 was the latest release of sklearn. Check the docs for dependencies, then either install or update:
conda install scikit-learn==0.21.2
conda update scikit-learn  # conda update takes a package name only; use the install line above to pin 0.21.2
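To confirm which release ended up in the active environment, a quick check from Python can help. The sketch below is an illustration rather than part of the original walkthrough: the helper names installed_version and version_tuple are made up here, and the (0, 21) threshold simply mirrors the release mentioned above.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Reduce a version string such as '0.21.2' to its (major, minor) integers."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:2])


# "scikit-learn" is the distribution name; the import name is "sklearn".
found = installed_version("scikit-learn")
if found is None:
    print("scikit-learn is not installed in this environment")
elif version_tuple(found) >= (0, 21):  # assumed minimum, taken from the release named above
    print("scikit-learn", found, "is recent enough")
else:
    print("scikit-learn", found, "is older than 0.21, consider updating")
```

If scikit-learn is already importable, print(sklearn.__version__) after import sklearn gives the same version string.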
2. Toy dataset
We’ll use a sample dataset of audience churn with 1,000 instances and 19 attributes: 10 numerical and 9 categorical. You can download AudienceChurn.dataSample.csv from here (click clone or download > Download ZIP > extract), and you can read its description here.
Let’s read the CSV file into a DataFrame and print its information: