Process a dataset with house prices across California neighborhoods
In this lab, you’re going to write an app in C# that uses feature engineering to process the data in a machine learning dataset.
You’ll use the California Housing Dataset, which is famous for its use in many machine learning courses. But the dataset has several issues that you’ll need to discover and deal with before you can train a machine learning model on the data.
You will have to detect and deal with outliers, scale and normalize columns to a sane numeric range, bin and one-hot encode categorical data columns, and cross the latitude and longitude columns if present.
The California Housing dataset is perfect for practicing your feature engineering skills. It’s virtually unprocessed and requires lots of transformation steps before it is suitable for machine learning training.
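To make the transformation steps mentioned above a bit more concrete, here is a minimal sketch in plain C# with no machine learning library involved. The class name `FeatureSketch`, its method names, and any bin edges or value ranges you pass in are assumptions made for illustration; they are not part of the lab's code or of the dataset's documentation, and the lab itself may well use a library pipeline instead.

```csharp
using System;
using System.Linq;

// Hypothetical helper class; all names here are illustrative only.
public static class FeatureSketch
{
    // Clip a value into [min, max] to limit the influence of outliers.
    public static double Clip(double value, double min, double max) =>
        Math.Min(Math.Max(value, min), max);

    // Min-max scale a value from the range [min, max] to [0, 1].
    public static double Scale(double value, double min, double max) =>
        (value - min) / (max - min);

    // Return the index of the bin a value falls into, given ascending
    // upper bin edges. Values at or above the last edge land in the final bin.
    public static int Bin(double value, double[] edges)
    {
        for (int i = 0; i < edges.Length; i++)
        {
            if (value < edges[i]) return i;
        }
        return edges.Length;
    }

    // One-hot encode a bin index as a vector of the given size.
    public static float[] OneHot(int index, int size)
    {
        var vector = new float[size];
        vector[index] = 1f;
        return vector;
    }

    // Cross two one-hot vectors by flattening their outer product.
    // The result has a single 1 at position (indexA * b.Length + indexB).
    public static float[] Cross(float[] a, float[] b) =>
        a.SelectMany(x => b.Select(y => x * y)).ToArray();
}
```

For example, with made-up latitude edges `{ 33, 35, 37 }`, a latitude of 34.5 falls into bin 1, and `OneHot(1, 4)` encodes it as a four-element vector. Crossing the latitude vector with a longitude vector built the same way yields one slot per grid cell, so a model can learn a separate weight for each area rather than treating latitude and longitude independently.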