2. California Housing

Process a dataset with house prices across California neighborhoods


In this lab, you’re going to write an app in C# that uses feature engineering to process the data in a machine learning dataset.

You’ll use the California Housing Dataset, which is famous for its use in many machine learning courses. But the dataset has several issues that you’ll need to discover and deal with before you can train a machine learning model on the data.

You will have to detect and deal with outliers, scale and normalize columns to a sensible numeric range, bin and one-hot encode categorical data columns, and cross the latitude and longitude columns if they are present.
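To make these transformation steps concrete, here is a minimal sketch in plain C# using only the standard library. It is not the lab's own pipeline. The sample values, the clipping threshold, the bin count, and the helper names (Clip, MinMaxScale, Bin, OneHot, Cross) are all assumptions made up for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical sample values, invented for illustration (not read from the dataset).
double[] totalRooms = { 880, 7099, 1467, 1274, 39320 };
double[] latitudes  = { 34.19, 37.86 };
double[] longitudes = { -118.35, -122.22 };

// Outliers: cap every value at an assumed maximum of 10,000 rooms.
// Scaling: map the clipped values into the range [0, 1].
double[] scaledRooms = MinMaxScale(Clip(totalRooms, 10000));

// Binning and one-hot encoding: California spans roughly 32 to 42 degrees
// latitude and -125 to -114 degrees longitude; 5 bins per axis is an assumption.
double[] latVec = OneHot(Bin(latitudes[0], 32, 42, 5), 5);
double[] lonVec = OneHot(Bin(longitudes[0], -125, -114, 5), 5);

// Feature cross: a 5 x 5 grid cell encoded as a single one-hot vector of length 25.
double[] location = Cross(latVec, lonVec);

Console.WriteLine(string.Join(" ", scaledRooms));
Console.WriteLine(string.Join(" ", location));

// Clip values so that none exceeds max.
static double[] Clip(double[] values, double max) =>
    values.Select(v => Math.Min(v, max)).ToArray();

// Min-max scale values into [0, 1]; assumes the values are not all equal.
static double[] MinMaxScale(double[] values)
{
    double min = values.Min(), max = values.Max();
    return values.Select(v => (v - min) / (max - min)).ToArray();
}

// Assign a value to one of binCount equal-width buckets between lo and hi.
static int Bin(double value, double lo, double hi, int binCount)
{
    int index = (int)((value - lo) / (hi - lo) * binCount);
    return Math.Clamp(index, 0, binCount - 1);
}

// One-hot encode a bucket index as a vector containing a single 1.
static double[] OneHot(int index, int size)
{
    var vector = new double[size];
    vector[index] = 1.0;
    return vector;
}

// Cross two one-hot vectors by flattening their outer product.
static double[] Cross(double[] a, double[] b) =>
    a.SelectMany(x => b.Select(y => x * y)).ToArray();
```

Crossing the two one-hot vectors lets a model treat each grid cell as its own feature, instead of learning from latitude and longitude separately.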

The California Housing Dataset is perfect for practicing your feature engineering skills. It’s virtually unprocessed and requires many transformation steps before it is suitable for training a machine learning model.

The California Housing Dataset

Get The Data

Analyze The Data

Plot A Histogram Of Total Rooms

Plot Median House Value By Median Income

Plot The Pearson Correlation Matrix

Design And Build The Transformation Pipeline

Cross Latitude And Longitude

Recap

Conclusion
