2. California Housing

Process a dataset with house prices across California neighborhoods

In this lab, you’re going to design an Azure Machine Learning pipeline that uses feature engineering to process the data in a training dataset.

You’ll use the California Housing Dataset, which is famous for its use in many machine learning courses. But the dataset has several issues that you’ll need to discover and deal with before you can train a machine learning model on the data.
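A first pass at discovering such issues is a per-column profile. The sketch below is a hypothetical plain-Python helper, not the lab's Azure profiling step. It assumes each column is a list of numbers with `None` marking a missing value, and it reports the missing count, the range, the mean, and how many rows sit exactly at the maximum, which can hint that a column was capped.

```python
def profile_column(values):
    """Summarize one numeric column. Assumption: None marks a missing value."""
    present = [v for v in values if v is not None]
    if not present:
        # Every value is missing, so there is no range or mean to report.
        return {"missing": len(values), "min": None, "max": None, "mean": None, "at_max": 0}
    top = max(present)
    return {
        "missing": len(values) - len(present),
        "min": min(present),
        "max": top,
        "mean": sum(present) / len(present),
        # Many rows exactly at the maximum can indicate that values were capped.
        "at_max": sum(1 for v in present if v == top),
    }


# Toy values, made up for illustration only.
print(profile_column([1.0, None, 3.0, 3.0]))
```

Running the profile over every column and comparing the results is usually enough to spot missing values, suspicious caps, and columns on very different scales.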

You will have to detect and deal with outliers, scale and normalize columns to a sensible numeric range, bin numeric columns and one-hot encode categorical columns, and cross the latitude and longitude columns if they are present.
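The transformations listed above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the lab's Azure implementation: each column is assumed to be a Python list of numbers, IQR clipping is used as one common way to handle outliers, min-max scaling as one way to normalize, and the function names (`clip_outliers_iqr`, `cross_lat_lon`, and so on) and bin edges are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical helpers. IQR clipping and min-max scaling are assumed choices,
# not necessarily the techniques the lab itself uses.


def clip_outliers_iqr(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest bound."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def min_max_scale(values):
    """Linearly rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bin_index(value, edges):
    """Bin number for value given sorted edges; a value equal to an edge goes to the upper bin."""
    return bisect_right(edges, value)


def one_hot(index, size):
    """Vector of length size with a 1 at position index and 0 elsewhere."""
    return [1 if i == index else 0 for i in range(size)]


def cross_lat_lon(lat, lon, lat_edges, lon_edges):
    """Feature cross: bin latitude and longitude, then one-hot encode the grid cell."""
    lat_bin = bin_index(lat, lat_edges)
    lon_bin = bin_index(lon, lon_edges)
    n_lon_bins = len(lon_edges) + 1
    n_cells = (len(lat_edges) + 1) * n_lon_bins
    return one_hot(lat_bin * n_lon_bins + lon_bin, n_cells)


# Toy values and bin edges, made up for illustration only.
print(clip_outliers_iqr([1, 2, 3, 4, 5, 6, 7, 8, 100]))  # [1, 2, 3, 4, 5, 6, 7, 8, 15.0]
print(min_max_scale([2.0, 4.0, 6.0]))                   # [0.0, 0.5, 1.0]
print(cross_lat_lon(37.0, -118.0, lat_edges=[36.0], lon_edges=[-119.0]))  # [0, 0, 0, 1]
```

Crossing the binned coordinates gives the model one indicator per grid cell, so it can learn location effects that neither latitude nor longitude captures on its own. Finer bin edges give more cells but fewer examples per cell.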

The California Housing dataset is perfect for practicing your feature engineering skills. It’s virtually unprocessed and requires many transformation steps before it is suitable for training a machine learning model.

The California Housing Dataset

Get The Data

Analyze The Data

Create The Azure Storage Account

Create The Azure Datastore

Create The Azure Dataset

Profile The Dataset

Build The Pipeline

Recap

Conclusion
