Data processing tools
import pandas as pd
from sklearn.datasets import fetch_california_housing
dt = fetch_california_housing()

df = pd.DataFrame(dt['data'], columns=dt['feature_names'])

Dropping columns with too many missing values

no_missing_values

no_missing_values(dataset:DataFrame, missing_threshold:float=0.6)

Find the features with a fraction of missing values above missing_threshold

df2 = no_missing_values(df)
            missing_fraction
MedInc                   0.0
HouseAge                 0.0
AveRooms                 0.0
AveBedrms                0.0
Population               0.0
AveOccup                 0.0
Latitude                 0.0
Longitude                0.0
0 features with greater than 60.0% missing values.
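
The source of no_missing_values is not shown here, so the following is only a minimal sketch of how such a filter might work. The name no_missing_values_sketch, the toy demo frame, and the assumption that the function returns a copy of the dataset with the flagged columns dropped are all illustrative and not taken from the library.

```python
import pandas as pd


def no_missing_values_sketch(dataset: pd.DataFrame,
                             missing_threshold: float = 0.6) -> pd.DataFrame:
    """Hypothetical stand-in for no_missing_values (assumed behavior)."""
    # Fraction of missing entries in each column, between 0.0 and 1.0.
    missing_fraction = dataset.isnull().mean()
    print(missing_fraction.to_frame("missing_fraction"))

    # Columns whose missing fraction is strictly above the threshold.
    to_drop = missing_fraction[missing_fraction > missing_threshold].index.tolist()
    print(f"{len(to_drop)} features with greater than "
          f"{missing_threshold * 100}% missing values.")

    # Assumption: the flagged columns are dropped from the returned frame.
    return dataset.drop(columns=to_drop)


# Toy data: column "a" is 75% missing, column "b" is 25% missing.
demo = pd.DataFrame({"a": [1.0, None, None, None],
                     "b": [1.0, 2.0, 3.0, None]})
cleaned = no_missing_values_sketch(demo)
```

With the default threshold of 0.6, only column "a" exceeds the limit in this toy frame, so cleaned keeps column "b" alone. The California housing data has no missing values, which is why nothing is flagged in the output above.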

Plot histogram

plot_hist

plot_hist(df:DataFrame, feat2show:List[str]=None)

Plot a histogram for each column listed in feat2show. If feat2show is None (the default), all columns are plotted.

plot_hist(df)
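
The implementation of plot_hist is not shown here either. As a rough sketch of the described behavior (one histogram per column, all columns by default), something like the following could work with matplotlib. The name plot_hist_sketch, the bin count, the figure size, and the returned Figure are assumptions made for illustration only.

```python
from typing import List, Optional

import matplotlib.pyplot as plt
import pandas as pd


def plot_hist_sketch(df: pd.DataFrame,
                     feat2show: Optional[List[str]] = None):
    """Hypothetical stand-in for plot_hist (assumed behavior)."""
    # Fall back to every column when no subset is requested.
    cols = list(df.columns) if feat2show is None else list(feat2show)

    # One subplot per column, stacked vertically; squeeze=False keeps
    # axes as a 2-D array even when there is only one column.
    fig, axes = plt.subplots(nrows=len(cols), ncols=1,
                             figsize=(6, 3 * len(cols)), squeeze=False)
    for ax, col in zip(axes.ravel(), cols):
        # Drop missing values so they do not interfere with binning.
        ax.hist(df[col].dropna(), bins=30)
        ax.set_title(col)
    fig.tight_layout()
    return fig


# Toy data with two numeric columns.
hist_demo = pd.DataFrame({"x": [1.0, 2.0, 2.0, 3.0],
                          "y": [10.0, 20.0, 30.0, 40.0]})
fig_all = plot_hist_sketch(hist_demo)
fig_one = plot_hist_sketch(hist_demo, feat2show=["x"])
```

Returning the Figure is a design choice of this sketch: it lets the caller save or further adjust the plot, whereas in a notebook the figure is displayed automatically.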