Supporting Statistical Analysis for Research
5.7 Relationships between columns
These exercises use the Chile.csv data set.

Import the Chile.csv file.

    from pathlib import Path
    import pandas as pd
    import numpy as np

    chile_path = Path('..') / 'datasets' / 'Chile.csv'
    chile_in = pd.read_csv(chile_path)
    chile_in = chile_in.rename(columns={'statusquo': 'status_quo'})
    chile = (
        chile_in
            .copy(deep=True)
            .drop('Unnamed: 0', axis='columns'))

    print(chile.dtypes)
    region         object
    population      int64
    sex            object
    age           float64
    education      object
    income        float64
    status_quo    float64
    vote           object
    dtype: object
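The `Unnamed: 0` column dropped during the import is typically a row index that was written into the file when it was saved. As a hedged alternative, `pd.read_csv` can read that column as the index through its `index_col` parameter. The sketch below uses a small made-up CSV string (`csv_text`, hypothetical and not the Chile data) so that it runs without the file.

```python
import io
import pandas as pd

# Hypothetical CSV text with an unnamed leading index column,
# standing in for a file such as Chile.csv.
csv_text = """,region,age
1,N,65.0
2,N,29.0
3,S,
"""

# Approach used in the exercise: read everything, then drop the extra column.
dropped = (
    pd.read_csv(io.StringIO(csv_text))
    .drop('Unnamed: 0', axis='columns'))

# Alternative: treat the first column as the row index while reading.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(dropped.columns.tolist())
print(indexed.columns.tolist())
print(indexed.index.tolist())
```

The two results have the same columns but different row labels: `dropped` gets a default index starting at 0, while `indexed` keeps the labels stored in the file. Which is preferable depends on whether those stored labels are meaningful for the analysis.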
Find all rows with a missing value in any column, using one of the methods for relating columns.
    chile_na_rows = (
        chile
            .assign(missing=lambda df: df
                .isna()
                .any(axis='columns') >= 1)
            .query('missing')
            .drop('missing', axis='columns'))

    print(chile_na_rows.head())
       region  population sex   age education   income  status_quo vote
    12      N      175000   F  27.0        PS      NaN     1.43448    Y
    14      N      175000   M  36.0        PS  35000.0     1.49026  NaN
    27      N      175000   F  43.0         P      NaN     0.15489    A
    75      N      125000   F  32.0         S      NaN    -0.85035    N
    97      N      125000   F  34.0         P   2500.0     0.10807  NaN
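The same row selection can also be written without the temporary `missing` column, by using the row-wise result of `.isna().any(axis='columns')` directly as a boolean mask. The following is a minimal sketch of that idea on a small made-up data frame (`toy`, a hypothetical stand-in and not the Chile data), so it can be run on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; not the Chile data set.
toy = pd.DataFrame({
    'region': ['N', 'N', 'S', 'S'],
    'income': [35000.0, np.nan, 2500.0, 15000.0],
    'vote': ['Y', 'N', None, 'A'],
})

# True for each row that has at least one missing value.
has_missing = toy.isna().any(axis='columns')

# Boolean indexing keeps only the rows flagged True.
toy_na_rows = toy[has_missing]

print(toy_na_rows)
```

Both forms should select the same rows. The `assign` and `query` version keeps everything inside one method chain, while the mask version is shorter when chaining is not needed.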