Supporting Statistical Analysis for Research
4.4 Dropping unneeded variables
These exercises use the PSID.csv data set
that was imported in the prior section.
Import the PSID.csv data set.

    from pathlib import Path
    import pandas as pd

    psid_path = Path('..') / 'datasets' / 'PSID.csv'
    psid_in = pd.read_csv(psid_path)
    psid_in = (
        psid_in
        .rename(
            columns={
                'Unnamed: 0': 'obs_num',
                'intnum': 'intvw_num',
                'persnum': 'person_id',
                'married': 'marital_status'}))
    psid = psid_in.copy(deep=True)

    print(psid.dtypes)

    obs_num             int64
    intvw_num           int64
    person_id           int64
    age                 int64
    educatn           float64
    earnings            int64
    hours               int64
    kids                int64
    marital_status     object
    dtype: object

Drop the first variable in the data frame. You may have renamed it after it was loaded.

    psid = psid.drop(columns='obs_num')

Make the age variable the first variable in the data frame.

    psid = psid.loc[:, [
        'age', 'intvw_num', 'person_id', 'educatn',
        'earnings', 'hours', 'kids', 'marital_status']]
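
Typing out every column name works for a small data frame but becomes error-prone as the number of variables grows. The sketch below shows one possible alternative for both exercises: dropping the first column by position and building the new column order programmatically. It is not part of the original solution. The `psid_demo` data frame and the `other_cols` list are hypothetical names, and the values are made up so the example runs without the PSID.csv file.

```python
import pandas as pd

# Hypothetical stand-in with the same column names as the renamed PSID data.
# The values are invented for illustration only.
psid_demo = pd.DataFrame({
    'obs_num': [1, 2, 3],
    'intvw_num': [4, 4, 5],
    'person_id': [1, 2, 1],
    'age': [39, 41, 35],
    'educatn': [12.0, 16.0, 9.0],
    'earnings': [1000, 0, 25000],
    'hours': [500, 0, 2000],
    'kids': [2, 2, 0],
    'marital_status': ['married', 'married', 'divorced'],
})

# Drop the first column by position, whatever it happens to be named.
psid_demo = psid_demo.drop(columns=psid_demo.columns[0])

# Put 'age' first and keep the remaining columns in their current order.
other_cols = [col for col in psid_demo.columns if col != 'age']
psid_demo = psid_demo.loc[:, ['age'] + other_cols]

print(list(psid_demo.columns))
```

Dropping by position is convenient when the first column may or may not have been renamed, but it silently removes whatever column is first, so dropping by name is the safer choice once the name is known.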