Supporting Statistical Analysis for Research
2.8 Subsetting rows
The tidyverse provides several functions to select
rows from a tibble.
- filter() selects rows using a Boolean condition.
- sample_n() and sample_frac() take a random sample of the rows.
- slice() selects rows by numeric position.
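As a minimal sketch of how these functions behave, the code below applies each one to a small made-up tibble. The tibble `toy` and its columns `id` and `score` are assumptions for illustration only; they are not part of the cps data used in this chapter.

```r
library(dplyr)

# Hypothetical toy data, invented for this illustration.
toy <- tibble(
  id    = 1:10,
  score = c(5, 8, 2, 9, 4, 7, 1, 6, 3, 10)
)

# filter(): keep the rows where the condition is TRUE.
high <- toy %>% filter(score > 5)

# slice(): keep rows by their numeric position.
first_three <- toy %>% slice(1:3)

# sample_n(): draw a fixed number of rows at random.
# sample_frac(): draw a fixed fraction of the rows at random.
set.seed(1)
three_random <- toy %>% sample_n(3)
half_random  <- toy %>% sample_frac(0.5)
```

In dplyr 1.0.0 and later, sample_n() and sample_frac() are superseded by slice_sample(), although the older functions continue to work.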
2.8.1 Examples
Create test and training data frames using filter().

    set.seed(145705)
    cps <- cps %>%
      mutate(
        split = ifelse(runif(n()) > .75, "test", "train")
      )
    cps_train <- cps %>% filter(split == "train")
    cps_test <- cps %>% filter(split == "test")

    dim(cps)
    [1] 15992 11
    dim(cps_train)
    [1] 11902 11
    dim(cps_test)
    [1] 4090 11

Create test and training data frames using slice().

    set.seed(145705)
    test_indx <- which(runif(nrow(cps)) > .75)
    train_ind <- setdiff(1:nrow(cps), test_indx)
    cps_train <- cps %>% slice(train_ind)
    cps_test <- cps %>% slice(test_indx)

    dim(cps)
    [1] 15992 11
    dim(cps_train)
    [1] 11902 11
    dim(cps_test)
    [1] 4090 11
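Both examples above assign each row to the test set with probability 0.25, so the split is only approximately 75/25 (here 11902 training rows out of 15992). If an exact split is wanted, one possible alternative is to sample the training rows with sample_frac() and take the remaining rows with anti_join(). The sketch below uses a made-up tibble `toy` with a unique `id` column; both names are assumptions for illustration, and the approach relies on having a column that uniquely identifies each row.

```r
library(dplyr)

# Hypothetical toy data with a unique row identifier.
toy <- tibble(
  id = 1:100,
  x  = seq(0, 1, length.out = 100)
)

set.seed(145705)

# Draw exactly 75% of the rows for training.
toy_train <- toy %>% sample_frac(0.75)

# The test set is every row whose id is not in the training set.
toy_test <- toy %>% anti_join(toy_train, by = "id")

dim(toy_train)
dim(toy_test)
```

With 100 rows this gives 75 training rows and 25 test rows, and no row appears in both sets.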