Supporting Statistical Analysis for Research
3.7 Summarizing data
Summary statistics are used in two main ways when wrangling data. One is to generate tables of summary statistics. The other is to use them to calculate new variables.
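As a minimal sketch of the difference between these two uses, the code below applies both to a small made-up tibble. The names `toy`, `toy_summary`, and `toy_centered` are hypothetical and are not part of the forbes data used in the examples; the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data, not the forbes data used in the examples.
toy <- tibble(x = c(1, 2, 3, NA))

# Use 1: summarise() collapses the data to a one-row table of statistics.
toy_summary <- toy %>%
  summarise(
    x_mean = mean(x, na.rm = TRUE),
    x_sd = sd(x, na.rm = TRUE)
  )

# Use 2: mutate() keeps every row and uses a statistic to build a new variable.
toy_centered <- toy %>%
  mutate(x_centered = x - mean(x, na.rm = TRUE))
```

Here `toy_summary` has a single row, while `toy_centered` has the same number of rows as `toy`, with the missing value carried through as `NA`.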
Examples
Summary table.
This example uses `summarise()` to create a table of summary statistics for the `profits` variable. The `summarise()` function returns a tibble with a column for each summary statistic it calculates.

```r
forbes_summary <- forbes %>%
  summarise(
    `profits-mean` = mean(profits, na.rm = TRUE),
    `profits-sd` = sd(profits, na.rm = TRUE),
    `profits-1q` = quantile(profits, probs = .25, na.rm = TRUE),
    `profits-3q` = quantile(profits, probs = .75, na.rm = TRUE)
  )

forbes_summary
```

```
# A tibble: 1 x 4
  `profits-mean` `profits-sd` `profits-1q` `profits-3q`
           <dbl>        <dbl>        <dbl>        <dbl>
1          0.381         1.77         0.08         0.44
```

Calculating with summary statistics
This example calculates the same mean and standard deviation of profits as the prior example. Rather than using `summarise()` to create a table with these values, this example uses the summary statistics to calculate a z-score for profits.

```r
forbes <- forbes %>%
  mutate(
    profits_std = (profits - mean(profits, na.rm = TRUE)) /
      sd(profits, na.rm = TRUE)
  )

forbes %>%
  pull(profits_std) %>%
  summary()
```

```
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's
-14.84668  -0.17057  -0.10260   0.00000   0.03334  11.65642         5
```

Values stored in a summary table can also be used in calculations. The following code uses the quartiles in `forbes_summary` to compute outlier bounds, then flags as outliers the companies whose profits fall more than one interquartile range below the first quartile or above the third quartile.

```r
outlier_bounds <- forbes_summary %>%
  mutate(
    iqr = `profits-3q` - `profits-1q`,
    lower_bounds = `profits-1q` - iqr,
    upper_bounds = `profits-3q` + iqr
  )

forbes <- forbes %>%
  mutate(
    outlier = profits < pull(outlier_bounds, lower_bounds) |
      profits > pull(outlier_bounds, upper_bounds)
  )

forbes %>%
  pull(outlier) %>%
  summary()
```

```
   Mode   FALSE    TRUE    NA's
logical    1578     417       5
```

Proportion of observations
The `outlier` indicator variable (created in the prior example) can be used to determine the proportion of companies whose profit values are outliers relative to the distribution of profits.

```r
forbes %>%
  summarise(
    outlier_proportion = mean(outlier, na.rm = TRUE)
  )
```

```
# A tibble: 1 x 1
  outlier_proportion
               <dbl>
1              0.209
```
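The proportion calculation works because R treats `TRUE` as 1 and `FALSE` as 0 in arithmetic, so the mean of a logical variable is the share of `TRUE` values among the non-missing observations. The sketch below illustrates this with a small made-up indicator; `toy`, `flag`, and `toy_props` are hypothetical names, not part of the forbes data.

```r
library(dplyr)

# Hypothetical indicator values, not taken from the forbes data.
toy <- tibble(flag = c(TRUE, FALSE, FALSE, TRUE, NA))

toy_props <- toy %>%
  summarise(
    n_true = sum(flag, na.rm = TRUE),     # TRUE counts as 1, FALSE as 0
    n_known = sum(!is.na(flag)),          # number of non-missing observations
    prop_true = mean(flag, na.rm = TRUE)  # equals n_true / n_known
  )
```

The same arithmetic applies to the counts reported for `outlier`: 417 / (1578 + 417) is approximately 0.209, which agrees with `outlier_proportion`.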