Supporting Statistical Analysis for Research

## 3.1 Preparatory exercises

The skills practiced in these exercises are used in the exercises at the end of each discourse in this chapter. Take a moment to complete them to confirm that you are prepared for this chapter. If these exercises are difficult, review the prior chapter.

1. Import the `MplsStops.csv` data set.

Hint: It may take a large number of rows to determine the correct type for some variables.

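```r
library(tidyverse)

# Build a relative path to the data file
mpls_stops_path <- file.path("..", "datasets", "MplsStops.csv")

# guess_max is raised so that more rows are used to guess column types
mpls_stops <-
  read_csv(
    mpls_stops_path,
    guess_max = 100000,
    col_types = cols()
  )
```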
```
Warning: Missing column names filled in: 'X1' [1]
```
```r
glimpse(mpls_stops)
```
```
Observations: 51,920
Variables: 15
$ X1             <dbl> 6823, 6824, 6825, 6826, 6827, 6828, 6829, 6830,...
$ idNum          <chr> "17-000003", "17-000007", "17-000073", "17-0000...
$ date           <dttm> 2017-01-01 00:00:42, 2017-01-01 00:03:07, 2017...
$ problem        <chr> "suspicious", "suspicious", "traffic", "suspici...
$ MDC            <chr> "MDC", "MDC", "MDC", "MDC", "MDC", "MDC", "MDC"...
$ citationIssued <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
$ personSearch   <chr> "NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO",...
$ vehicleSearch  <chr> "NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO",...
$ preRace        <chr> "Unknown", "Unknown", "Unknown", "Unknown", "Un...
$ race           <chr> "Unknown", "Unknown", "White", "East African", ...
$ gender         <chr> "Unknown", "Male", "Female", "Male", "Female", ...
$ lat            <dbl> 44.96662, 44.98045, 44.94835, 44.94836, 44.9790...
$ long           <dbl> -93.24646, -93.27134, -93.27538, -93.28135, -93...
$ policePrecinct <dbl> 1, 1, 5, 5, 1, 1, 1, 2, 2, 4, 5, 1, 2, 1, 1, 1,...
$ neighborhood   <chr> "Cedar Riverside", "Downtown West", "Whittier",...
```
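The hint in exercise 1 can be illustrated on a small scale. The sketch below is a hypothetical example, not part of the exercise data: it writes a tiny CSV file (the column names `id` and `flag` are made up) in which `flag` is blank for the first 15 rows, much like `citationIssued` above. When `guess_max` covers every row, `read_csv()` sees the later text values and reads the column as character. When `guess_max` is too small, the guess rests only on the rows inspected and may not fit the later values. The exact result of the short guess depends on the readr version, so no output is shown.

```r
library(readr)

# Hypothetical toy data: `flag` is blank in the first 15 rows, then "YES"
toy_lines <- c(
  "id,flag",
  paste0(1:15, ","),
  paste0(16:20, ",YES")
)
toy_path <- tempfile(fileext = ".csv")
writeLines(toy_lines, toy_path)

# guess_max covers all 20 rows, so the text values inform the type guess
toy_full <- read_csv(toy_path, guess_max = 100, col_types = cols())

# guess_max covers only blank rows; the guessed type may not fit the later
# values (warnings are suppressed because they vary by readr version)
toy_short <- suppressWarnings(
  read_csv(toy_path, guess_max = 5, col_types = cols())
)

# Compare the column types that resulted from the two guesses
class(toy_full$flag)
class(toy_short$flag)
```

If the two classes differ, the smaller `guess_max` was not enough to type the column correctly.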
2. Are there any rows that need to be ignored in the `MplsStops` data set?

```r
head(mpls_stops)
```
```
# A tibble: 6 x 15
     X1 idNum date                problem MDC   citationIssued personSearch
  <dbl> <chr> <dttm>              <chr>   <chr> <chr>          <chr>
1  6823 17-0~ 2017-01-01 00:00:42 suspic~ MDC   <NA>           NO
2  6824 17-0~ 2017-01-01 00:03:07 suspic~ MDC   <NA>           NO
3  6825 17-0~ 2017-01-01 00:23:15 traffic MDC   <NA>           NO
4  6826 17-0~ 2017-01-01 00:33:48 suspic~ MDC   <NA>           NO
5  6827 17-0~ 2017-01-01 00:37:58 traffic MDC   <NA>           NO
6  6828 17-0~ 2017-01-01 00:46:48 traffic MDC   <NA>           NO
# ... with 8 more variables: vehicleSearch <chr>, preRace <chr>,
#   race <chr>, gender <chr>, lat <dbl>, long <dbl>, policePrecinct <dbl>,
#   neighborhood <chr>
```
```r
tail(mpls_stops)
```
```
# A tibble: 6 x 15
     X1 idNum date                problem MDC   citationIssued personSearch
  <dbl> <chr> <dttm>              <chr>   <chr> <chr>          <chr>
1 60833 17-4~ 2017-12-31 23:11:15 suspic~ MDC   NO             NO
2 60834 17-4~ 2017-12-31 23:15:50 traffic MDC   YES            NO
3 60835 17-4~ 2017-12-31 23:18:32 suspic~ MDC   NO             NO
4 60836 17-4~ 2017-12-31 23:31:57 traffic MDC   NO             NO
5 60837 17-4~ 2017-12-31 23:48:22 traffic MDC   NO             YES
6 60838 17-4~ 2017-12-31 23:52:35 traffic MDC   NO             NO
# ... with 8 more variables: vehicleSearch <chr>, preRace <chr>,
#   race <chr>, gender <chr>, lat <dbl>, long <dbl>, policePrecinct <dbl>,
#   neighborhood <chr>
```

The rows at the start and end of the data frame look like observations. There is no indication of non-data rows.
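For contrast, here is a minimal sketch of what non-data rows can look like and one way to ignore them. The file contents are hypothetical (a made-up title line above the header and a made-up footer line); `MplsStops.csv` has neither. The `skip` and `n_max` arguments of `read_csv()` drop the leading line and stop reading before the footer.

```r
library(readr)

# Hypothetical file with a title line before the header and a footer line
messy_lines <- c(
  "Example export (hypothetical title line)",
  "id,problem",
  "1,traffic",
  "2,suspicious",
  "3,traffic",
  "End of export (hypothetical footer line)"
)
messy_path <- tempfile(fileext = ".csv")
writeLines(messy_lines, messy_path)

# Skip the title line, then read only the three data rows after the header
messy <- read_csv(messy_path, skip = 1, n_max = 3, col_types = cols())

# Inspect both ends of the data frame, as done for mpls_stops
head(messy)
tail(messy)
```

Using `n_max` requires knowing how many data rows there are; when that is unknown, filtering the footer out after import is an alternative.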

3. Are there any special symbols that need to be set to missing in the `MplsStops` data set? If so, change the special symbols to the missing indicator.

```r
# Re-import, treating "Unknown" as missing in addition to the defaults
mpls_stops <-
  read_csv(
    mpls_stops_path,
    na = c("", "NA", "Unknown"),
    guess_max = 100000,
    col_types = cols()
  )
```
```
Warning: Missing column names filled in: 'X1' [1]
```
```r
glimpse(mpls_stops)
```
```
Observations: 51,920
Variables: 15
$ X1             <dbl> 6823, 6824, 6825, 6826, 6827, 6828, 6829, 6830,...
$ idNum          <chr> "17-000003", "17-000007", "17-000073", "17-0000...
$ date           <dttm> 2017-01-01 00:00:42, 2017-01-01 00:03:07, 2017...
$ problem        <chr> "suspicious", "suspicious", "traffic", "suspici...
$ MDC            <chr> "MDC", "MDC", "MDC", "MDC", "MDC", "MDC", "MDC"...
$ citationIssued <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
$ personSearch   <chr> "NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO",...
$ vehicleSearch  <chr> "NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO",...
$ preRace        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "White"...
$ race           <chr> NA, NA, "White", "East African", "White", "East...
$ gender         <chr> NA, "Male", "Female", "Male", "Female", "Male",...
$ lat            <dbl> 44.96662, 44.98045, 44.94835, 44.94836, 44.9790...
$ long           <dbl> -93.24646, -93.27134, -93.27538, -93.28135, -93...
$ policePrecinct <dbl> 1, 1, 5, 5, 1, 1, 1, 2, 2, 4, 5, 1, 2, 1, 1, 1,...
$ neighborhood   <chr> "Cedar Riverside", "Downtown West", "Whittier",...
```
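The effect of the `na` argument can be checked on a small scale. This sketch uses a hypothetical three-row file (the values are made up, not taken from the exercise data) and counts missing values with and without `"Unknown"` listed in `na`.

```r
library(readr)

# Hypothetical toy data that uses "Unknown" as a special symbol
toy_lines <- c(
  "id,race,gender",
  "1,Unknown,Unknown",
  "2,White,Female",
  "3,Unknown,Male"
)
toy_path <- tempfile(fileext = ".csv")
writeLines(toy_lines, toy_path)

# Default missing indicators: "Unknown" is kept as an ordinary string
toy_default <- read_csv(toy_path, col_types = cols())

# "Unknown" added to the missing indicators
toy_na <- read_csv(toy_path, na = c("", "NA", "Unknown"), col_types = cols())

# Count the missing values in each column of both versions
colSums(is.na(toy_default))
colSums(is.na(toy_na))
```

The same `colSums(is.na(...))` check could be run on `mpls_stops` before and after the re-import to see which columns were affected.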
4. Sort the data frame by `policePrecinct`. Hint: This requires searching beyond the material that has been covered so far.

```r
mpls_stops <- arrange(mpls_stops, policePrecinct)

glimpse(mpls_stops)
```
```
Observations: 51,920
Variables: 15
$ X1             <dbl> 6823, 6824, 6827, 6828, 6829, 6834, 6836, 6837,...
$ idNum          <chr> "17-000003", "17-000007", "17-000098", "17-0001...
$ date           <dttm> 2017-01-01 00:00:42, 2017-01-01 00:03:07, 2017...
$ problem        <chr> "suspicious", "suspicious", "traffic", "traffic...
$ MDC            <chr> "MDC", "MDC", "MDC", "MDC", "MDC", "other", "MD...
$ citationIssued <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
$ personSearch   <chr> "NO", "NO", "NO", "NO", "NO", NA, "NO", "NO", "...
$ vehicleSearch  <chr> "NO", "NO", "NO", "NO", "NO", NA, "NO", "NO", "...
$ preRace        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Bl...
$ race           <chr> NA, NA, "White", "East African", "Black", NA, "...
$ gender         <chr> NA, "Male", "Female", "Male", "Male", NA, "Male...
$ lat            <dbl> 44.96662, 44.98045, 44.97908, 44.98054, 44.9808...
$ long           <dbl> -93.24646, -93.27134, -93.26208, -93.26363, -93...
$ policePrecinct <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
$ neighborhood   <chr> "Cedar Riverside", "Downtown West", "Downtown W...
```
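`arrange()` also accepts several sort keys, and `desc()` reverses the order of a key. The sketch below uses a small hypothetical tibble (the values are made up) rather than the exercise data. Rows with a missing sort key are placed last in either direction.

```r
library(dplyr)

# Hypothetical toy data with one missing precinct
toy_stops <- tibble(
  policePrecinct = c(3, 1, NA, 1, 2),
  problem = c("traffic", "traffic", "suspicious", "suspicious", "traffic")
)

# Ascending by precinct, with ties broken by problem
by_precinct <- arrange(toy_stops, policePrecinct, problem)

# Descending by precinct; the row with the missing precinct still sorts last
by_precinct_desc <- arrange(toy_stops, desc(policePrecinct))

by_precinct
by_precinct_desc
```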