Jane Allyn Piliavin - Sociology at UW Madison

Tables

Turn the numbers from the computer printout into tables

Except for the correlation matrix that is part of the reliability analysis, you will be required to make new tables, rather than simply xeroxing the output you get from the computer. This is to indicate to me that you KNOW what numbers you are looking at to answer the various questions in the report. It would be useful to bring "blank" tables for your variables set up like those in the example to class on the date indicated, so that you can copy your data into them. Tables should be numbered sequentially so you can refer to them in your report, should have titles, and should include labels for the variables so that a person can understand what the numbers in the table are without having to read the rest of your paper. BE SURE TO DOUBLE- AND TRIPLE-CHECK YOUR COPYING OF NUMBERS FROM THE PRINTOUT INTO THE TABLES after the class.

Inspect Your Data

  1. Frequency distributions. There are several useful things to inspect the frequencies for.
    1. Problems of low variability. If 80% or more of the respondents chose one answer for an item, the variability of that variable is so low that statistical results with it are extremely problematic given small sample sizes. Results from such variables cannot be trusted, and they are often simply discarded.
    2. Lesser problems of low variability, where 60% or more of the cases choose one response, or where 80% or more of the cases are in two adjacent responses (of a variable with more than three responses). These may just indicate a skew in the population, but might also indicate a biased sample or a biased question.
    3. Comparisons across the different questions measuring the dependent variable, to see which items elicit the most favorable responses, and which the least favorable.
    4. Information about the distribution of attitudes in the population. Because your samples are non-random, you have to interpret these results very cautiously, but it is still interesting to find out what people said.
  2. Reliability analysis and correlations among the questions measuring the dependent variable. Ideally, the reliability analysis should show all of the corrected item-total correlations (the correlation of each item with the sum of all the other items) to be moderately high and positive (better than .4 or .5), and the coefficient alphas to be over .7. Also, for each item, the output should state that removing the item would lead to a lower alpha. If all of these things are true, your items are all good.
    If you don't have the ideal, there are several possible patterns to check for in the correlation matrix itself:
    1. One or two questions show low negative correlations and small positive ones with the other items; these questions have a low item-total correlation and an indication that alpha will go UP if they are removed. This probably means those questions are "bad" and the others are OK. You discard the bad questions and get a revised index using only the good ones.
    2. There are subsets of questions that have moderate or strong positive correlations with each other but negative or weak correlations with the others. This means the questions seem to be tapping more than one conceptual variable. You try to see which set most captures what you had in mind and use those for your index. Occasionally, you decide that both sets are interesting, and create two indices, one for each.
    3. There are a lot of strong negative correlations for one or two variables, and their item-total correlations are strongly negative. This usually means that you have forgotten to "reverse score" a question, or that subjects read the opposite meaning into a question than you intended. SEE ME.
    4. Most correlations in the table are close to zero, negative ones are scattered across different questions, and the alphas and item-total correlations are low. This means that none of the items are properly related to any of the others, and that there really is no single concept that your questions measure. This is the worst situation to be in, but it is rare. Most often, this turns out to be due to mistakes in coding the data, usually when partners code their data separately and turn out to be doing it differently. If you have data like this and you have checked for and ruled out coding errors, you definitely need to see me.
  3. Look to see whether the means of the INDEX get larger in the expected direction across categories of the OPEN-ENDED question, and whether the analysis of variance test is significant at p < .01 or smaller. This should occur if the INDEX and the OPEN-ENDED question are both "valid" measures of your concept. If this does not happen, you try to figure out which measure is "bad." Keep in mind that the open-ended question might have a problem, or that the open-ended categories may be out of order.
  4. Tests of the "obvious" hypothesis using the index will employ either correlation analysis (if your independent variable is continuous) or analysis of variance (if the I.V. is categorical). Tests of the hypothesis using the open-ended question will use chi-squares. Look to see if the hypothesis is confirmed or not. You may get
    1. significant findings using both measures,
    2. nothing using either measure, or
    3. significant findings with one measure and not with the other.

Again, there are implications for the validity of each of these measures from these results. We will discuss how you interpret these in class.

The following are a set of tables, based on the student questionnaire presented a few pages back. In class you will be given the actual printouts from these data, just like the printouts you will later receive for your own data. After the tables, there is a guide to the output, titled "Numbers are our friends."

EXAMPLE TABLES for Univariate statistics. The computer printout provides a lot of statistics. The first few pages go into Tables 1 and 2.

Table 1. Frequency Distributions for Independent Variables
(I have omitted items 16-19 for brevity of presentation)

WEEK11 (times)    f    MONTH12 (times)    f    YEAR13 (times)    f    GENDER14       f
0                10    0                  3    0                 0    Male (1)      18
1                 8    1                  3    1                 0    Female (2)    23
2                 9    2                  9    2                 0
3                 8    3                  7    3                 4
4                 1    4                  6    4                 2
5                 2    5                  5    5                 7
6                 1    6                  4    6                 9
7                 2    7                  4    7                10
More than 7       0    8                  0    8                 9

 

 

Note: The computer prints only values of the variable that have been chosen by respondents in the frequency tables. You should put in all values because it helps you to compare the questions in terms of where people feel most favorable or unfavorable. Note that the tables do NOT report a mean for the open-ended question, or the independent variables, even though the computer printed them, because in most cases it would be meaningless.

Table 2. Frequency Distributions And Means for Dependent Variables.
(I present only the first five variables and the OPEN code for brevity)

        Life2    f    Feel3    f    Health4    f    Satis5    f    Dis6    f    OPEN        f
        1        0    1        1    1          0    1         2    1      20    1           1
        2        9    2        6    2          1    2        10    2       7    2           4
        3        7    3        3    3          4    3         3    3       4    3           2
        4       15    4       17    4         20    4        10    4       6    4          14
        5       10    5       14    5         16    5        11    5       4    5          19
                                                                                Miss.       1
mean    3.65          3.90          4.24            3.56           2.20         (inapp.)
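
The low-variability checks listed under "Inspect Your Data" can be applied mechanically to the frequencies in Table 2. The following is only a minimal sketch in Python: the function name `variability_flag` and the labels "severe", "lesser", and "ok" are made up for illustration, and the cutoffs are simply the 80% and 60% rules of thumb from this handout.

```python
def variability_flag(counts):
    """Classify one frequency distribution by the handout's rules of thumb.

    counts: frequencies for each response category, in category order.
    Returns "severe" (80%+ in one response), "lesser" (60%+ in one response,
    or 80%+ in two adjacent responses of an item with more than three
    responses), or "ok". The labels are hypothetical names.
    """
    n = sum(counts)
    shares = [c / n for c in counts]
    top = max(shares)
    # Largest share held by any two adjacent response categories.
    adjacent = max((a + b for a, b in zip(shares, shares[1:])), default=0.0)
    if top >= 0.80:
        return "severe"
    if top >= 0.60 or (len(counts) > 3 and adjacent >= 0.80):
        return "lesser"
    return "ok"


# Frequencies copied from Table 2 (responses 1 through 5).
table2 = {
    "Life2": [0, 9, 7, 15, 10],
    "Feel3": [1, 6, 3, 17, 14],
    "Health4": [0, 1, 4, 20, 16],
    "Satis5": [2, 10, 3, 10, 11],
    "Dis6": [20, 7, 4, 6, 4],
}

flags = {name: variability_flag(counts) for name, counts in table2.items()}
for name, flag in flags.items():
    print(name, flag)
```

With these counts, only Health4 is flagged, and only for the lesser problem: 36 of its 41 cases fall in the two adjacent responses 4 and 5.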

 


Reliability analysis; correlations among the closed-ended questions. The computer next printed out a triangular correlation matrix, which you copy into Table 3. (You may paste the matrix, as I have here, but you must provide a title.)

Table 3: Correlations among closed-ended questions and with the total scale

 

           LIFE2     FEEL3     HEALTH4   STATIS5   DIS6
LIFE2      1.0000
FEEL3       .5260    1.0000
HEALTH4     .2080     .2439    1.0000
STATIS5     .1887     .2526    -.0164    1.0000
DIS6        .3184     .4657     .4044     .1721    1.0000
ENC7        .4163     .4882     .3213     .2002     .6037
YOUR8      -.0309     .2452     .1565     .3973     .3010
PLAN9       .0754     .2477    -.0613    -.0841     .1463
IDEAL10     .5178     .4164     .2167     .0432     .3345

           ENC7      YOUR8     PLAN9     IDEAL10
ENC7       1.0000
YOUR8       .3411    1.0000
PLAN9       .3015     .1026    1.0000
IDEAL10     .5334     .2686     .1298    1.0000

 

N of Cases= 41.0

Item-total Statistics

           Scale Mean    Scale Variance    Corrected      Squared        Alpha
           if Item       if Item           Item-Total     Multiple       if Item
           Deleted       Deleted           Correlation    Correlation    Deleted
LIFE2      27.8988       26.6976           .5145          .4846          .7324
FEEL3      27.6305       25.2831           .6390          .44550         .7118
HEALTH4    27.2890       30.5549           .3185          .2269          .7600
STATIS5    27.9720       29.1498           .2142          .2996          .6700
DIS6       29.3378       23.4761           .5852          .4521          .7832
ENC7       28.2159       23.4428           .7137          .5565          .7186
YOUR8      28.8049       31.2610           .3834          .3697          .6942
PLAN9      27.6793       31.2276           .1808          .1888          .7582
IDEAL10    27.4354       25.4388           .5228          .4557          .7751

 

Reliability Analysis - Scale (Alpha)

Reliability Coefficients 9 Items

Alpha=.7651, Standardized Item Alpha=.7606


We will discuss in class how you would interpret these numbers and what their implications are for validity.
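
To see where one of these numbers comes from, the standardized item alpha can be recomputed from the correlations in Table 3. The sketch below uses the usual formula for standardized alpha, k times the mean inter-item correlation divided by 1 + (k - 1) times that mean; the function name `standardized_alpha` is made up for illustration. It does not reproduce the raw Alpha, which requires the item variances and covariances from the original data.

```python
# Lower triangle of Table 3, one row per item, without the 1.0000 diagonal.
lower_triangle = [
    [],                                                        # LIFE2
    [.5260],                                                   # FEEL3
    [.2080, .2439],                                            # HEALTH4
    [.1887, .2526, -.0164],                                    # STATIS5
    [.3184, .4657, .4044, .1721],                              # DIS6
    [.4163, .4882, .3213, .2002, .6037],                       # ENC7
    [-.0309, .2452, .1565, .3973, .3010, .3411],               # YOUR8
    [.0754, .2477, -.0613, -.0841, .1463, .3015, .1026],       # PLAN9
    [.5178, .4164, .2167, .0432, .3345, .5334, .2686, .1298],  # IDEAL10
]


def standardized_alpha(rows):
    """Standardized alpha from the lower triangle of a correlation matrix."""
    k = len(rows)                                  # number of items
    correlations = [r for row in rows for r in row]
    if len(correlations) != k * (k - 1) // 2:
        raise ValueError("expected one correlation per pair of items")
    mean_r = sum(correlations) / len(correlations)
    return k * mean_r / (1 + (k - 1) * mean_r)


alpha = standardized_alpha(lower_triangle)
print(round(alpha, 4))  # about 0.7606, the Standardized Item Alpha on the printout
```

Because the formula depends only on the number of items and their average correlation, adding items that correlate weakly with the rest pulls the average down and can lower alpha rather than raise it.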

Bivariate association between open-ended question and index. The computer gives you the mean for the index separately for each category of the open-ended question.

Table 4. Mean of the Index for Each Category of the Open-ended Question.

Grouped data          Mean     Standard Deviation    F       p-value
Codes 1-3 (N = 7)     26.89    4.65
Code 4 (N = 14)       29.29    3.89                  7.95    .001
Code 5 (N = 19)       34.51    5.65
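
The F ratio in Table 4 can be approximated from the group sizes, means, and standard deviations alone. This is a minimal sketch, assuming the printed standard deviations are sample standard deviations (computed with N - 1); the function name `f_from_summary` is made up for illustration, and the p-value is not computed here.

```python
def f_from_summary(groups):
    """One-way analysis of variance F ratio from summary statistics.

    groups: list of (n, mean, sd) tuples, one per category.
    sd is assumed to be the sample standard deviation (N - 1 denominator).
    Returns (F, df_between, df_within).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-groups: how far each group mean is from the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups: spread of cases around their own group mean.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# (N, mean, standard deviation) for each row of Table 4.
table4 = [(7, 26.89, 4.65), (14, 29.29, 3.89), (19, 34.51, 5.65)]
f_ratio, df_between, df_within = f_from_summary(table4)
print(round(f_ratio, 2), df_between, df_within)
```

The result comes out close to the 7.95 in the table, with 2 and 37 degrees of freedom; the small difference is what you would expect from working with means and standard deviations rounded to two decimals.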

 


Tests of "obvious" hypothesis and other hypotheses.

For this data set, the test of the obvious hypothesis is done with correlations. The three behavioral items, Week11, Month12, and Year13, are correlated with the index .643**, .710**, and .368* respectively (** = p<.01; * = p<.05). This information would be placed in the text. I present only one of the other variables in Table 5.

Table 5. Mean of the Index Separately for Males and Females.

                    Mean Index    Standard Deviation    F       p-value
Males (N = 18)      31.66         4.11                  .015    .904
Females (N = 23)    31.43         6.94


Table 6. Percentages of subjects responding in each category of the open-ended coding categories (collapsed) as a function of Week11 and Year13. (Other items omitted to save space)

 

                        Week11 (recoded)                  Year13 (recoded)
OPEN (grouped)          1          2          3           1           2          3
Codes 1-3               6 (33%)    1 (6%)                 5 (18%)     2 (20%)    1 (12%)
Code 4                  8 (44%)    6 (38%)                12 (54%)    2 (20%)
Code 5                  4 (22%)    9 (56%)    6 (100%)    6 (28%)     6 (60%)    7 (88%)
Chi-square, p-value:    13.53, p<.009                     10.92, p<.027
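
The chi-square for the Week11 half of Table 6 can be checked by hand or with a few lines of code. The sketch below assumes that the blank cells in the table are zeros; the function name `chi_square` is made up for illustration. It returns the statistic and its degrees of freedom but does not look up the p-value.

```python
def chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Counts for OPEN (grouped) by Week11 (recoded) from Table 6.
# Blank cells are assumed to be zero.
week11 = [
    [6, 1, 0],  # Codes 1-3
    [8, 6, 0],  # Code 4
    [4, 9, 6],  # Code 5
]
statistic, df = chi_square(week11)
print(round(statistic, 2), df)
```

With these counts the statistic agrees with the 13.53 reported under Week11, on 4 degrees of freedom. Note that several expected counts in this table are quite small, which is one more reason to interpret the result cautiously with samples of this size.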


*** END OF EXAMPLE***


NUMBERS ARE OUR FRIENDS: A BRIEF STATISTICS GUIDE

Some statistical concepts:

  • Pearson product-moment correlations, analysis of variance, and chi-square are three of many ways to assess how likely it is that patterns of relationships between variables are due to chance.
  • Correlations can run from +1.00 through 0 to -1.00. Correlations indicate the extent to which one variable is associated with a second. Alternatively, one can see it as the extent to which you can predict one variable from the other. The larger the number, the better the prediction. The significance of a correlation -- the likelihood that it represents a "real" relationship between the two variables -- can be calculated. It is based in part on the size of the correlation and in part upon the sample size.
  • Analysis of variance (and t-test, a special case involving only two sample groups) tests whether it is reasonable to assume that the sample means on a continuous variable for several different groups come from different populations. That is, can we conclude with some confidence that there is something about the groups that is "really" different, as opposed to having to accept that the apparent mean differences are due to chance.
  • Chi-square tests whether the distribution of cases in a contingency table (a cross-tabulation of two nominal or ordinal variables) deviates from the distribution we would expect on the basis of chance alone.
  • These tests of "significance" are based on ratios that are calculated and then looked up in tables. (Nowadays, the computer does all of this for you.) You need not understand the calculations to understand the concept. What the level of significance tells you is what proportion of times you would expect to find as strong a relationship as is indicated by the data simply by the operation of chance factors. The larger the obtained ratio, and the smaller the p-value, the less likely it is that only chance is operating, and, therefore, the more likely it is that the relationship represents some systematic association between the variables being tested.
  • We do not say that we have "proved" that a relationship exists, because we can only make probabilistic statements. The size of the p-value gives us only more or less confidence that the relationships we observe represent something meaningful.
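
As an illustration of how the significance of a correlation depends on both its size and the sample size, a correlation r from N cases can be converted to a t ratio, t = r * sqrt(N - 2) / sqrt(1 - r^2), and compared with tabled critical values. The sketch below makes several assumptions: N = 41 for all three behavioral items (no missing cases), two-tailed tests, and critical values for 39 degrees of freedom copied from a standard t table. The function names are made up for illustration.

```python
import math

# Two-tailed critical t values for 39 degrees of freedom (N = 41),
# copied from a standard t table: p < .01 and p < .05.
CRITICAL_T_DF39 = {"**": 2.708, "*": 2.023}


def t_for_r(r, n):
    """t ratio for a Pearson correlation r based on n cases."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def stars(r, n=41):
    """Return '**', '*', or '' for a correlation, assuming n = 41."""
    if n - 2 != 39:
        raise ValueError("critical values here are only for 39 df")
    t = abs(t_for_r(r, n))
    if t >= CRITICAL_T_DF39["**"]:
        return "**"
    if t >= CRITICAL_T_DF39["*"]:
        return "*"
    return ""


# Correlations of the behavioral items with the index, from the example.
for name, r in [("Week11", .643), ("Month12", .710), ("Year13", .368)]:
    print(name, r, round(t_for_r(r, 41), 2), stars(r))

# The same correlation yields a smaller t ratio in a smaller sample.
print(round(t_for_r(.368, 20), 2))
```

Under these assumptions the stars match those reported in the example: two for Week11 and Month12, one for Year13.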


Questions? Comments? Please contact jpiliavi@ssc.wisc.edu
