Tables

Turn the numbers from the computer printout into tables

Except for the correlation matrix that is part of the reliability analysis, you will be required to make new tables, rather than simply xeroxing the output you get from the computer. This is to indicate to me that you KNOW what numbers you are looking at to answer the various questions in the report. It would be useful to bring "blank" tables for your variables set up like those in the example to class on the date indicated, so that you can copy your data into them. Tables should be numbered sequentially so you can refer to them in your report, should have titles, and should include labels for the variables so that a person can understand what the numbers in the table are without having to read the rest of your paper. BE SURE TO DOUBLE- AND TRIPLE-CHECK YOUR COPYING OF NUMBERS FROM THE PRINTOUT INTO THE TABLES after the class.

Inspect Your Data

Frequency distributions. There are several useful things to inspect the frequencies for.

Problems of low variability. If 80% or more of the respondents chose one answer for an item, the variability of that variable is so low that statistical results with it are extremely problematic given small sample sizes. Results from such variables cannot be trusted, and they are often simply discarded.
Lesser problems of low variability, where 60% or more of the cases choose one response, or where 80% or more of the cases are in two adjacent responses (of a variable with more than three responses). These may just indicate a skew in the population, but might also indicate a biased sample or a biased question.
Comparisons across the different questions measuring the dependent variable, to see which items elicit the most favorable responses, and which the least favorable.
Information about the distribution of attitudes in the population. Because your samples are non-random, you have to interpret these results very cautiously, but it is still interesting to find out what people said.
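
If your frequencies are in a file rather than on paper, the two low-variability rules of thumb above can be checked mechanically. The following is a minimal Python sketch, not something the course software does for you. The function name `variability_flag` and the labels it returns are hypothetical, and it assumes the counts are listed in response order, with zeros for any response nobody chose.

```python
def variability_flag(counts):
    """Apply the low-variability rules of thumb to one item.

    counts: frequencies for each response option, listed in response
    order, with zeros for options nobody chose.
    Returns "severe", "lesser", or "ok" (illustrative labels).
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no responses to inspect")

    top_share = max(counts) / total
    if top_share >= 0.80:
        return "severe"   # 80% or more chose a single answer

    # Largest share held by any two neighboring response options.
    pair_shares = [(a + b) / total for a, b in zip(counts, counts[1:])]
    adjacent_share = max(pair_shares) if pair_shares else top_share

    if top_share >= 0.60:
        return "lesser"   # 60% or more chose a single answer
    if len(counts) > 3 and adjacent_share >= 0.80:
        return "lesser"   # 80% or more fall in two adjacent answers
    return "ok"


# safe13 from Table 2 of the example: counts for responses 1 through 6.
print(variability_flag([26, 6, 6, 5, 9, 7]))
# A made-up item where nearly everyone picked the first response.
print(variability_flag([50, 5, 3, 2]))
```

The first call reports "ok" and the second "severe". A flag is only a prompt to look at the item more closely; it does not tell you whether the skew comes from the population, the sample, or the question.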

Reliability analysis and correlations among the questions measuring the dependent variable. Ideally, the reliability analysis should show all of the corrected item-total correlations (the correlation of each item with the sum of all the other items) to be moderately high and positive (better than .4 or .5), and the coefficient alphas to be over .7. Also, for each item, the output should state that removing the item would lead to a lower alpha. If all of these things are true, your items are all good.
If you don't have the ideal, there are several possible patterns to check for in the correlation matrix itself:

One or two questions show low negative correlations and small positive ones with the other items; these questions have a low item-total correlation and an indication that alpha will go UP if they are removed. This probably means those questions are "bad" and the others are OK. You discard the bad questions and get a revised index using only the good ones.
There are subsets of questions that have moderate or strong positive correlations with each other but negative or weak correlations with the others. This means the questions seem to be tapping more than one conceptual variable. You try to see which set most captures what you had in mind and use those for your index. Occasionally, you decide that both sets are interesting, and create two indices, one for each.
There are a lot of strong negative correlations for one or two variables, and their item-total correlations are strongly negative. This usually means that you have forgotten to "reverse score" a question, or that subjects read the opposite meaning into a question than you intended. SEE ME.
Most correlations in the table are close to zero, negative ones are scattered across different questions, and alphas and item-total correlations are low. This means that none of the items are properly related to any of the others, and that there really is no single concept that your questions measure. This is the worst situation to be in, but it is rare. Most often, this turns out to be due to mistakes in coding the data, usually when partners code their data separately and turn out to be doing it differently. If you have data like this and you have checked for and ruled out coding errors, you definitely need to see me.
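
To make the two key statistics concrete, here is a short Python sketch that computes coefficient alpha and the corrected item-total correlations directly from raw scores. It uses the standard textbook formula for alpha, which may differ in small details from what your statistics package prints. The function names are hypothetical, and the data set (three questions, five respondents) is invented purely for illustration.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(items):
    """items: one list of scores per question, respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def corrected_item_total(items):
    """Correlate each item with the sum of all the OTHER items."""
    results = []
    for i, item in enumerate(items):
        others = [s for j, s in enumerate(items) if j != i]
        rest_total = [sum(scores) for scores in zip(*others)]
        results.append(pearson_r(item, rest_total))
    return results


# Invented scores: three questions answered by five respondents.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 4, 4, 6],
    [1, 3, 3, 5, 5],
]
print(round(cronbach_alpha(items), 2))
print([round(r, 2) for r in corrected_item_total(items)])
```

A question that was not reverse scored would show up here as a negative entry in the second list, which is the third pattern described above.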

Look to see whether the means of the INDEX get larger in the expected direction across categories of the OPEN-ENDED question, and whether the analysis of variance test is significant at p < .01 or better. This should occur if the INDEX and the OPEN-ENDED question are both "valid" measures of your concept. If this does not happen, you try to figure out which measure is "bad." Keep in mind that the open-ended question might have a problem, or that the open-ended categories may be out of order.
Tests of the "obvious" hypothesis using the index will employ either correlation analysis (if your independent variable is continuous) or analysis of variance (if the I.V. is categorical). Tests of the hypothesis using the open-ended question will use chi-squares. Look to see if the hypothesis is confirmed or not. You may get

significant findings using both measures,
nothing using either measure, or
significant findings with one measure and not with the other.

Again, there are implications for the validity of each of these measures from these results. We will discuss how you interpret these in class.

The following is a set of tables, based on the student questionnaire presented a few pages back. In class you will be given the actual printouts from these data, just like the printouts you will later receive for your own data. After the tables, there is a guide to the output, titled "Numbers are our friends."

EXAMPLE TABLES for Univariate statistics. The computer printout provides a lot of statistics. The first few pages go into Tables 1 and 2.

Table 1. Frequency Distributions for Independent Variables
(I show only four items for brevity of presentation)

4sex31	freq	party4	freq	relig5	freq	libcon9	freq
male	30	Repub	22	Catholic	18	Ext. Liberal	1
female	29	Dem	16	Jewish	1	2	10
		Prog	3	Protestant	14	3	10
		Ind	9	Buddhist	1	4	1
		Green	1	Hindu	1	5	6
		Libertar	1	Deist	2	6	9
		Other	7	None	19	7	5
				Other	3	8	5
						9	7
						Ext. Conserv.	1

Note: The computer prints only values of the variable that have been chosen by respondents in the frequency tables. You should put in all values because it helps you to compare the questions in terms of where people feel most favorable or unfavorable. Note that the tables do NOT report a mean for the open-ended question, or the independent variables, even though the computer printed them, because in most cases it would be meaningless.

Table 2. Frequency Distributions And Means for Dependent Variables.
(I present only the first four variables and the OPEN code for brevity)

deter11	f	state12	f	safe13	f	cand14	f	OPEN1	f
1	11	1	2	1	26	1	5	strongly oppose	22
2	19	2	8	2	6	2	11	mod. oppose	5
3	5	3	11	3	6	3	9	ambig.	8
4	8	4	7	4	5	4	6	mod. favor	10
5	12	5	19	5	9	5	9	strongly favor	14
6	4	6	12	6	7	6	19

mean	3.05		4.17		2.76		4.02		(inapp.)
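
One quick way to double-check your copying is to recompute each mean from the frequencies you wrote down. The sketch below does this in Python for the four closed-ended items in Table 2. The helper name `mean_from_frequencies` is hypothetical, and it assumes the responses are scored 1 through 6 in the order shown.

```python
def mean_from_frequencies(freqs, first_value=1):
    """Mean of an item scored first_value, first_value + 1, and so on.

    freqs: number of respondents choosing each response, in order.
    """
    n = sum(freqs)
    weighted_sum = sum(
        value * f for value, f in enumerate(freqs, start=first_value)
    )
    return weighted_sum / n


table2 = {
    "deter11": [11, 19, 5, 8, 12, 4],
    "state12": [2, 8, 11, 7, 19, 12],
    "safe13": [26, 6, 6, 5, 9, 7],
    "cand14": [5, 11, 9, 6, 9, 19],
}
for name, freqs in table2.items():
    # Each column should add up to the 59 cases in the example data.
    print(name, sum(freqs), round(mean_from_frequencies(freqs), 2))
```

Each column adds up to 59 and the recomputed means agree with the "mean" row of Table 2. If a total or a mean disagrees with the printout, recheck that column for a copying slip.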

Reliability analysis; correlations among the closed-ended questions. The computer next printed out a triangular correlation matrix, which you copy into Table 3. (You may paste the matrix, as I have here, but you must provide a title.)

Table 3: Correlations among closed-ended questions and with the total scale

	deter11	safe13	cand14	retrib15	wrong17	cheap18
deter11	1.0000
safe13	.613	1.0000
cand14	.505	.648	1.0000
retrib15	.533	.522	.609	1.0000
wrong17	.470	.627	.560	.559	1.0000
cheap18	.307	.516	.550	.561	.477	1.0000
fed19	.085	.249	.213	-.151	-.083	-.098
moral20	.669	.743	.714	.737	.692	.676
final21	.660	.623	.604	.710	.638	.523
apply22	.548	.511	.476	.380	.543	.262

	fed19	moral20	final21	apply22
fed19	1.0000
moral20	.054	1.0000
final21	-.023	.846	1.0000
apply22	.158	.494	.546	1.0000

N of Cases= 59.0

Item-total Statistics

	Scale Mean If Item Deleted	Scale Variance if Item Deleted	Corrected Item-Total Correlation	Squared Multiple Correlation	Alpha if Item Deleted
deter11	29.1844	135.506	.680	.583	.892
safe13	29.4725	126.763	.784	.669	.884
cand14	28.2183	130.151	.758	.610	.886
retrib15	28.7683	128.557	.693	.632	.891
wrong17	28.6420	135.123	.698	.601	.891
cheap18	29.7268	138.933	.582	.547	.898
fed19	29.2692	159.576	.057	.332	.924
moral20	29.3878	124.042	.895	.865	.877
final21	29.0827	127.008	.808	.767	.883
apply22	29.3653	137.301	.592	.467	.897

Reliability Analysis - Scale (Alpha)

Reliability Coefficients 10 Items

Cronbach's Alpha=.903, Standardized Item Alpha=.898

We will discuss in class how you would interpret these numbers and what are their implications for validity.
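
As a check on the copied matrix, the standardized item alpha can be reproduced from the correlations in Table 3 alone, using the usual formula based on the average inter-item correlation: k * r_bar / (1 + (k - 1) * r_bar), where k is the number of items. The Python sketch below is an illustration rather than the computer's own procedure, and the variable names are hypothetical.

```python
# Lower triangle of Table 3, one list per row (the 1.0000 diagonal is omitted).
lower_triangle = [
    [.613],                                                  # safe13
    [.505, .648],                                            # cand14
    [.533, .522, .609],                                      # retrib15
    [.470, .627, .560, .559],                                # wrong17
    [.307, .516, .550, .561, .477],                          # cheap18
    [.085, .249, .213, -.151, -.083, -.098],                 # fed19
    [.669, .743, .714, .737, .692, .676, .054],              # moral20
    [.660, .623, .604, .710, .638, .523, -.023, .846],       # final21
    [.548, .511, .476, .380, .543, .262, .158, .494, .546],  # apply22
]

k = len(lower_triangle) + 1              # number of items (10)
pairs = [r for row in lower_triangle for r in row]
assert len(pairs) == k * (k - 1) // 2    # 45 distinct item pairs

r_bar = sum(pairs) / len(pairs)          # average inter-item correlation
standardized_alpha = k * r_bar / (1 + (k - 1) * r_bar)
print(round(r_bar, 3), round(standardized_alpha, 3))
```

The average correlation comes out at about .468 and the standardized alpha at .898, matching the printout. A mismatch would suggest that a correlation was miscopied.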

Bivariate association between open-ended question and index. The computer gives you the mean for the index separately for each category of the open-ended question.

Table 4. Mean of the Index for Each Category of the Open-ended Question.

OPEN1	N	Mean	Standard Deviation	F	p-value
Strongly Oppose	22	21.65	5.547
Mod. Oppose	5	25.00	5.788	33.45	.001
Ambiguous	8	30.94	13.157
Mod. Favor	10	37.10	6.280
Strongly Favor	14	48.71	5.470
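
The F ratio in Table 4 can be approximately reconstructed from the group sizes, means, and standard deviations alone, which is another useful check on your copying. The following Python sketch applies the standard one-way analysis of variance formulas to the Table 4 summary figures. The function name is hypothetical, and the sketch assumes the printed standard deviations are sample standard deviations (computed with N - 1). Because the inputs are rounded, the result agrees with the printout closely but not exactly.

```python
def f_from_summaries(groups):
    """One-way ANOVA F ratio from (n, mean, standard deviation) per group."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n

    # Between-groups: how far each group mean sits from the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups: spread of cases around their own group mean
    # (assumes each sd was computed with n - 1 in the denominator).
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)

    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


table4 = [
    (22, 21.65, 5.547),   # Strongly Oppose
    (5, 25.00, 5.788),    # Mod. Oppose
    (8, 30.94, 13.157),   # Ambiguous
    (10, 37.10, 6.280),   # Mod. Favor
    (14, 48.71, 5.470),   # Strongly Favor
]
f_ratio, df1, df2 = f_from_summaries(table4)
print(round(f_ratio, 2), df1, df2)
```

This gives an F of about 33.44 on 4 and 54 degrees of freedom, against the 33.45 on the printout.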

Tests of "obvious" hypothesis and other hypotheses.

Table 5 a. Mean of Index Separately by Political Party Preference (Obvious hypothesis).

partycat	Mean Index	N	Standard Dev.	F	p-value
Repub. or libertarian	41.55	23	10.93	8.872	.001
Democrat	27.00	16	12.23
Prog. or green	18.58	4	3.47
Independ	24.11	9	6.57
Other	31.86	7	9.05

Table 5 b. Mean of Index Separately by Religion

religcat	Mean Index	N	Standard Dev.	F	p-value
Catholic	35.22	18	10.47	4.937	.004
Protestant	40.00	14	13.83
None	25.47	19	10.93
Deist, Jewish, Hindu, Buddhist, Other	28	8	11.90

Table 6 a. Test of obvious hypothesis using open-ended question. Percentages of subjects in each category of the open-ended coding categories (collapsed) in relationship to political party (further collapsed).

	OPEN (grouped)
Party (trichotomy)	Codes 1-2 (Oppose)	Code 3 (Ambiguous)	Codes 4-5 (Favor)
Repub. + Libertar (23)	13% (3)	43.5% (10)	43.5% (10)
Democrat (16)	56.3% (9)	25% (4)	18.8% (3)
other (20)	50% (10)	45% (9)	5% (1)
Chi-square; p-value	14.047; p< .007

Table 6 b. Test of second hypothesis, using open-ended answers. Percentage of respondents in each category of the open-ended question (collapsed) by categories of religion.

	OPEN (grouped)
Religion (four categories)	Codes 1-2 (Oppose)	Code 3 (Ambiguous)	Codes 4-5 (Support)
Catholic	27.8% (5)	50% (9)	22.2% (4)
Protestant	28.6% (4)	14.3% (2)	57.1% (8)
None	42.1% (8)	52.6% (10)	5.3% (1)
Deist, Jewish, Hindu, Buddhist, other	62.5% (5)	25% (2)	12.5% (1)
Chi-square, p-value	16.024; p <.014
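
The chi-square statistic compares each observed count with the count expected if the row and column variables were unrelated. As a check, the Python sketch below recomputes it from the counts shown in parentheses in Table 6 b. The function name is hypothetical, and the sketch computes only the statistic and its degrees of freedom; the p-value still comes from the printout or a table.

```python
def chi_square(table):
    """Pearson chi-square for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Counts in parentheses from Table 6 b (rows: religion; columns: grouped OPEN).
table6b = [
    [5, 9, 4],    # Catholic
    [4, 2, 8],    # Protestant
    [8, 10, 1],   # None
    [5, 2, 1],    # Deist, Jewish, Hindu, Buddhist, other
]
statistic, df = chi_square(table6b)
print(round(statistic, 3), df)
```

The result is a chi-square of about 16.024 with 6 degrees of freedom, in agreement with the table.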

*** END OF EXAMPLE***

NUMBERS ARE OUR FRIENDS: A BRIEF STATISTICS GUIDE

Some statistical concepts:

Pearson product-moment correlations, analysis of variance, and chi-square are three of many ways to assess how likely it is that patterns of relationships between variables are due to chance.
Correlations can run from +1.00 through 0 to -1.00. Correlations indicate the extent to which one variable is associated with a second. Alternatively, one can see it as the extent to which you can predict one variable from the other. The larger the number, the better the prediction. The significance of a correlation -- the likelihood that it represents a "real" relationship between the two variables -- can be calculated. It is based in part on the size of the correlation and in part upon the sample size.
Analysis of variance (and the t-test, a special case involving only two sample groups) tests whether it is reasonable to assume that the sample means on a continuous variable for several different groups come from different populations. That is, can we conclude with some confidence that there is something about the groups that is "really" different, as opposed to having to accept that the apparent mean differences are due to chance?
Chi-square tests whether the distribution of cases in a contingency table (a cross-tabulation of two nominal or ordinal variables) deviates from the distribution we would expect on the basis of chance alone.
These tests of "significance" are based on ratios that are calculated and then looked up in tables. (Nowadays, the computer does all of this for you.) You need not understand the calculations to understand the concept. What the level of significance tells you is what proportion of times you would expect to find as strong a relationship as is indicated by the data simply by the operation of chance factors. The larger the obtained ratio, and the smaller the p-value, the less likely it is that only chance is operating, and, therefore, the more likely it is that the relationship represents some systematic association between the variables being tested.
We do not say that we have "proved" that a relationship exists, because we can only make probabilistic statements. The size of the p-value gives us only more or less confidence that the relationships we observe represent something meaningful.
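
To illustrate how the significance of a correlation depends on both its size and the sample size, the sketch below converts a Pearson correlation into the t statistic conventionally used to test it, t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom. This is the standard textbook formula and not necessarily how your printout presents the test. The function name is hypothetical, the first two correlations are taken from Table 3 of the example (59 cases), and the third sample size is invented.

```python
from math import sqrt


def t_for_correlation(r, n):
    """t statistic for testing a Pearson correlation r based on n cases."""
    return r * sqrt((n - 2) / (1 - r ** 2))


# Two correlations from Table 3, both based on the same 59 cases.
print(round(t_for_correlation(0.307, 59), 2))   # deter11 with cheap18
print(round(t_for_correlation(0.085, 59), 2))   # deter11 with fed19
# The same weak correlation of .085 in a hypothetical sample ten times larger.
print(round(t_for_correlation(0.085, 590), 2))
```

With samples of this size, a t of roughly 2 or more in absolute value corresponds to p < .05 (two-tailed). The correlation of .307 clears that bar with 59 cases and the correlation of .085 does not, yet the same .085 would clear it in the much larger hypothetical sample. A significant correlation is therefore not necessarily a strong one.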

Questions? Comments? Please contact jpiliavi @ssc.wisc.edu