* Conditional computation.
* Note: this is conditional computation of data values, not computation
  of statistics on conditional subsets. See the material on subsets for that.

* Setup from last script.
get file="y:\spss\data\employee data.sav".
compute age=datediff(date.mdy(1,20,1995),bdate,"year").

* The simplest conditional data computation is done with the "if" command.
* Suppose we are convinced that women earn 95% of what men earn, everything
  else being equal. We want to calculate an as-if-male salary.
compute mwage=salary.
if (gender="f") mwage = salary/0.95.
means variables=salary mwage by gender.

* Notes: "if" is a transformation command.
* The parentheses around the logical condition are required.
* There is no "then".

* Another example. For many modeling commands we need indicator variables
  that are numeric, so we may want to convert gender to a numeric variable.
if (gender = "f") women = 1.
if (gender = "m") women = 0.
oneway salary by women.

* Note that this is even easier to compute as a logical expression.
compute men = (gender = "m").
oneway salary by men.

* Each "if" command stands on its own, and this causes two kinds of problems.
* (1) It is a little inefficient to repeatedly check the value of a variable.
* (2) The order of your "if" commands can be very important.
descriptives variables=age.
if (age < 30) agecat=2.
frequencies variables=agecat.
if (age < 40) agecat=3.
if (age < 50) agecat=4.
if (age < 60) agecat=5.
if (age < 70) agecat=6.
frequencies variables=agecat.

* Each "if" overwrites the value set by the previous "if", so every age
  under 70 ends up in category 6.
* We can fix this by reversing the order of the statements, or by writing
  stricter conditions.
if (age < 30) agecat=2.
if (age >= 30 & age < 40) agecat=3.
if (age >= 40 & age < 50) agecat=4.
if (age >= 50 & age < 60) agecat=5.
if (age >= 60 & age < 70) agecat=6.
frequencies variables=agecat /histogram.

* do if - else if allows us to build a cascading sequence of conditional computations.
* Only one of these, the first true condition, gets executed for each observation.
* The logic is tighter and it is more efficient.
* The only downside is that for simple problems it requires more typing.
do if (age < 30).
+  compute agecat2=2.
* You could have more than one compute here.
else if (age < 40).
+  compute agecat2=3.
* The "+" signs are for compatibility with non-Windows operating systems.
else if (age < 50).
+  compute agecat2=4.
else if (age < 60).
+  compute agecat2=5.
else if (age < 70).
+  compute agecat2=6.
else.
* Everything else, here ages 70 and over.
* Cases with a missing age skip the whole structure, so agecat2 stays
  system-missing for them.
+  compute agecat2=99.
end if.
frequencies variables=agecat2 /histogram.

* Flag the cases with a missing age explicitly.
if (missing(age)) agecat3=99.
execute.
frequencies variables=agecat3.

* recode does the same binning in a single command.
* Here (else = 99) also catches system-missing ages.
compute agecat4 = $sysmis.
execute.
recode age (lowest thru 29 = 2) (30 thru 39 = 3) (40 thru 49 = 4) (50 thru 59 = 5)
    (60 thru 69 = 6) (70 thru 79 = 7) (else = 99) into agecat4.
execute.
formats agecat4 (f2.0).

* Declare 99 as user-missing, then compare results with and without the declaration.
missing values age agecat agecat2 agecat3 agecat4 (99).
frequencies variables=agecat4.
* With no missing values declared, 99 counts as a valid value and distorts the statistics.
missing values agecat4 ().
descriptives variables=agecat4.
missing values agecat4 (99).
descriptives variables=agecat4.

* Variable labels and value labels.
variable labels age "Age (years)"
    /agecat4 "Age categories (10yr bins)".
descriptives variables=age.
frequencies variables=agecat4.
value labels agecat4 3 "30-39yrs old" 4 "40-49yrs old" 99 "(No birth date)".
frequencies variables=agecat4.
* "value labels" replaces any labels already defined for the named variables.
* "add value labels" adds or changes labels without discarding the others.
value labels agecat4 agecat3 99 "(No birth date)".
add value labels agecat4 3 "30-39yrs old" 4 "40-49yrs old".
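
* A hedged sketch, not part of the original script: because the bins are all
  ten years wide, the category code could also be computed arithmetically.
* The variable name agecat5 is an assumption chosen for illustration.
* trunc(age/10) gives 2 for ages 20-29, 3 for ages 30-39, and so on.
* Unlike the "do if" version, ages under 20 would get 1 rather than 2, and
  ages 70 and over would get 7 or more rather than 99.
* Cases with a missing age stay system-missing.
compute agecat5 = trunc(age/10).
formats agecat5 (f2.0).
* Cross-check the arithmetic bins against the "do if" bins.
crosstabs /tables=agecat5 by agecat2.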