* Conditional computation.
* Note this is conditional computation of data values, not
computation of statistics on conditional subsets. See
the material on subsets for that.
* Setup from last script.
get file="y:\spss\data\employee data.sav".
compute age=datediff(date.mdy(1,20,1995),bdate,"year").
* The simplest conditional data computation is done with the "if" command.
* Suppose we are convinced that women earn 95% of what men earn, everything
else being equal. We want to calculate an as-if-male salary.
compute mwage=salary.
if (gender="f") mwage = salary/0.95.
means variables=salary mwage by gender.
* Notes: "if" is a transformation command. The parentheses around
the logical condition are required. There is no "then".
* Another example. For many modeling commands we need indicator
variables that are numeric. So we may want to convert gender to
some numeric variable.
if (gender = "f") women = 1.
if (gender = "m") women = 0.
oneway salary by women.
* Note that this is even easier to compute as a logical expression, which returns 1 when true and 0 when false.
compute men = (gender = "m").
oneway salary by men.
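* A quick check, as a sketch: cross-tabulate the string variable against the
two indicators to confirm the coding.
crosstabs tables=gender by women men.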
* Each "if" command stands on its own, and this causes two
kinds of problems. (1) It is a little inefficient to repeatedly check
the value of a variable. (2) The order of your if commands
can be very important.
descriptives variables=age.
if (age < 30) agecat=2.
frequencies variables=agecat.
if (age < 40) agecat=3.
if (age < 50) agecat=4.
if (age < 60) agecat=5.
if (age < 70) agecat=6.
frequencies variables=agecat.
* Each "if" overwrites the value assigned by the previous "if", so every case under 70 ends up in category 6.
* We can fix this by reversing the order of the statements,
or by writing stricter conditions.
if (age < 30) agecat=2.
if (age >= 30 & age < 40) agecat=3.
if (age >= 40 & age < 50) agecat=4.
if (age >= 50 & age < 60) agecat=5.
if (age >= 60 & age < 70) agecat=6.
frequencies variables=agecat
/histogram.
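* A sketch of the first fix, reversing the order. The widest condition runs
first and each narrower one then overwrites it. The name agecatr is
hypothetical, introduced only for this illustration.
if (age < 70) agecatr=6.
if (age < 60) agecatr=5.
if (age < 50) agecatr=4.
if (age < 40) agecatr=3.
if (age < 30) agecatr=2.
crosstabs tables=agecat by agecatr.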
* do if - else if allows us to build a cascading
sequence of conditional computations. Only
one of these, the first true condition, gets
executed for each observation.
* The logic is tighter and it is more efficient. The
only downside is that for simple problems
it requires more typing than a plain "if".
do if ( age < 30).
+ compute agecat2=2.
* You could have more than one compute here.
else if (age < 40).
+ compute agecat2=3.
* The "+" signs are for compatibility with non-Windows operating systems.
else if (age < 50).
+ compute agecat2=4.
else if (age <= 59).
+ compute agecat2=5.
else if (age <= 69).
+ compute agecat2=6.
else.
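* Everything else (here, ages 70 and over). Cases with a missing age are NOT
caught here: a missing condition makes the case skip the whole structure.
- compute agecat2=99.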
end if.
frequencies variables=agecat2
/histogram.
* Flag cases with a missing age explicitly, since they need their own test.
if (missing(age)) agecat3=99.
execute.
frequencies variables=agecat3.
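* A quick check, as a sketch: list the cases with no usable age to see what
each approach assigned to them. The variable id is assumed to exist in the data file.
temporary.
select if (missing(age)).
list variables=id bdate age agecat agecat2 agecat3.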
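* recode does the same binning in one command. Ages are whole years here, so
the "thru" ranges leave no gaps; "else" catches missing ages as well.
compute agecat4 = $sysmis.
execute.
recode age (lo thru 29 = 2) (30 thru 39 = 3) (40 thru 49 = 4) (50 thru 59 = 5)
  (60 thru 69 = 6) (70 thru 79 = 7) (else = 99) into agecat4.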
execute.
formats agecat4 (f2.0).
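* A sketch of an arithmetic alternative: with age in whole years, trunc(age/10)
gives the decade directly (2 for 20-29, 3 for 30-39, and so on), and a missing
age stays system-missing. The name agecat5 is hypothetical.
compute agecat5 = trunc(age/10).
frequencies variables=agecat5.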
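* Declare 99 as a user-missing value so it is excluded from statistics.
missing values age agecat agecat2 agecat3 agecat4 (99).
frequencies variables=agecat4.
* With no missing values declared, 99 counts as a real value and distorts the mean.
missing values agecat4 ().
descriptives variables=agecat4.
* Declare 99 as missing again and compare the descriptives.
missing values agecat4 (99).
descriptives variables=agecat4.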
* Variable labels and value labels make the output easier to read.
variable labels age "Age (years)" agecat4 "Age categories (10yr bins)".
descriptives variables=age.
frequencies variables=agecat4.
value labels agecat4 3 "30-39yrs old" 4 "40-49yrs old" 99 "(No birth date)".
frequencies variables=agecat4.
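* "value labels" replaces all existing labels on the listed variables, so the labels for 3 and 4 are dropped here.
value labels agecat4 agecat3 99 "(No birth date)".
* "add value labels" adds or changes labels and leaves the other labels in place.
add value labels agecat4 3 "30-39yrs old" 4 "40-49yrs old".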
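* To confirm which labels are now attached, one option is to display the dictionary entries.
display dictionary /variables=agecat3 agecat4.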