COR638.DOC
November 15, 1996

Documentation for Files Describing 1970-Basis Census Occupation Characteristics

NOTE: All of the files referred to in this document are stored in a "zip" file called COR638.ZIP, which was created using WinZip 6.0 (32-bit) for Windows 95. COR638.ZIP and the document that you are reading (COR638.DOC or COR638.WP5) are stored on a 3480 tape labeled K00449.

The file NAM_POWERS.DAT contains data for two cross-classifications:

1) Detailed 1970-basis Census Occupation by Race (White, Black, Other) by Age (16-24, 25-34, 35-44, 45-64, 65+) by Sex/Labor Force Status (All Men, All Women, Women Who Work Full Time, Year Round) by Income (in 13 categories)

2) Detailed 1970-basis Census Occupation by Race by Age by Sex/Labor Force Status by Education (in 10 categories)

In addition, numerous occupation categories were further classified by industry and/or class of worker. As a result, each cross-classification contains 871 records, each representing one occupation/industry/class of worker category. For the income cross-classification, there are 871 x 3 (races) x 5 (ages) x 3 (sex/labor force statuses) x 13 (income categories) = 509,535 cells; likewise, for the education cross-classification, there are 871 x 3 (races) x 5 (ages) x 3 (sex/labor force statuses) x 10 (education categories) = 391,950 cells. In other words, each of the 871 occupation records has 585 variables (cell frequencies) for income and 450 variables (cell frequencies) for education.

NAM_POWERS.DAT is laid out in an inconvenient format. Because of restrictions on record widths when the file was created, no record is more than 2,040 characters wide.
Each variable (cell frequency) is 8 characters wide, so for each of the 871 occupation records there are 585 x 8 = 4,680 characters for income and 450 x 8 = 3,600 characters for education. The first part of NAM_POWERS.DAT contains the income cross-classification, and the second part contains the education cross-classification. Because of the limit of 2,040 columns per record, the income data occupy 3 records per occupation line (2,040 characters in the first, 2,040 in the second, and 600 in the third), and the education data occupy 2 records per occupation line (2,040 characters in the first and 1,560 in the second). Since 2,040 is a multiple of 8, each full record holds exactly 255 variables, so no variable is split across two records. That is, the first 871 x 3 = 2,613 records of NAM_POWERS.DAT contain the income cross-classification, and the subsequent 871 x 2 = 1,742 records contain the education cross-classification. In all, then, the file is 2,040 columns wide and 4,355 records long.

For the income cross-classification, the first 195 variables pertain to Whites, the next 195 to Blacks, and the final 195 to Other Races. Within each race group, the first 39 variables pertain to people aged 16 to 24, the next 39 to people aged 25 to 34, the next 39 to people aged 35 to 44, the next 39 to people aged 45 to 64, and the final 39 to people aged 65 and over. Within each race- and age-specific group of 39 variables, the first 13 variables pertain to all men, the next 13 to all women, and the final 13 to women who worked 50-52 weeks and 35 or more hours per week.
Finally, within each race-, age-, and gender/labor-force-specific group of 13 variables, the 13 variables give the number of people who earned less than $1,000, $1,000 to $1,999, $2,000 to $2,999, $3,000 to $3,999, $4,000 to $4,999, $5,000 to $5,999, $6,000 to $7,999, $8,000 to $9,999, $10,000 to $11,999, $12,000 to $14,999, $15,000 to $19,999, $20,000 to $24,999, and $25,000 or more, respectively.

For the education cross-classification, the first 150 variables pertain to Whites, the next 150 to Blacks, and the final 150 to Other Races. Within each race group, the first 30 variables pertain to people aged 16 to 24, the next 30 to people aged 25 to 34, the next 30 to people aged 35 to 44, the next 30 to people aged 45 to 64, and the final 30 to people aged 65 and over. Within each race- and age-specific group of 30 variables, the first 10 variables pertain to all men, the next 10 to all women, and the final 10 to women who worked 50-52 weeks and 35 or more hours per week. Finally, within each race-, age-, and gender/labor-force-specific group of 10 variables, the 10 variables give the number of people who completed 0 to 4 years of school, 5 to 7 years of school, exactly 8 years of school, 1 to 3 years of high school, exactly 4 years of high school, exactly 1 year of college, exactly 2 years of college, exactly 3 years of college, exactly 4 years of college, and 5 or more years of college, respectively.

After reading NAM_POWERS.DAT, I split it into two smaller files, one for income and one for education. Consistent with the layout above, the income file had 2,613 records and the education file had 1,742 records.
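The record layout and variable ordering described above can be sketched in Python. The sketch shows how the physical records of NAM_POWERS.DAT regroup into one logical record per occupation line, and how a variable's position decodes into race, age, sex/labor force group, and category. It is an illustration, not the procedure used to create INCOME.DAT and EDUC.DAT: the function names are hypothetical, and it assumes that fields are integers padded with blanks, that a blank field means a zero count, and that trailing blanks may have been trimmed from the short third (income) or second (education) record.

```python
# Layout constants taken from the documentation above.
RECORD_WIDTH = 2040   # maximum physical record width
FIELD_WIDTH = 8       # columns per cell frequency
N_OCC = 871           # occupation/industry/class-of-worker lines

RACES = ("White", "Black", "Other")
AGES = ("16-24", "25-34", "35-44", "45-64", "65+")
SEX_LF = ("All men", "All women", "FTYR women")


def parse_fields(text, n_fields):
    """Slice n_fields 8-column integers from one logical record.

    Assumption: a blank field is read as a zero count.
    """
    values = []
    for k in range(n_fields):
        chunk = text[k * FIELD_WIDTH:(k + 1) * FIELD_WIDTH].strip()
        values.append(int(chunk) if chunk else 0)
    return values


def logical_records(lines, per_occ, n_fields):
    """Join every `per_occ` physical records into one logical record per occupation line."""
    records = []
    for start in range(0, len(lines), per_occ):
        # Drop line endings and pad each physical record to full width, in case
        # trailing blanks were trimmed (assumption), so field offsets stay aligned.
        text = "".join(
            line.rstrip("\r\n")[:RECORD_WIDTH].ljust(RECORD_WIDTH)
            for line in lines[start:start + per_occ]
        )
        records.append(parse_fields(text, n_fields))
    return records


def split_file(lines, n_occ=N_OCC):
    """Split NAM_POWERS.DAT lines into income (3 per line) and education (2 per line)."""
    n_income = 3 * n_occ  # 2,613 physical records when n_occ = 871
    income = logical_records(lines[:n_income], 3, 585)
    education = logical_records(lines[n_income:n_income + 2 * n_occ], 2, 450)
    return income, education


def decode_index(i, n_cat):
    """Map a 0-based variable position to (race, age, sex/LF group, category).

    n_cat is 13 for income and 10 for education.
    """
    race, rest = divmod(i, 15 * n_cat)   # 5 ages x 3 sex/LF groups per race
    age, rest = divmod(rest, 3 * n_cat)  # 3 sex/LF groups per age group
    sex_lf, cat = divmod(rest, n_cat)
    return RACES[race], AGES[age], SEX_LF[sex_lf], cat
```

If the file matches the layout above, passing its lines to `split_file` should yield 871 income records of 585 integers and 871 education records of 450 integers; `decode_index(584, 13)` identifies the last income variable as Other Races, 65+, full-time year-round women, category 12 ($25,000 or more).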
The income file is called INCOME.DAT and the education file is called EDUC.DAT. The file INCOME.SPS is an SPSS command file that reads INCOME.DAT; collapses over age categories; computes the total number of individuals in each of the 871 occupation categories; computes the total numbers of Whites, Blacks, other race individuals, men, women, and full-time, year-round employed women in each of the 871 occupation categories; computes the median income, the proportion of people who earned less than $3,000, and the proportion of people who earned at least $10,000 for each race, sex, and labor force group in each of the 871 occupation categories; and writes an ASCII file called INCOME.OUT containing all of these data. Likewise, the file EDUC.SPS is an SPSS command file that reads EDUC.DAT; collapses over age categories; computes the same totals for each of the 871 occupation categories; computes the median education, the proportion of people who completed 11 or fewer years of school, and the proportion of people who completed at least one year of college for each race, sex, and labor force group in each of the 871 occupation categories; and writes an ASCII file called EDUC.OUT containing all of these data.

I then read INCOME.OUT and EDUC.OUT into a Lotus 1-2-3 spreadsheet which, after some formatting, is now called EDINC70.WK4.

I took several precautions to verify the accuracy of the data in EDINC70.WK4. First, I determined that the total number of people in each occupation category in INCOME.OUT matched the total number of people in each occupation category in EDUC.OUT.
Second, I verified that within each occupation category the numbers of Whites, Blacks, and Others summed to the total number of people in that category, and that the numbers of men and women also summed to that total. Third, I compared the number of occupational incumbents and their median education and income figures to an external source and found that the numbers agreed. Finally, because these are essentially the occupational income and occupational education data that Stevens and Featherman used to produce their SEI scores, I was able to reproduce their MSEI2 and TSEI2 scores.

The data in EDINC70.WK4 are quirky in several respects. First, the producers of the original documentation do not explain their class of worker distinctions clearly. In some cases they list specific class of worker codes (listed at the top of the spreadsheet), while in others they distinguish between public and private employment or between living in and living out. In these cases, users will have to decide how to make use of the class of worker distinctions. Second, a few occupation lines that appear in the 1970 Census Occupational Classification do not appear in this file (or in the original data files). Third, a number of occupation lines appear in the spreadsheet (and in the original data files) that do not appear in the 1970 Census Occupational Classification; these categories are labeled as containing allocated cases. Finally, some cells contain no individuals. For instance, there are no individuals in industry 389 (footwear, except rubber) and occupation 695 (not specified operatives), even though that industry-occupation combination has its own category. More commonly, there are no other race individuals or no women in certain occupation categories.
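The computations attributed to INCOME.SPS and EDUC.SPS, together with the first verification check, can be sketched in Python for a single occupation line. The SPSS files are not reproduced here, so the details are assumptions rather than a description of what those files do: medians use linear interpolation within the category containing the midpoint, a midpoint in the open-ended top category ($25,000 or more, or 5 or more years of college) returns that category's lower limit, the years-of-school limits in `EDUC_LOWER` are assumed, and empty cells (for example, occupations with no women) yield `None` instead of a number. All function names are hypothetical.

```python
INCOME_LOWER = [0, 1000, 2000, 3000, 4000, 5000, 6000, 8000,
                10000, 12000, 15000, 20000, 25000]   # lower class limits, dollars
# Assumed lower limits (years of school) for the 10 education categories.
EDUC_LOWER = [0, 5, 8, 9, 12, 13, 14, 15, 16, 17]


def collapse_over_age(cells, n_cat):
    """Sum one occupation line's cells over the 5 age groups.

    Returns table[race][sex_lf][category]; n_cat is 13 (income) or 10 (education).
    """
    table = [[[0] * n_cat for _ in range(3)] for _ in range(3)]
    for i, count in enumerate(cells):
        race, rest = divmod(i, 15 * n_cat)
        _age, rest = divmod(rest, 3 * n_cat)
        sex_lf, cat = divmod(rest, n_cat)
        table[race][sex_lf][cat] += count
    return table


def group_counts(table, races=(0, 1, 2), sexes=(0, 1)):
    """Category counts summed over the chosen races and sex/LF groups.

    The default (all races, all men + all women) gives the occupation total;
    full-time, year-round women (index 2) are a subset of all women, so they
    are excluded from the total.
    """
    n_cat = len(table[0][0])
    return [sum(table[r][s][c] for r in races for s in sexes) for c in range(n_cat)]


def grouped_median(counts, lower):
    """Median by linear interpolation within the category holding the midpoint."""
    n = sum(counts)
    if n == 0:
        return None  # empty cell: no median
    half = n / 2
    cum = 0
    for k, f in enumerate(counts):
        if f > 0 and cum + f >= half:
            if k == len(counts) - 1:
                return float(lower[k])  # open-ended top category (assumption)
            width = lower[k + 1] - lower[k]
            return lower[k] + (half - cum) / f * width
        cum += f
    return None


def share(counts, categories):
    """Proportion of people in the given categories, or None for an empty cell."""
    n = sum(counts)
    return None if n == 0 else sum(counts[c] for c in categories) / n


def income_summary(counts):
    return {
        "median": grouped_median(counts, INCOME_LOWER),
        "under_3000": share(counts, range(0, 3)),    # first three categories
        "10000_plus": share(counts, range(8, 13)),   # $10,000-11,999 and above
    }


def education_summary(counts):
    return {
        "median": grouped_median(counts, EDUC_LOWER),
        "11_or_fewer": share(counts, range(0, 4)),   # 0-4, 5-7, 8, HS 1-3
        "some_college": share(counts, range(5, 10)), # college 1 through 5+
    }


def totals_match(income_cells, educ_cells):
    """First check: same occupation total in both cross-classifications."""
    inc = sum(group_counts(collapse_over_age(income_cells, 13)))
    edu = sum(group_counts(collapse_over_age(educ_cells, 10)))
    return inc == edu
```

Because this sketch defines the occupation total as all men plus all women, the race and sex sums from the second verification step agree with it by construction; here only the income-versus-education comparison is an independent check.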
Constructing Scores for Use With WLS (or Other) Data

Because of these inconvenient characteristics of the data in their original format, I have constructed a set of revised files that can be used more easily with 1970-basis industry, occupation, and class of worker codes. These data are contained in a Lotus 1-2-3 spreadsheet called EDINC70R.WK4. This file differs from the original spreadsheet (EDINC70.WK4) in several important respects:

The original file contains information for occupation lines that are identified as "allocation categories" or as categories for people with no codable occupational experience. These occupation lines are 196, 246, 296, 396, 586, 696, 726, 806, 846, 976, 986, and 991. Since these occupation categories were not used in the WLS occupational coding procedures, I have omitted them from the revised data file.

The original file contained an occupation category labeled "333/383." In the Census coding materials these lines appear separately, and the WLS data contain people in both categories. Consequently, I split "333/383" into two separate lines (333 and 383). The number of people in each line cannot be determined, so I gave each of the new lines the occupational characteristics originally attributed to 333/383. Similarly, the original data contained an occupation line labeled "571-575." I split this line into 3 separate lines: 571, 572, and 575. Again, I do not know how many people held each of these occupations, but I assigned each the occupational characteristics attributed to "571-575" in the original file.

The original file contained no listing for occupation 280. It did, however, contain listings for occupations 281 through 285, which represent sales workers in different industries. I constructed a category for occupation 280, but also retained lines 281 through 285.
I determined the occupational characteristics of line 280 by taking a weighted average of the occupational characteristics of lines 281 through 285.

In the original file, occupation 692 is split by industry. One resulting classification is 389-692 (industry = 389, occupation = 692); however, the original file reports that nobody fell into that category, so no occupational characteristics are reported for it. In the WLS, on the other hand, a few people were assigned the code 389-692. To assign occupational characteristics to these people, I omitted the empty 389-692 category, so that people coded 389-692 fall into the line for "occupation 692, all other industries."

In the original file, occupation lines 215 (inspectors, except construction, public administration) and 222 (officials and administrators; public administration, n.e.c.) are split by class of worker, with federal, state, and local government employees in different categories. Since the WLS class of worker variables do not distinguish between levels of government, I collapsed these lines so that there is now one line for occupation 215 and one line for occupation 222. The occupational characteristics of the new lines were computed by taking weighted averages of the constituent lines.

For occupation lines 980 (child care workers, private households), 981 (cooks, private household), 982 (housekeepers, private household), 983 (laundresses, private household), and 984 (maids and servants, private household), the original file distinguished between people who lived in their own homes and those who lived in the homes of their employers. I have collapsed the "lived in" and "lived out" categories into single lines, using weighted averages of the occupational characteristics of the separate lines as the occupational characteristics of the combined lines.

Finally, I converted the class of worker codes in the original file to those used in the WLS.
Specifically: 0 (Wage/Salary Worker in Private Company) became 1 (Wage/Salary Worker in Private Company); 1, 2, and 3 (Federal, State, and Local Government, respectively) became 2 (Government); 4 (Self-Employed, not Incorporated) remained 4; 5 (Self-Employed, Incorporated) became 3 (Self-Employed, Incorporated); and 6 (Unpaid Family Worker) became 5 (Unpaid Family Worker).

In addition, TOTAL.CTL, WHITE.CTL, BLACK.CTL, OTHER.CTL, MEN.CTL, ALLWOMEN.CTL, and FTFYWOM.CTL are command files written in SPSS that can easily be translated into SAS or other languages. Using TOTAL.CTL, for example, a researcher can map the education and earnings scores that are based on the entire population onto specific 1970-basis industry, occupation, and class of worker codes.
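The revisions listed in this section can be sketched in Python. Everything here is illustrative and hypothetical: the revisions were actually made in the spreadsheet and the .CTL files, which are not reproduced here. The sketch assumes a dict of lines keyed by line label, a score table keyed by (occupation, industry, class of worker) in which `None` stands for "all other industries" or "any class of worker", and incumbent counts as the weights in the weighted averages.

```python
# Class-of-worker recode described above: original code -> WLS code.
COW_TO_WLS = {0: 1, 1: 2, 2: 2, 3: 2, 4: 4, 5: 3, 6: 5}

# Allocation / no-codable-experience lines omitted from the revised file.
ALLOCATION_LINES = {"196", "246", "296", "396", "586", "696",
                    "726", "806", "846", "976", "986", "991"}

# Combined lines that were split; each new line inherits the combined values.
SPLITS = {"333/383": ("333", "383"), "571-575": ("571", "572", "575")}


def revise_lines(rows):
    """rows: hypothetical dict mapping a line label to a dict of characteristics."""
    revised = {}
    for label, chars in rows.items():
        if label in ALLOCATION_LINES:
            continue  # drop allocation categories
        for new_label in SPLITS.get(label, (label,)):
            revised[new_label] = dict(chars)  # copy, so split lines are independent
    return revised


def weighted_average(parts):
    """Count-weighted mean of (count, value) pairs; None if no one is in any part.

    Assumption: the weights are the numbers of people in the constituent lines.
    """
    pairs = [(n, v) for n, v in parts if n > 0 and v is not None]
    total = sum(n for n, _ in pairs)
    if total == 0:
        return None
    return sum(n * v for n, v in pairs) / total


def lookup(scores, occ, ind=None, cow=None):
    """Return the most specific matching line for an occupation code.

    scores: hypothetical dict keyed by (occ, ind, cow), None acting as a wildcard.
    Because the empty 389-692 line was dropped, lookup(scores, 692, 389)
    falls through to the (692, None, None) "all other industries" line.
    """
    for key in ((occ, ind, cow), (occ, ind, None), (occ, None, cow), (occ, None, None)):
        if key in scores:
            return scores[key]
    return None
```

The same `weighted_average` call, applied one characteristic at a time, covers the new line 280 (from lines 281 through 285), the collapsed lines 215 and 222, and the combined private household lines 980 through 984.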