COR638.DOC
November 15, 1996

Documentation for Files Describing 1970-Basis Census Occupation Characteristics


NOTE:   All of the files referred to in this document are stored in a "zip"
        file called COR638.ZIP.  This file was created using WinZip 6.0 (32 bit)
        for Windows 95. The file COR638.ZIP and the document that you are reading
        (COR638.DOC or COR638.WP5) are stored on a 3480 Tape labelled K00449.


      The file NAM_POWERS.DAT contains data for two cross-classifications:

   1) Detailed 1970-basis Census Occupation by Race (White, Black, Other) by
   Age (16-24, 25-34, 35-44, 45-64, 65+) by Sex/Labor Force Status (All Men,
   All Women, Women Who Work Full Time, Year Round) by Income (in 13
   categories)

   2) Detailed 1970-basis Census Occupation by Race by Age by Sex/Labor Force
   Status by Education (in 10 categories).

In addition, numerous occupation categories were further classified by
industry and/or class of worker.  As a result, there are a total of 871
records for each cross-classification, where each record represents an
occupation/industry/class of worker category.  For the income
cross-classification, there are 871 x 3 (races) x 5 (ages) x 3 (sex/labor
force statuses) x 13 (income categories) = 509,535 cells; likewise, for
the education cross-classification, there are 871 x 3 (races) x 5 (ages) x
3 (sex/labor force statuses) x 10 (education categories) = 391,950 cells.
In other words, each of the 871 occupation records holds 585 variables
(cell frequencies) for income and 450 variables (cell frequencies) for
education.
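The cell counts above follow directly from the dimensions of the two tables. The short Python sketch below recomputes them. The constant names are illustrative only and do not appear in the original files.

```python
# Dimensions of the two cross-classifications in NAM_POWERS.DAT,
# taken from the description above (names are illustrative).
N_OCC = 871     # occupation/industry/class-of-worker records
N_RACE = 3      # White, Black, Other
N_AGE = 5       # 16-24, 25-34, 35-44, 45-64, 65+
N_SEXLF = 3     # all men, all women, full-time year-round women
N_INCOME = 13   # income categories
N_EDUC = 10     # education categories

income_vars = N_RACE * N_AGE * N_SEXLF * N_INCOME  # variables per record
educ_vars = N_RACE * N_AGE * N_SEXLF * N_EDUC

print(income_vars, educ_vars)                  # 585 450
print(N_OCC * income_vars, N_OCC * educ_vars)  # 509535 391950
```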

      NAM_POWERS.DAT is laid out in an inconvenient format.  Due to
restrictions on record widths when the file was created, no records are
more than 2,040 characters wide.  Each variable (cell frequency) is 8
characters wide, so that for each of the 871 occupation records, there are
585 x 8 = 4,680 characters for income and 450 x 8 = 3,600 characters for
education.  The first half of NAM_POWERS.DAT contains the income
cross-classification, while the second half contains the education
cross-classification.  Because of the limit of 2,040 columns per record,
the income data are contained in 3 records per occupation line (2,040
characters in the first, 2,040 characters in the second, and 600
characters in the third) and the education data are contained in 2 records
per occupation line (2,040 characters in the first and 1,560 in the
second).  That is, the first 871 x 3 = 2,613 records of NAM_POWERS.DAT
contain the income cross-classification, while the subsequent 1,742
records contain the education cross-classification.  In all, then, the
file is 2,040 columns wide and 4,355 records long.
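The record layout can be undone with a sketch like the following. It reassembles each occupation's logical record from its physical records and splits the result into 8-character fields. Several assumptions apply: the physical records appear in file order, each occupation's records are contiguous, and blank fields stand for zero. The function names are hypothetical. Because 2,040 is a multiple of 8 (255 fields), no cell frequency is split across two physical records. Since 450 x 8 = 3,600, the second record of each education line holds 1,560 characters.

```python
RECORD_WIDTH = 2040  # maximum physical record width in NAM_POWERS.DAT
FIELD_WIDTH = 8      # width of each cell frequency


def physical_widths(n_vars):
    """Widths of the physical records that hold one occupation's n_vars fields."""
    remaining = n_vars * FIELD_WIDTH
    widths = []
    while remaining > 0:
        widths.append(min(RECORD_WIDTH, remaining))
        remaining -= widths[-1]
    return widths


def logical_records(lines, start, n_occ, n_vars):
    """Yield one list of n_vars counts per occupation.

    lines -- the physical records of the file, in order
    start -- 0-based index of the first record of this cross-classification
    """
    widths = physical_widths(n_vars)
    for k in range(n_occ):
        chunk = ""
        for j, w in enumerate(widths):
            rec = lines[start + k * len(widths) + j].rstrip("\r\n")
            chunk += rec.ljust(w)[:w]  # restore any stripped trailing blanks
        fields = (chunk[i:i + FIELD_WIDTH] for i in range(0, len(chunk), FIELD_WIDTH))
        # Assumption: a blank field means a count of zero.
        yield [int(f) if f.strip() else 0 for f in fields]


# Income: records 0..2612 (3 per occupation); education: 2613..4354 (2 each).
# with open("NAM_POWERS.DAT") as fh:
#     lines = fh.readlines()
# income = list(logical_records(lines, 0, 871, 585))
# educ = list(logical_records(lines, 871 * 3, 871, 450))
```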

      For the income cross-classification, the first 195 variables pertain
to Whites, the next 195 variables pertain to Blacks, and the final 195
variables pertain to Other Races.  Within each race group, the first 39
variables pertain to people between the ages of 16 and 24, the next 39
variables pertain to people between the ages of 25 and 34, the next 39
variables pertain to people between the ages of 35 and 44, the next 39
variables pertain to people between the ages of 45 and 64, and the final
39 variables pertain to people above the age of 64.  Within each race- and
age-specific group of 39 variables, the first 13 variables pertain to all
men, the next 13 variables pertain to all women, and the final 13
variables pertain to women who worked 50-52 weeks and 35 hours or more per
week.  Finally, within each race-, age-, and gender/labor-force-specific
group of 13 variables, the 13 variables give the number of people who
earned less than $1,000, between $1,000 and $1,999, between $2,000 and
$2,999, between $3,000 and $3,999, between $4,000 and $4,999, between
$5,000 and $5,999, between $6,000 and $7,999, between $8,000 and $9,999,
between $10,000 and $11,999, between $12,000 and $14,999, between $15,000
and $19,999, between $20,000 and $24,999, and $25,000 or more, respectively.
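The nesting described above means that a variable's position within an income record is ((race x 5 + age) x 3 + sex/labor force) x 13 + income category, counting from zero. The hypothetical helper below inverts that formula and turns a position back into readable labels. The label strings are shorthand chosen for illustration.

```python
RACES = ["White", "Black", "Other"]
AGES = ["16-24", "25-34", "35-44", "45-64", "65+"]
SEX_LF = ["All men", "All women", "Women FT/YR"]
INCOME = ["Under $1,000", "$1,000-$1,999", "$2,000-$2,999", "$3,000-$3,999",
          "$4,000-$4,999", "$5,000-$5,999", "$6,000-$7,999", "$8,000-$9,999",
          "$10,000-$11,999", "$12,000-$14,999", "$15,000-$19,999",
          "$20,000-$24,999", "$25,000 or more"]


def decode_income_var(i):
    """Map a 0-based income variable position (0-584) to its cell labels.

    Position = ((race * 5 + age) * 3 + sex_lf) * 13 + income, so the income
    category varies fastest and race slowest.
    """
    if not 0 <= i < len(RACES) * len(AGES) * len(SEX_LF) * len(INCOME):
        raise ValueError(f"income variable position out of range: {i}")
    rest, inc = divmod(i, len(INCOME))
    rest, sex_lf = divmod(rest, len(SEX_LF))
    race, age = divmod(rest, len(AGES))
    return RACES[race], AGES[age], SEX_LF[sex_lf], INCOME[inc]
```

For example, decode_income_var(52) returns ('White', '25-34', 'All women', 'Under $1,000'). Variable 53 in the 1-based numbering used by most statistical packages is therefore the first income cell for White women aged 25 to 34.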

      For the education cross-classification, the first 150 variables
pertain to Whites, the next 150 variables pertain to Blacks, and the final
150 variables pertain to Other Races.  Within each race group, the first
30 variables pertain to people between the ages of 16 and 24, the next 30
variables pertain to people between the ages of 25 and 34, the next 30
variables pertain to people between the ages of 35 and 44, the next 30
variables pertain to people between the ages of 45 and 64, and the final
30 variables pertain to people above the age of 64.  Within each race- and
age-specific group of 30 variables, the first 10 variables pertain to all
men, the next 10 variables pertain to all women, and the final 10
variables pertain to women who worked 50-52 weeks and 35 hours or more per
week.  Finally, within each race-, age-, and gender/labor-force-specific
group of 10 variables, the 10 variables give the number of people who
completed between 0 and 4 years of school, between 5 and 7 years of
school, exactly 8 years of school, between 1 and 3 years of high school,
exactly 4 years of high school, exactly 1 year of college, exactly 2 years
of college, exactly 3 years of college, exactly 4 years of college, and 5
or more years of college, respectively.
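Sometimes a single education cell has to be read directly from NAM_POWERS.DAT, for example with a fixed-column reader. In that case, the cell's physical record and starting column can be computed from the layout described above: the education records follow the 2,613 income records, there are two records per occupation, and each field is 8 characters wide. The function below is a hypothetical sketch of that calculation. It assumes the occupation lines are numbered 0 to 870 in file order.

```python
N_OCC = 871
RECORD_WIDTH, FIELD_WIDTH = 2040, 8
EDUC_START = N_OCC * 3     # the income half occupies the first 2,613 records
EDUC_RECORDS_PER_OCC = 2


def locate_educ_cell(occ, race, age, sexlf, edu):
    """Return (record_number, first_column), both 1-based, of one education cell.

    All arguments are 0-based: occ 0-870 (file order), race 0-2, age 0-4,
    sexlf 0-2, edu 0-9.
    """
    var = ((race * 5 + age) * 3 + sexlf) * 10 + edu  # position within the line
    offset = var * FIELD_WIDTH                       # character offset in the line
    rec_in_occ, col = divmod(offset, RECORD_WIDTH)   # which of the 2 records
    record = EDUC_START + occ * EDUC_RECORDS_PER_OCC + rec_in_occ + 1
    return record, col + 1
```

As a consistency check, the last education cell of the last occupation lands in record 4,355, columns 1,553 to 1,560. This matches the file length and the width of the second education record.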

      After reading the file, I split it into two smaller files: one for
income and one for education.  Following the layout described above, the
income file had 2,613 records and the education file had 1,742 records.  The
income file is called INCOME.DAT and the education file is called
EDUC.DAT.  The file INCOME.SPS is an SPSS command file which reads
INCOME.DAT; collapses over age categories; computes the total number of
individuals in each of the 871 occupation categories; computes the total
number of Whites, Blacks, other race individuals, men, women, and
full-time, year-round employed women in each of the 871 occupation
categories; computes the median income, the proportion of people who
earned less than $3,000, and the proportion of people who earned at least
$10,000 for each race, sex, and labor force group for each of the 871
occupation categories; and writes an ASCII file called INCOME.OUT which
contains all of these data.  Likewise, the file EDUC.SPS is an SPSS
command file which reads EDUC.DAT; collapses over age categories; computes
the total number of individuals in each of the 871 occupation categories;
computes the total number of Whites, Blacks, other race individuals, men,
women, and full-time, year-round employed women in each of the 871
occupation categories; computes the median education, the proportion of
people who completed 11 or fewer years of school, and the proportion of
people who completed at least one year of college for each race, sex, and
labor force group for each of the 871 occupation categories; and creates
an ASCII file called EDUC.OUT which contains all of these data.
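The SPSS commands in INCOME.SPS are not reproduced in this document, so the Python sketch below is only a stand-in for the kind of computation it describes. It collapses one occupation's 585 income counts over age and then computes a grouped median and the two income proportions for each race by sex/labor force group. The median method is an assumption: it interpolates linearly within the median category and returns the lower bound when the median falls in the open-ended $25,000-or-more category. The procedure actually used in INCOME.SPS may differ. The education measures would follow the same pattern, with 10 categories and years of schooling in place of dollar bounds.

```python
# Lower bounds of the 13 income categories; the top category is open-ended.
INCOME_LOWER = [0, 1000, 2000, 3000, 4000, 5000, 6000, 8000,
                10000, 12000, 15000, 20000, 25000]
UNDER_3000 = range(0, 3)       # categories below $3,000
AT_LEAST_10000 = range(8, 13)  # $10,000 and above


def collapse_over_age(values, n_cat, n_race=3, n_age=5, n_sexlf=3):
    """Sum one occupation's counts over age; returns out[race][sexlf][category]."""
    out = [[[0] * n_cat for _ in range(n_sexlf)] for _ in range(n_race)]
    for i, count in enumerate(values):
        rest, cat = divmod(i, n_cat)
        rest, sexlf = divmod(rest, n_sexlf)
        race, _age = divmod(rest, n_age)
        out[race][sexlf][cat] += count
    return out


def grouped_median(counts, lower):
    """Median by linear interpolation within the median category.

    Assumption: category k runs from lower[k] up to lower[k + 1]; if the
    median falls in the open-ended top category, its lower bound is returned.
    """
    total = sum(counts)
    if total == 0:
        return None  # empty cell, e.g. no women in the occupation
    half, cum = total / 2, 0
    for k, c in enumerate(counts):
        if c and cum + c >= half:
            if k == len(lower) - 1:
                return float(lower[k])
            return lower[k] + (half - cum) / c * (lower[k + 1] - lower[k])
        cum += c


def share(counts, categories):
    """Proportion of the group falling in the given categories."""
    total = sum(counts)
    return sum(counts[i] for i in categories) / total if total else None
```

For one occupation line, groups = collapse_over_age(values, 13) gives groups[0][0] as the income distribution of White men. Passing that list to grouped_median(..., INCOME_LOWER), share(..., UNDER_3000), and share(..., AT_LEAST_10000) yields the three income measures for that group.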

      I then read INCOME.OUT and EDUC.OUT into a Lotus 1-2-3 spreadsheet
which, after some formatting, is now called EDINC70.WK4.  I took certain
precautions to verify the accuracy of the data in EDINC70.WK4.  First, I
determined that the total number of people in each occupation category in
INCOME.OUT matched the total number of people in each occupation category
in EDUC.OUT.  Second, I verified that within each occupation category, the
sum of the numbers of Whites, Blacks, and Others equaled the total number
of people in that occupation category and that the sum of the numbers of
men and women equaled the total number of people in that category.  Third,
I compared the number of occupational incumbents and their median
education and income figures to an external source, and found that the
numbers agreed.  Finally, since I had essentially produced the
occupational income and occupational education data used by Stevens and
Featherman to produce their SEI scores, I was able to successfully
reproduce their MSEI2 and TSEI2 scores.
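The first two checks can be expressed as a short routine. In the hypothetical sketch below, group_totals derives the totals for one occupation from a table already collapsed over age. check_record compares stored totals, for example values read back from INCOME.OUT and EDUC.OUT. The dictionary keys are illustrative. Full-time, year-round women are a subset of all women, so the sketch leaves them out of every total; adding them would double count. The comparison against an external source is not shown.

```python
MEN, ALL_WOMEN, FTYR_WOMEN = 0, 1, 2   # positions of the sex/labor force groups


def group_totals(table):
    """Totals for one occupation from table[race][sexlf] = category counts
    (already collapsed over age). Full-time, year-round women are excluded
    because they are already counted among all women."""
    men = sum(sum(table[r][MEN]) for r in range(3))
    women = sum(sum(table[r][ALL_WOMEN]) for r in range(3))
    white, black, other = (sum(table[r][MEN]) + sum(table[r][ALL_WOMEN])
                           for r in range(3))
    return {"total": men + women, "white": white, "black": black,
            "other": other, "men": men, "women": women}


def check_record(inc, edu):
    """Return a list of failed checks for one occupation line, given dicts of
    stored totals from the income and education outputs."""
    failures = []
    if inc["total"] != edu["total"]:
        failures.append("income total != education total")
    for label, t in (("income", inc), ("education", edu)):
        if t["white"] + t["black"] + t["other"] != t["total"]:
            failures.append(label + ": races do not sum to total")
        if t["men"] + t["women"] != t["total"]:
            failures.append(label + ": men + women != total")
    return failures
```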

      The data in EDINC70.WK4 are quirky in several respects.  First, the
producers of the original documentation are not clear about the nature of
their class of worker distinctions.  In some cases, they list specific
class of worker codes (listed at the top of the spreadsheet), while in
other cases they distinguish between public and private employment, or
between living in and living out.  In these cases, the user will have
to determine how to make
use of the class of worker distinctions.  Second, a few occupation lines
that appear in the 1970 Census Occupational Classification do not appear
in this file (or the original data files).  Third, a number of occupation
lines appear in the spreadsheet (and the original data files) which do not
appear in the 1970 Census Occupational Classification; these categories
are labeled as containing cases which were allocated.  Finally, in some
instances, there are no individuals in certain cells.  For instance, there
are no individuals in industry 389 (footwear, except rubber) and
occupation 695 (not specified operatives), even though that combination
has its own category.  More commonly, there are no other
race individuals or women in certain occupation categories.

Constructing Scores for Use With WLS (or Other) Data

      Because of these inconvenient characteristics of the data in their
original format, I have constructed a set of revised files which can be
used more easily with 1970-basis industry, occupation, and class of worker
codes.  These data are contained in a Lotus 1-2-3 spreadsheet called
EDINC70R.WK4.  This file differs from the original spreadsheet
(EDINC70.WK4) in several important respects:

The original file contains information for occupation lines which are
identified as "allocation categories" or as categories for people who
have no codable occupational experience.  These occupation lines are
196, 246, 296, 396, 586, 696, 726, 806, 846, 976, 986, and 991.
Since these occupation categories were not used in the WLS occupational
coding procedures, I have omitted them from the revised data file.

The original file contained an occupation category labeled "333/383."
In the Census coding materials these lines appear separately, and the
WLS data contain people in both categories.  Consequently, I split
"333/383" into two separate lines (333 and 383).  The number of people
in each line cannot be ascertained; however, I gave each of the new
lines the occupational characteristics which were originally attributed
to 333/383.  Similarly, the original data contained an occupation line
labeled "571-575."  I split this line into 3 separate lines:
571, 572, and 575.  Again, I do not know how many people held each of
these occupations, but I assigned each the occupational characteristics
attributed to "571-575" in the original file.

The original file contained no listing for occupation 280.  It did,
however, contain listings for occupations 281 through 285, which
represent sales workers in different industries.  I constructed a
category for occupation 280, but also retained lines 281 through 285.
I determined the occupational characteristics of line 280 by taking a
weighted average of the occupational characteristics of lines 281
through 285.
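The two operations described here, copying one line's characteristics to several codes and building a new line as a weighted average of existing lines, can be sketched as follows. The document does not say which counts served as weights. This sketch assumes the number of incumbents (a hypothetical key "n"), and the weight key can be changed so that group-specific measures are weighted by group size. Lines with no people are skipped, since their characteristics are undefined. Note that a weighted average of medians approximates, but does not equal, the median of the pooled distribution.

```python
def weighted_average(lines, fields, weight="n"):
    """Combine several occupation lines into one line.

    lines  -- list of dicts, each holding a weight and numeric characteristics
    fields -- names of the characteristics to average
    weight -- key of the count used as the weight (assumed: incumbents)
    """
    total = sum(line[weight] for line in lines)
    combined = {weight: total}
    for f in fields:
        # Lines with a zero weight contribute nothing and may lack values.
        combined[f] = (sum(line[weight] * line[f] for line in lines if line[weight])
                       / total if total else None)
    return combined


def split_line(line, new_codes, weight="n"):
    """Copy one line's characteristics to several codes (e.g. 333 and 383).
    The number of people in each new line is unknown, so the count is None."""
    return {code: {**line, weight: None} for code in new_codes}
```

For example, weighted_average([{"n": 100, "med_inc": 5000.0}, {"n": 300, "med_inc": 7000.0}], ["med_inc"]) gives a combined line with n = 400 and med_inc = 6500.0.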

In the original file, occupation 692 is split by industry.  One resulting
classification is 389-692 (industry = 389, occupation = 692); however,
the original file reports that nobody fell into that category, and so no
occupational characteristics are reported.  In the WLS, on the other hand,
a few people were assigned the code 389-692.  In order to assign occupational
characteristics to people with the code 389-692, I omitted that particular
occupation category.  The result is that people in 389-692 fall into the
line for "occupation 692, all other industries."

In the original file, occupation lines 215 (inspectors, except
construction, public administration) and 222 (officials and
administrators; public administration, n.e.c.) are split by class of
worker such that federal, state, and local government employees are in
different categories.  Since the WLS class of worker variables do not
differentiate between levels of government, I collapsed the lines so
that there is now one line for occupation 215 and one line for
occupation 222.  The occupational characteristics of the new lines were
computed by taking weighted averages of the constituent lines.

For occupation lines 980 (child care workers, private household), 981
(cooks, private household), 982 (housekeepers, private household), 983
(laundresses, private household), and 984 (maids and servants, private
household), the original file distinguished between people who lived in
their own homes and those who lived in the homes of their employers.  I
have collapsed the "lived in" and "lived out" categories into single
lines, using the weighted averages of the occupational characteristics
of the separate lines as the occupational characteristics of the
combined lines.

Finally, I converted the class of worker codes in the original file to
those used in WLS.  Specifically: 0 (Wage/Salary Worker in Private
Company) became 1 (Wage/Salary Worker in Private Company); 1, 2, and 3
(Federal, State, and Local Government, respectively) became 2
(Government); 4 (Self-Employed, not Incorporated) remained the same;
5 (Self-Employed, Incorporated) became 3 (Self-Employed, Incorporated);
and 6 (Unpaid Family Worker) became 5 (Unpaid Family Worker).
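The class of worker conversion is a fixed lookup table. A minimal Python version of the mapping listed above follows. The function name is hypothetical. The function rejects codes outside 0 through 6 instead of passing them through silently.

```python
# Original (1970 Census) class of worker code -> WLS code, as listed above.
COW_TO_WLS = {
    0: 1,  # wage/salary worker, private company
    1: 2,  # federal government -> government
    2: 2,  # state government -> government
    3: 2,  # local government -> government
    4: 4,  # self-employed, not incorporated (unchanged)
    5: 3,  # self-employed, incorporated
    6: 5,  # unpaid family worker
}


def recode_cow(code):
    """Convert an original class of worker code to the WLS code."""
    try:
        return COW_TO_WLS[code]
    except KeyError:
        raise ValueError(f"unexpected class of worker code: {code!r}") from None
```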

Also, TOTAL.CTL, WHITE.CTL, BLACK.CTL, OTHER.CTL, MEN.CTL, ALLWOMEN.CTL,
and FTFYWOM.CTL are command files that are written in SPSS but that can
easily be translated into SAS or other programming languages.  Using TOTAL.CTL,
for example, a researcher can map the education and earnings scores which
are based on the entire population onto specific 1970-basis industry,
occupation, and class of worker codes.
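The .CTL files themselves are not reproduced here. For users translating them into another language, the hypothetical Python sketch below shows one way to attach scores to a person's industry, occupation, and class of worker codes. It assumes the revised lines are stored in a dictionary keyed by (occupation, industry, class of worker), where None in the industry or class of worker slot marks a line that is not split on that dimension (for example, "occupation 692, all other industries"). The order in which the fallbacks are tried is also an assumption.

```python
def lookup_scores(scores, occ, ind=None, cow=None):
    """Return the score record for one person, or None if no line matches.

    scores -- dict keyed by (occupation, industry, class_of_worker); None in
              the industry or class-of-worker slot marks a line that is not
              split on that dimension.
    """
    candidates = (
        (occ, ind, cow),    # line split by both industry and class of worker
        (occ, ind, None),   # line split by industry only
        (occ, None, cow),   # line split by class of worker only
        (occ, None, None),  # single line, or "all other industries"
    )
    for key in candidates:
        if key in scores:
            return scores[key]
    return None
```

Under these assumptions, a person coded 389-692 finds no industry-specific line and falls through to the (692, None, None) entry. This mirrors the treatment of 389-692 described earlier.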