The CHRR Database Investigator


Table of Contents

 

7.0 The COMMON Variables

The NLSY79 User's Guide (Appendix C: NLSY79 Areas of Interest) provides the following summary information on the variables assigned to the COMMON group. To find this quote, start in the Contents window, open the Documents subdirectory, double-click on the NLSY79 User's Guide to open it in the Text/Cdbk window, and go to Appendix C: NLSY79 Areas of Interest.

    Name: COMMON - Common demographic information

    Description: This area of interest contains commonly used variables from several survey years. Included are: (1) each respondent's identification number, sample type, race, sex, and date of birth; (2) identification numbers/relationship codes of other youth in the same household who were interviewed in 1979; (3) the household identification number; (4) interview-specific information including reason for non-interview, week numbers of interview date and of last interview, etc.; and (5) various employment status variables.

    Sources of Data: The variables listed in COMMON are a composite of assigned codes, created variables, and information from the 1978 screener and/or the regularly administered questionnaires.

    NLSY79 User's Guide (Appendix C: NLSY79 Areas of Interest)

From this documentation you can begin to see the scope of the COMMON variables (demographic items), how they were created, and the sources from which they are drawn.

Below we examine three useful variables in the COMMON area of interest, two of which have been created for you. A created variable saves you the time of deriving it yourself. We will try to acquaint you with some of the most useful created variables. Learn the process of finding created variables so you can apply it when you look for them on your own. Several useful variables in the COMMON area of interest are listed in the next section in both of their formats.

Variable Formats

You will encounter two forms of reference to the variables in the NLS. The first is the Codebook format, which appears in the Text/Cdbk window and depicts the variable by its Reference Number (a 'decimal' number), its Question Name, and its short Description. The Codebook entry describes the receptacle for the raw case datum: it is the meta-datum, ready to receive the response it documents as soon as the survey goes into the field. The Reference Number determines the order of variables in the Codebook. It is unique, 'machine-assigned' sequentially, and indicates a 'chronological' order for the entire survey across many years. Generally, the written documentation on the NLS refers to the variables using the Reference Number (either R01736. or R01736).

    Codebook Format
  • R01736.
  • [S24Q01] SAMPLE IDENTIFICATION CODE 79 INT
  • R02147.
  • [*Created] R'S RACIAL/ETHNIC COHORT FROM SCREENER 79 INT
  • R02148.
  • [*Created] SEX OF R 79 INT

The second format of reference to a variable that you will encounter is the one in the Variables List. This format appears in the Variables List window and is a hybrid that incorporates the Codebook information (Reference Number, Description, Question Name, etc.) while adding value to it. This format is one step in between the raw data and the statistical packages (SAS, SPSS, Stata Dictionary) you will use to analyse the data. Creation of the Variable List comes after that of the Codebook; it adds a (unique) Name to the variable and assigns it to various indexes for better access. The (unique) Name assigned to the variable is derived from the Reference Number of the variable without the decimal point. The Reference Number, R01736.00, becomes the Name, R0173600. Because Reference Numbers are unique, the Name of the variable in the Variables List is also a unique identifier. (You may change this Name in the statistical packages, but then there is no guarantee that the Name will be a unique identifier.) Generally, the software reference to NLS variables uses the variable Name (R0173600).

    Variable List Format
  • R0173600
  • SAMPLE IDENTIFICATION CODE 79 INT R01736.00 S24Q01 1979 COMMON
  • R0214700
  • R'S RACIAL/ETHNIC COHORT FROM SCREENER 79 INT R02147.00 *Created 1979 COMMON
  • R0214800
  • SEX OF R 79 INT R02148.00 *Created 1979 COMMON
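The naming rule described above (drop the decimal point from the Reference Number to obtain the Name) can be sketched in a few lines of Python. This is only an illustration: the function name `reference_number_to_name` is hypothetical, and padding a short form such as `R01736.` with two zeros is an assumption based on the single example in the text (R01736.00 becomes R0173600).

```python
def reference_number_to_name(reference_number: str) -> str:
    """Convert a Codebook Reference Number to a Variables List Name.

    Assumption: the part after the decimal point is two digits wide,
    and short forms ("R01736." or "R01736") imply "00".
    """
    # Split into the part before and after the decimal point (if any).
    whole, _, fraction = reference_number.strip().partition(".")
    # Pad the decimal part to two digits, then join without the point.
    return whole + fraction.ljust(2, "0")


if __name__ == "__main__":
    for ref in ("R01736.00", "R02147.", "R02148"):
        print(ref, "->", reference_number_to_name(ref))
```

Because the conversion only removes the decimal point, two distinct Reference Numbers cannot collapse to the same Name, which is why the Name remains a unique identifier.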

 

7.1 SAMPLE IDENTIFICATION CODE

R01736. [S24Q01] SAMPLE IDENTIFICATION CODE 79 INT

The NLSY79 sampling design enables researchers to analyze the experiences of groups such as women, Hispanics, blacks, and the economically disadvantaged. Users can identify a respondent's sample type by using the Sample Identification Code as defined by the R01736 variable. The following documentation in the NLSY79 User's Guide clearly outlines the extent of the probability samples used in the NLSY79. When this information is used in conjunction with the Codebook 'documentation' of the variable, a clearer picture of the scope appears.

    2.1 Sample Design

    [Paragraph three:]

    Three independent probability samples comprise the NLSY79. These samples are designed to represent the entire population of youth aged 14 to 21 as of December 31, 1978, residing in the United States on January 1, 1979. The three samples are:

    1. a cross-sectional sample designed to be representative of the noninstitutionalized civilian segment of young people living in the United States in 1979 and born January 1, 1957, through December 31, 1964;
    2. a set of supplemental samples designed to oversample civilian Hispanic, black, and economically disadvantaged, non-Hispanic, non-black youth; and
    3. a military sample designed to represent the population born January 1, 1957, through December 31, 1961, serving in the military as of September 30, 1978. The inclusion of the military sample allows comparative civilian/military analyses by ensuring more than the pro rata share of cohort members in the military.
    Users can identify a respondent's sample type by using R01736. Beginning in 1986, additional information was collected about children born to female NLSY79 respondents. The child sample, when weighted, is representative of American children born to the population of women born in 1957 through 1964 and living in the United States in 1979.
    NLSY79 User's Guide Chapter 2.1 Sample Design

The SAMPLE IDENTIFICATION CODE variable, shown in the Text/Cdbk window in Figure 4, classifies each respondent into one of the three subsamples of 'cross-sectional', 'supplemental' and 'military' mentioned above. It further identifies the gender and breaks out the race/ethnicity of the respondent. It is not a created variable but rather one that is a result of a question asked in the survey. The question in particular is S24Q01 in the Screener section of the 1979 Survey. If you have the actual survey in front of you, you can locate this question by going to the Screener section (S24Q) and looking for question S24Q01.

Below is the Codebook display for the R01736 SAMPLE IDENTIFICATION CODE variable. The frequencies of the responses are located in column 1, the leftmost column. The actual code of each response (i.e., the recorded, computer answer) is located in column 2, in the middle, and the 'textual' equivalent of the code is located in the rightmost column. Below these columns are other possible responses and the total frequencies. Below them are notes and the date of the note; lead-in and the next-default questions. The previous and next questions are represented by their longitudinally unique Reference Number. If the Reference Number is blue and underlined it is hyperlinked and you can left-click on the reference number to move the Codebook page to display that variable. In the codebook, you may also find hyperlinks to documentation on the question/variable.

(Note: Lead-in and Next Default questions have different meanings in the pre-computer data collection years. In the pre-CAPI (Computer-Assisted Personal Interviewing) years, 1973-1993, 'lead-in' and 'default next' meant one question back and one question forward, respectively; skips were not included. In the CAPI years, 1994-present, Lead-in and Default mean the true, logical previous and next questions: the actual question whose response skipped to the current question, and the next question to which the survey will move unless the response generates special skipping conditions.)

Figure 4 R01736 [S24Q01] SAMPLE IDENTIFICATION CODE 79 INT

The note contained in R01736, "SEX CODE CHANGED ON 42 CASES," does not mean that the respondents underwent a sex change during the longitudinal interview period; rather, it records a small correction for errors made in the screening. As explained in the NLSY79 User's Guide, Chapter 4: Topical Guide to the NLSY79, 4.16 Gender, "during screening, sex was determined by observation and asked directly of respondents only if it was 'not obvious' to the interviewer. The respondent's sex, coded for R01736 and subsequently for R02148 (see variable below in Figure 6) has been changed for 45 cases; see the User Notes, 4.16 Gender section below for a list of the identification numbers of these respondents and a short description of the changes." So the sex of the respondent was collected by interviewer observation and corrected in 45 cases due to observational error.

To find out more information on the NLSY79 samples, look at the NLSY79 User's Guide "Chapter 1, 1.4 NLSY79 Samples". You can view this information online by going to the Contents window and double-clicking on the NLSY79 User's Guide. The user's guide will open in the Text/Cdbk window. Double-click on "Chapter 1" to open it, and left-click on the topic "1.4 NLSY79 Samples". (Even more detail about the samples, the sampling process, sample size, screening for the samples, etc. can be found online in Chapter 2 of the NLSY79 User's Guide.)  

7.2 RACIAL/ETHNIC COHORT

R02147.   [*Created]  R'S RACIAL/ETHNIC COHORT FROM SCREENER 79 INT

The collection of race and ethnicity information has been an important component of NLS efforts since their beginning and is used to identify differences in labor market experiences among racial/ethnic minorities. If you are interested in a portrait of individuals and groups, the race, ethnicity, and nationality variables can contribute much to your research. The following quote from the NLSY79 User's Guide documents which race, ethnicity, and nationality variables are available, the origin of the variables, and whether the variables were created or come from raw data.

    4.32 Race, Ethnicity & Nationality

    NLSY79

    "The following race and ethnicity variables are available for NLSY79 respondents: (1) a racial/ethnic variable based on the sample identification code assigned by NORC; (2) a series of self-reported ethnic origin variables collected during the 1979 survey; and (3) a set of interviewer identifications of the race of the respondent at the time of the interview. Race and ethnic origin information is also available for each household member identified during the 1978 household screening. Of related interest is a series of immigration questions, fielded in 1990, that included the collection of information on country of citizenship at the time that foreign-born respondents entered the U.S.

    Race/Ethnicity: The variable 'Racial/Ethnic Cohort from Screener' (R02147.) designates the respondent as "Hispanic," "black," or "non-black, non-Hispanic" and provides the basis for weighting NLSY79 data. This variable is collapsed from R01736., 'Sample Identification Code,' a code, e.g., "supplemental male black," "cross-sectional female Hispanic," assigned by NORC to each respondent based on information gathered during the 1978 household screening. In the creation of the 'Sample Identification Code' and thus the 'Racial/Ethnic Cohort' variable, both race and ethnic origin information collected at the time of the 1978 household screening were used. Interviewers conducting the screening were instructed to: (1) code race by observation into three categories, "non-black/non-Hispanic," "black," or "other"; (2) inquire about the ethnicity of all household members age 14 or above; but (3) assign ethnicity, without asking, to those members who were under age 14."

    NLSY79 User's Guide Chapter 4.32 Race, Ethnicity & Nationality

The codebook display for "R02147. [*Created] R'S RACIAL/ETHNIC COHORT FROM SCREENER 79 INT" is shown below.

Figure 5 RACIAL/ETHNIC COHORT FROM 79 SCREENER

In the above display of R02147 you will notice that (1) the variable was created, and (2) a note appears below the frequencies. This note, reproduced below, describes the logic by which the variable was created and identifies the variable from which it was derived. The exclamation mark (!) is used in place of the 'pipe' (|) symbol and stands for the Boolean 'OR' operation.

    NOTE: AUGUST 26, 1980 DENNIS GREY

    DESCRIPTION:
    COHORT=3;
    IF R(1736.)=4 ! R(1736.)=8 ! R(1736.)=11 ! R(1736.)=14 THEN COHORT=1;
    IF R(1736.)=17 ! R(1736.)=20 THEN COHORT=1;
    IF R(1736.)=3 ! R(1736.)=7 ! R(1736.)=10 ! R(1736.)=13 THEN COHORT=2;
    IF R(1736.)=16 ! R(1736.)=19 THEN COHORT=2;
    R(2147.)=COHORT;

Below the note, hyperlinks from the precedent question (lead in) and to the default next question are displayed in the form of 'variable Reference Numbers'.
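For readers who want to reproduce the recode outside the codebook, the logic of the note above can be sketched in Python. This is a minimal illustration, not the code NORC or CHRR actually ran: the function and constant names are hypothetical, the input is assumed to be a valid R01736 code, and only the numeric COHORT values are returned because the note itself does not attach labels to them.

```python
# R01736 codes that the codebook note assigns to each COHORT value.
COHORT_1_CODES = frozenset({4, 8, 11, 14, 17, 20})
COHORT_2_CODES = frozenset({3, 7, 10, 13, 16, 19})


def racial_ethnic_cohort(sample_id_code: int) -> int:
    """Collapse a Sample Identification Code (R01736) into R02147.

    Mirrors the note: COHORT defaults to 3 and is overridden when the
    code appears in one of the two lists. The lists do not overlap, so
    the order of the checks does not matter.
    """
    if sample_id_code in COHORT_1_CODES:
        return 1
    if sample_id_code in COHORT_2_CODES:
        return 2
    return 3  # default from the note: COHORT=3


if __name__ == "__main__":
    for code in (1, 3, 4, 19, 20):
        print(f"R01736={code} -> R02147={racial_ethnic_cohort(code)}")
```

Note that, like the original note, this sketch sends every code outside the two lists to cohort 3, so it does not validate its input or handle missing-value codes.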

 

7.3 GENDER/SEX OF RESPONDENT

R02148.   [*Created]   SEX OF R 79 INT

"Variables available within the main NLSY79 data set provide information on the sex of each respondent, their children, and members of their household. Information on the sex of the respondent can be found in: (1) a single 1979 variable, 'Sex of R' (R02148.), and (2) a set of yearly interviewer remarks variables, 'Int Remarks - Sex of R' (editor's remark: R6433600 has only one occurrence in 1998; use the Area of Interest, INTRMK, to look at all the interviewers' remarks). The 1979 'Sex of R' variable (R02148) is derived from R01736, 'Sample Identification Code,' a variable which defines each respondent's membership in one of the subsamples of the NLSY79 (e.g., "cross-sectional male, non-black, non-Hispanic poor," "supplemental female black," etc.). Subsample identification was based on information gathered during the 1978 household screening." (From NLSY79 User's Guide, "Chapter 4.16 Gender".)
Figure 6 SEX OF R '79 INT (Sex of the respondent in the 1979 interview)

During screening, sex was determined by observation and asked directly of respondents only if it was "not obvious" to the interviewer. The respondent's sex, coded for R01736. and subsequently for R02148., has been changed for 45 cases; see the "User Notes" section below for a list of the identification numbers of these respondents and a short description of the changes. The variable series 'Int Remarks - Sex of R' provides observations of interviewers as to the sex of the respondent for the 1982 survey year and each following year except the 1987 telephone interview. These observations are subject to a small degree of error from erroneous interviewer observation and/or recoding and data entry error. Therefore, when using this series of variables, a small number of respondents may appear to "change" sex across surveys.

In the next section we discuss the second area of interest that the user should be well aware of, KEYVARS. The key variables (keyvars) have been created for the user and include age, residence, employment status, weeks worked, highest grade achieved, marital status, family size, total family income, poverty level and status, sampling weight, and so on.

