The CHRR Database Investigator


Table of Contents


11.0 Extracting Variables

We will now briefly demonstrate how to extract variables using the CHRR Database Investigator software. A more detailed explanation of the software's features can be found in the software manual, CHRR Database Investigator User's Manual. This manual is on your CD; link to it under the subdirectory "Documents" in the Contents window. Extracting variables prepares the raw NLS data for use in one of the statistical packages, defines the output file type, writes documentation for the extracted set, writes the codebook, and so on. To extract variables, you must first have 'tagged' variables in a tagset. Variables are always extracted in ascending reference number order, so when an extracted data set is read into SAS, SPSS, Stata, etc., R00001 will always be first (if selected), followed by the next highest reference number, and so on.
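The ordering rule can be illustrated with a short Python sketch. The function name and the sample list are hypothetical and not part of the software; the sketch only assumes that reference numbers share the fixed-width form used in this tutorial (one letter followed by seven digits), so that a plain string sort gives ascending order.

```python
def extraction_order(tagged):
    """Return reference numbers in ascending order, without duplicates.

    Assumption: every reference number has the same fixed-width form
    (one letter + seven digits), so string order matches numeric order.
    """
    return sorted(set(tagged))


# Hypothetical tagset, tagged in arbitrary order.
tagged = ["R0214800", "R0000100", "R0214700"]
print(extraction_order(tagged))
```

Whatever order the variables were tagged in, the case identification variable comes first in the result because it has the lowest reference number.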

11.1 Tagging Variables

Let's extract the variables on 'age'. We have already investigated the Age variables and know that they are part of the KEYVARS area of interest. We could go ahead and extract them at this time. Remember, however, that you should generally include the 'case identification' variable in every extract, because it may help you resolve inconsistencies in a particular case. With the case identification code in the extract, you can go to any 'outlying' case, examine the particular datum for an anomaly, and decide whether to keep the case in the extract or to drop it because of the conflicting data.
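As a minimal sketch of why the case identification code is useful, the Python below flags cases whose age falls outside an expected range and reports their IDs for follow-up. The records, the column names `caseid` and `age`, and the range limits are all made up for illustration; they are not taken from the NLS data.

```python
# Hypothetical extracted records; names and values are illustrative only.
records = [
    {"caseid": 1, "age": 21},
    {"caseid": 2, "age": 19},
    {"caseid": 3, "age": 97},  # implausible value worth a second look
]


def outlying_cases(rows, low, high):
    """Return the case IDs whose age lies outside the range [low, high]."""
    return [row["caseid"] for row in rows if not (low <= row["age"] <= high)]


# The limits 14 and 30 are assumed for this example.
print(outlying_cases(records, low=14, high=30))
```

Without the case ID in the extract, an odd value like the one above could be seen but not traced back to a specific respondent.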

We have already investigated the 'CaseID' variable (R0000100 IDENTIFICATION CODE 79 INT) and know that it is part of the COMMON area of interest. You should probably also include two other COMMON variables, the 'race' (R0214700 R'S RACIAL/ETHNIC COHORT FROM SCREENER 79 INT) and 'sex' (R0214800 SEX OF R 79 INT) of the respondent.

So, before locating the 'age' variables, go to the Contents window and double-click on the Area of Interest index. The index will open and display the groupings of variables it contains in the middle frame of the Contents window. Go to the COMMON group by scrolling down to it or by typing 'co', which should quickly move the highlight bar to the COMMON group.

Double-click on COMMON and the list of variables will open in the Variables window. The first variable in the group is R0000100. Click on the checkbox next to R0000100 to select it for extraction. Look down through the descriptions until you find 'R'S RACIAL/ETHNIC COHORT FROM SCREENER 79 INT' and click on its checkbox. Just below it you will find 'SEX OF R 79 INT'; click on its checkbox as well. Your screen should look like Figure 11 below.
Figure 11 Selected CaseID, Race, and Sex Variables

Now we need to select the 'Age' variables from the KEYVARS area of interest. (An alternate approach to finding all the variables on 'Age' would be to open the Any Word in Context index [Contents window, double-click on 'Any Word in Context'] and look for the word 'age' in it. If found, open the group [double-click on 'age'] in the Variables window, sort the variables by description [left-click on the Description heading] and then select the appropriate sub-group of variables from there.) But we know we want "AGE OF R AT INTERVIEW DATE ..." and that sub-group of variables is in KEYVARS, which will make our job a little easier.

Go back to the Contents window and locate the KEYVARS area of interest index by typing 'ke' to move the highlight bar to the KEYVARS group. Double-click on KEYVARS and the list of variables will open in the Variables window. Sort these variables by description by clicking on the Description heading, then scroll down through the sorted variables until you find the group beginning with 'Age ...'.

You now want to select the 'block' of variables that begin with the word 'Age...'. To select a block of variables and mark them for extraction:

Place the cursor on the first (or 'top') 'Age' variable and left-click on it to highlight it. Now hold down the Shift key, move the cursor to the last (or 'lowest') 'Age' variable, and left-click on it to highlight all the variables in between. Release the key.

When the block is highlighted as in Figure 12 below, place the cursor on the block and right-click on it to open the pop-up menu for tagging multiple variables.

Figure 12 'Age' Variables Sorted Together

In the tagging pop-up menu move the cursor to 'Tag Selected' and left-click on it.

After choosing 'Tag Selected', the variables in the block should all have a checkmark by them and your screen should look like Figure 12 above (without the tagging pop-up menu).

11.2 Reviewing Tagged Variables

Now you will want to verify that the variables you have selected are all together in a 'tag set'. To do this you will need to 'review' the tagged variables.

Go to the main menu (at the top of the screen) and left-click on the word Extract. From the drop-down menu under Extract, move the cursor to 'Review Tagged Variables' and left-click on it.

When you choose Review Tagged Variables, the screen shown in Figure 13 below should appear. From this screen you can do a number of things: review your selections, un-check individual variables, accept the tagset and save it as an autonomous group, or 'Extract Tagged Variables'. If you choose 'Extract Tagged Variables' from the 'Extract' menu (at the top of the screen) without first saving the tagset, you will automatically be asked to save the tagset and give it a name. The file must be named before extraction because, during the extraction, the software automatically creates related files (documentation, statistical package files, codebook, etc.) and builds their names by attaching the appropriate extension to the tagset's file name.

To save the variables, go to File (top of the screen) and choose 'Save Tagged Set...'.

When saving a tagset the standard Windows dialogue box will appear and you will have to give the file a name. If you name this file 'age', you need only type in the word 'age' without an extension because the software will automatically add the extension '.ythpub'. All output files derived from this tagset will use this file name and attach the appropriate extension. For sample extensions, see 11.4 Extract Selections and Output Files below.
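The naming convention can be sketched in Python as follows. Only the '.ythpub' extension comes from this tutorial; the function name and the extensions '.ext1' and '.ext2' are placeholders invented for illustration and do not correspond to the software's real output extensions.

```python
from pathlib import PureWindowsPath


def derived_names(tagset_file, extensions):
    """Swap the tagset's extension for each given extension.

    The extensions passed in are placeholders supplied by the caller;
    the real output extensions are determined by the software.
    """
    base = PureWindowsPath(tagset_file)
    return [base.with_suffix(ext).name for ext in extensions]


# 'age.ythpub' is the tagset saved above; '.ext1' and '.ext2' are hypothetical.
print(derived_names("age.ythpub", [".ext1", ".ext2"]))
```

The point of the sketch is only that every output file shares the base name 'age' and differs in its extension.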

Figure 13 Review of Selected Variables.

11.3 Running an Extract

If, after review, you wish to continue and extract the variables, you may proceed in two ways: 1. You may save the tagset, then run the extract; or 2. You may run the extract and be required to name and save the tagset. To perform the extract, you must choose this command from the Extract drop-down menu.

To run the extract, go to Extract (top of the screen) and choose 'Extract Tagged Variables...'.

When you choose 'Extract Tagged Variables...', the dialogue box shown in Figure 14 will appear. From this dialogue box you may make a series of selections concerning the output files: you may limit the sample (universe) by using Boolean logic (left half of the dialogue box), choose the extract data file type, and/or write out the codebook for the selected variables in the tagset. The "Write Codebook" button writes out the codebook for the tagset without running the extract. The "Extract Codebook File" check box, when checked, causes the software to write out the codebook for the tagset during execution of the extract.
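Limiting the universe with Boolean logic amounts to keeping only the cases for which a condition is true. The Python below is a rough sketch of that idea, not the software's own syntax; the function name, the sample records, and the coded values are assumptions made for illustration.

```python
# Hypothetical records keyed by reference number; the coded values are invented.
records = [
    {"R0000100": 1, "R0214800": 1},
    {"R0000100": 2, "R0214800": 2},
    {"R0000100": 3, "R0214800": 2},
]


def restrict_universe(rows, condition):
    """Keep only the cases for which the Boolean condition holds."""
    return [row for row in rows if condition(row)]


# Combine two tests with AND: a given code on R0214800 and a case ID above 2.
subset = restrict_universe(
    records, lambda r: r["R0214800"] == 2 and r["R0000100"] > 2
)
print([row["R0000100"] for row in subset])
```

Cases that fail the condition are simply left out of the extracted data set; the tagged variables themselves are unchanged.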

Figure 14 below displays the default selections that will appear each time you choose 'Extract Tagged Variables...' from the Extract menu. If you were to run the extract with the defaults, you would get the Extract Report shown on the right in Figure 14.

Figure 14 Extract Dialogue Box and Corresponding Extract Report

Shown below are various iterations of selections made in the Extract Dialogue Box and the resulting Extract Reports. Study what 'extract data file types' produce what output files.

 

11.4 Extract Selections and Output Files

Formatted ASCII
If you make the following selections in the Extract dialogue box (figure on the left) before you run the extract, the output and final Extract Report will look like the figure on the right. The extracted files will be placed in the same directory as the original tagged variables (*.ythpub) file. You may open any of the files in a text editor (notepad.exe, wordpad.exe, etc.) to view the file format.
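A formatted (fixed-column) ASCII file can be read by slicing each line at known column positions. The Python sketch below shows the general technique only; the column layout, function name, and sample line are hypothetical, and the actual positions should be taken from the documentation the extract produces.

```python
# Hypothetical layout: (variable, start, end) character positions, end exclusive.
LAYOUT = [("R0000100", 0, 5), ("R0214700", 5, 7), ("R0214800", 7, 9)]


def parse_fixed_width(line, layout=LAYOUT):
    """Slice one fixed-width line into integer values keyed by variable."""
    return {name: int(line[start:end]) for name, start, end in layout}


# Invented sample line matching the assumed layout above.
print(parse_fixed_width("    1 3 2"))
```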

Delimited ASCII
If you make the following selections in the Extract dialogue box (figure on the left) before you run the extract, the output and final Extract Report will look like the figure on the right. The extracted files will be placed in the same directory as the original tagged variables (*.ythpub) file. You may open any of the files in a text editor (notepad.exe, wordpad.exe, etc.) to view the file format.
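A delimited ASCII file can be read with any standard delimited-text reader. The Python sketch below assumes a comma delimiter and a header row of reference numbers; both are assumptions rather than documented behavior of the software, and the sample data and function name are invented. Open the extracted file in a text editor to confirm its actual layout.

```python
import csv
import io

# Invented sample standing in for the contents of a delimited extract.
sample = "R0000100,R0214700,R0214800\n1,3,2\n2,3,1\n"


def read_delimited(text, delimiter=","):
    """Parse delimited text with a header row into a list of integer records."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{name: int(value) for name, value in row.items()} for row in reader]


rows = read_delimited(sample)
print(rows[0]["R0000100"])
```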

DBASE3
If you make the following selections in the Extract dialogue box (figure on the left) before you run the extract, the output and final Extract Report will look like the figure on the right. The extracted files will be placed in the same directory as the original tagged variables (*.ythpub) file. You may open any of the files in a text editor (notepad.exe, wordpad.exe, etc.) to view the file format.

Stata Dictionary
If you make the following selections in the Extract dialogue box (figure on the left) before you run the extract, the output and final Extract Report will look like the figure on the right. The extracted files will be placed in the same directory as the original tagged variables (*.ythpub) file. You may open any of the files in a text editor (notepad.exe, wordpad.exe, etc.) to view the file format.

