Data Dictionary: Census 1990
Survey: Census 1990
Data Source: U.S. Census Bureau
Table: P142. Imputation Of Ancestry [3]
Universe: Persons
Variable Label
Relevant Documentation:
Excerpt from: Social Explorer, U.S. Census Bureau; Census of Population and Housing, 1990: Summary Tape File 3 on CD-ROM [machine-readable data files] / prepared by the Bureau of the Census. Washington: The Bureau [producer and distributor], 1991.
Confidentiality of the Data
To maintain the confidentiality required by law (Title 13, United States Code), the Bureau of the Census applies a confidentiality edit to the 1990 census data to assure that published data do not disclose information about specific individuals, households, or housing units. As a result, a small amount of uncertainty is introduced into the estimates of census characteristics. The sample itself provides adequate protection for most areas for which sample data are published since the resulting data are estimates of the actual counts; however, small areas require more protection. The edit is controlled so that the basic structure of the data is preserved.

The confidentiality edit is implemented by selecting a small subset of individual households from the internal sample data files and blanking a subset of the data items on these household records. Responses to those data items are then imputed using the same imputation procedures used for nonresponse. A larger subset of households is selected for the confidentiality edit for small areas to provide greater protection for these areas. The editing process is implemented in such a way that the quality and usefulness of the data are preserved.
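
As a rough illustration of the blanking step described above, the following Python sketch selects a random subset of household records and blanks chosen items on them. The function name, the record layout, the item name, and the selection rate are all assumptions made for this example; the source does not give the actual selection rates or item lists, and the imputation that follows the blanking is not shown.

```python
import random


def confidentiality_edit(households, items, rate, rng):
    """Blank `items` on a randomly selected subset of household records.

    Hypothetical helper: returns edited copies of the records and the
    indices of the households that were selected.
    """
    n_selected = round(len(households) * rate)
    selected = set(rng.sample(range(len(households)), n_selected))
    edited = []
    for i, record in enumerate(households):
        record = dict(record)  # copy so the input records are not modified
        if i in selected:
            for item in items:
                record[item] = None  # blanked; a later step would impute it
        edited.append(record)
    return edited, sorted(selected)


# Illustrative records only; a small area would use a larger `rate`.
households = [{"id": i, "income": 100 + i} for i in range(10)]
edited, touched = confidentiality_edit(
    households, items=["income"], rate=0.2, rng=random.Random(0)
)
```

Passing a larger `rate` for small areas would mirror the statement that a larger subset of households is selected there.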

Editing of Unacceptable Data
The objective of the processing operation is to produce a set of data that describes the population as accurately and clearly as possible. To meet this objective, questionnaires were edited during field data collection operations for consistency, completeness, and acceptability. Questionnaires also were reviewed by census clerks for omissions, certain specific inconsistencies, and population coverage. For example, write-in entries such as "Don't know" or "NA" were considered unacceptable. For some district offices, the initial edit was automated; however, for the majority of the district offices, it was performed by clerks. As a result of this operation, a telephone or personal visit followup was made to obtain missing information. Potential coverage errors were included in the followup, as well as a sample of questionnaires with omissions and/or inconsistencies.

Subsequent to field operations, remaining incomplete or inconsistent information on the questionnaires was assigned using imputation procedures during the final automated edit of the collected data. Imputations, or computer assignments of acceptable codes in place of unacceptable entries or blanks, are needed most often when an entry for a given item is lacking or when the information reported for a person or housing unit on that item is inconsistent with other information for that same person or housing unit. As in previous censuses, the general procedure for changing unacceptable entries was to assign an entry for a person or housing unit that was consistent with entries for persons or housing units with similar characteristics. The assignment of acceptable codes in place of blanks or unacceptable entries enhances the usefulness of the data.
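
The general idea of assigning an entry from a record with similar characteristics can be sketched as a simple donor-based scheme. The Python example below is only an illustration under stated assumptions: the function name, the matching keys, the set of acceptable values, and the `imputed` flag are hypothetical, and the source does not specify the matching rules the Census Bureau actually used.

```python
def impute_from_similar(records, item, match_keys, acceptable):
    """Fill blank or unacceptable `item` values from the most recently
    processed record that has the same values for `match_keys`.

    Hypothetical sketch: each returned record carries an `imputed` flag.
    """
    last_donor = {}  # tuple of match-key values -> last acceptable value seen
    result = []
    for record in records:
        record = dict(record)  # copy so the input records are not modified
        key = tuple(record[k] for k in match_keys)
        value = record.get(item)
        if value in acceptable:
            last_donor[key] = value  # this record can serve as a donor
            record["imputed"] = False
        elif key in last_donor:
            record[item] = last_donor[key]
            record["imputed"] = True
        else:
            record["imputed"] = False  # no donor seen yet; entry left as is
        result.append(record)
    return result


# Illustrative data; the item and categories are made up for the example.
acceptable = {"married", "never married", "widowed", "divorced", "separated"}
records = [
    {"age_group": "adult", "sex": "F", "marital": "married"},
    {"age_group": "adult", "sex": "F", "marital": "Don't know"},
    {"age_group": "child", "sex": "M", "marital": None},
]
out = impute_from_similar(records, "marital", ["age_group", "sex"], acceptable)
```

Here the second record receives "married" from the first, while the third has no earlier donor with matching characteristics and is left unchanged.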

Another way in which corrections were made during the computer editing process was through substitution; that is, the assignment of a full set of characteristics for a person or housing unit. When there was an indication that a housing unit was occupied but the questionnaire contained no information for the people within the household or the occupants were not listed on the questionnaire, a previously accepted household was selected as a substitute, and the full set of characteristics for the substitute was duplicated. The assignment of the full set of housing characteristics occurred when there was no housing information available. If the housing unit was determined to be occupied, the housing characteristics were assigned from a previously processed occupied unit. If the housing unit was vacant, the housing characteristics were assigned from a previously processed vacant unit.
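
Substitution differs from item imputation in that the full set of characteristics is copied from a donor. A minimal Python sketch of the occupied/vacant rule described above follows; the field names (`status`, `characteristics`, `substituted`) and the function name are assumptions for illustration, not the Census Bureau's record layout.

```python
import copy


def substitute_missing_units(units):
    """Copy the full set of characteristics from the most recently processed
    unit with the same occupancy status when a unit has none of its own."""
    last_processed = {}  # "occupied" / "vacant" -> most recent complete unit
    result = []
    for unit in units:
        unit = dict(unit)  # copy so the input records are not modified
        status = unit["status"]
        if unit["characteristics"] is not None:
            last_processed[status] = unit
            unit["substituted"] = False
        elif status in last_processed:
            donor = last_processed[status]
            unit["characteristics"] = copy.deepcopy(donor["characteristics"])
            unit["substituted"] = True
        else:
            unit["substituted"] = False  # no earlier donor of this status
        result.append(unit)
    return result


# Illustrative housing units only.
units = [
    {"id": 1, "status": "occupied", "characteristics": {"rooms": 5}},
    {"id": 2, "status": "vacant", "characteristics": {"rooms": 3}},
    {"id": 3, "status": "occupied", "characteristics": None},
    {"id": 4, "status": "vacant", "characteristics": None},
]
processed = substitute_missing_units(units)
```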

The data on ancestry were derived from answers to questionnaire item 13, which was asked of a sample of persons. The question was based on self-identification; the data on ancestry represent self-classification by people according to the ancestry group(s) with which they most closely identify. Ancestry refers to a person's ethnic origin or descent, "roots," or heritage, or the place of birth of the person or the person's parents or ancestors before their arrival in the United States. Some ethnic identities, such as "Egyptian" or "Polish," can be traced to geographic areas outside the United States, while other ethnicities, such as "Pennsylvania Dutch" or "Cajun," evolved in the United States.

The intent of the ancestry question was not to measure the degree of attachment the respondent had to a particular ethnicity. For example, a response of "Irish" might reflect total involvement in an "Irish" community or only a memory of ancestors several generations removed from the individual.

The Census Bureau coded the responses through an automated review, edit, and coding operation. The open-ended write-in ancestry item was coded by subject-matter specialists into a numeric representation using a code list containing over 1,000 categories. The 1990 code list reflects the results of the Census Bureau's own research and consultations with many ethnic experts. Many decisions were made to determine the classification of responses. These decisions affected the grouping of the tabulated data. For example, the "Assyrian" category includes both responses of "Assyrian" and "Chaldean."

The ancestry question allowed respondents to report one or more ancestry groups. While a large number of respondents listed a single ancestry, the majority of answers included more than one ethnic entry. Generally, only the first two responses reported were coded in 1990. If a response was in terms of a dual ancestry, for example, Irish-English, the person was assigned two codes, in this case one for Irish and another for English.

However, in certain cases, multiple responses such as "French Canadian," "Scotch-Irish," "Greek Cypriote," and "Black Dutch" were assigned a single code reflecting their status as unique groups. If a person reported one of these unique groups in addition to another group, for example, "Scotch-Irish English," resulting in three terms, that person received one code for the unique group ("Scotch-Irish") and another one for the remaining group ("English"). If a person reported "English Irish French," only English and Irish were coded. For certain combinations of ancestries in which one ancestry group is a part of another, such as "German-Bavarian," the response was coded as a single ancestry using the smaller group ("Bavarian"). Also, responses such as "Polish-American" or "Italian-American" were coded and tabulated as a single entry ("Polish" or "Italian").

The Census Bureau accepted "American" as a unique ethnicity if it was given alone, with an ambiguous response, or with State names. If the respondent listed any other ethnic identity such as "Italian American," generally the "American" portion of the response was not coded. However, distinct groups such as "American Indian," "Mexican American," and "African American" were coded and identified separately because they represented groups who considered themselves different from those who reported as "Indian," "Mexican," or "African," respectively.
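
The coding rules in the preceding paragraphs can be sketched in a few lines of Python. This is an illustrative approximation only: the lookup tables below are tiny hypothetical stand-ins for the 1990 code list (which has over 1,000 categories), the function returns group names rather than numeric codes, and the rules for "American" given with State names or ambiguous responses are not modeled.

```python
# Hypothetical miniature lookup tables, limited to the examples in the text.
SINGLE_CODE_PHRASES = {
    ("french", "canadian"): "French Canadian",
    ("scotch", "irish"): "Scotch-Irish",
    ("greek", "cypriote"): "Greek Cypriote",
    ("black", "dutch"): "Black Dutch",
    ("american", "indian"): "American Indian",
    ("mexican", "american"): "Mexican American",
    ("african", "american"): "African American",
}
PART_OF = {"Bavarian": "German"}  # smaller group -> larger group
MAX_CODES = 2  # generally only the first two responses were coded in 1990


def code_ancestry(response):
    """Return up to two ancestry group names for a write-in response."""
    words = response.lower().replace("-", " ").split()
    groups = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in SINGLE_CODE_PHRASES:  # multi-term response, single code
            groups.append(SINGLE_CODE_PHRASES[pair])
            i += 2
        else:
            groups.append(words[i].capitalize())
            i += 1
    # "American" is kept only when no other group was reported.
    if any(g != "American" for g in groups):
        groups = [g for g in groups if g != "American"]
    # When one group is part of another, keep only the smaller group.
    larger = {PART_OF[g] for g in groups if g in PART_OF}
    groups = [g for g in groups if g not in larger]
    return groups[:MAX_CODES]


examples = {
    r: code_ancestry(r)
    for r in ["Irish-English", "Scotch-Irish English", "English Irish French",
              "German-Bavarian", "Polish-American", "American",
              "Mexican American"]
}
```

For instance, `code_ancestry("Scotch-Irish English")` yields the unique group plus the remaining group, and `code_ancestry("English Irish French")` drops the third term.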

In all tabulations, when respondents provided an unacceptable ethnic identity (for example, an uncodeable or unintelligible response such as "multi-national," "adopted," or "I have no idea"), the answer was included in "Ancestry not reported."

The tabulations on ancestry use two types of data presentation: one uses total persons as the base, and the other uses total responses as the base. The following categories are shown in the two presentations:

Presentation Based on Persons
Single Ancestries Reported
Includes all persons who reported only one ethnic group. Included in this category are persons with multiple-term responses such as "Scotch-Irish" who are assigned a single code.
Multiple Ancestries Reported
Includes all persons who reported more than one group and were assigned two ancestry codes.
Ancestry Unclassified
Includes all persons who provided a response that could not be assigned an ancestry code because they provided nonsensical entries or religious responses.
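
A person-based tabulation counts each person exactly once. The short Python sketch below assumes each person is represented by the list of ancestry groups assigned to them (zero, one, or two entries) and that every person in the input gave some response; both the representation and the function name are assumptions made for illustration.

```python
from collections import Counter


def person_based_counts(coded_persons):
    """Count persons by the number of ancestry codes they were assigned."""
    counts = Counter()
    for codes in coded_persons:
        if len(codes) == 0:
            counts["Ancestry unclassified"] += 1
        elif len(codes) == 1:
            counts["Single ancestry"] += 1
        else:
            counts["Multiple ancestry"] += 1
    return counts


# Illustrative data: "Scotch-Irish" is one code, so that person is "single".
persons = [["Scotch-Irish"], ["French", "Danish"], [], ["Danish"]]
person_counts = person_based_counts(persons)
```

Because each person falls into exactly one category, the counts sum to the number of persons in the input.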

Presentation Based on Responses
Total Ancestries Reported
Includes the total number of ancestries reported and coded. If a person reported a multiple ancestry such as "French Danish," that response was counted twice in the tabulations--once in the "French" category and again in the "Danish" category. Thus, the sum of the counts in this type of presentation is not the total population but the total of all responses.

First Ancestry Reported
Includes the first response of all persons who reported at least one codeable entry. For example, in this category, the count for "Danish" would include all those who reported only Danish and those who reported Danish first and then some other group.

Second Ancestry Reported
Includes the second response of all persons who reported a multiple ancestry. Thus, the count for "Danish" in this category includes all persons who reported Danish as the second response, regardless of the first response provided.
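
The three response-based counts can be illustrated with a short Python sketch. It assumes each person is represented by an ordered list of at most two assigned ancestry groups; that representation and the function name are hypothetical.

```python
from collections import Counter


def response_based_counts(coded_persons):
    """Return (total, first, second) ancestry counts over all persons."""
    total, first, second = Counter(), Counter(), Counter()
    for codes in coded_persons:
        total.update(codes)  # every coded response is counted
        if len(codes) >= 1:
            first[codes[0]] += 1
        if len(codes) >= 2:
            second[codes[1]] += 1
    return total, first, second


# Illustrative data: four persons, one of whom has no codeable entry.
persons = [["French", "Danish"], ["Danish"], ["Danish", "Irish"], []]
total, first, second = response_based_counts(persons)
```

In this example the total counts sum to five responses from four persons, which shows why the sum in this type of presentation is the total of all responses rather than the total population.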

The Census Bureau identified hundreds of ethnic groups in the 1990 census. However, it was impossible to show information for every group in all census tabulations because of space constraints. Publications such as the 1990 CP-2, Social and Economic Characteristics and the 1990 CPH-3, Population and Housing Characteristics for Census Tracts and Block Numbering Areas reports show a limited number of groups based on the number reported and the advice received from experts. A more complete distribution of groups is presented in the 1990 Summary Tape File 4, supplementary reports, and a special subject report on ancestry. In addition, groups identified specifically in the questions on race and Hispanic origin (for example, Japanese, Laotian, Mexican, Cuban, and Spaniard), in general, are not shown separately in ancestry tabulations.

Limitation of the Data
Although some experts consider religious affiliation a component of ethnic identity, the ancestry question was not designed to collect any information concerning religion. The Bureau of the Census is prohibited from collecting information on religion. Thus, if a religion was given as an answer to the ancestry question, it was coded as an "Other" response.

A question on ancestry was first asked in the 1980 census. Although there were no comparable data prior to the 1980 census, related information on ethnicity was collected through questions on parental birthplace, own birthplace, and language which were included in previous censuses. Unlike other census questions, there was no imputation for nonresponse to the ancestry question.

In 1990, respondents were allowed to report more than one ancestry group; however, only the first two ancestry groups identified were coded. In 1980, the Census Bureau attempted to code a third ancestry for selected triple-ancestry responses.

New categories such as "Arab" and "West Indian" were added to the 1990 question to meet important data needs. The "West Indian" category excluded "Hispanic" groups such as "Puerto Rican" and "Cuban" that were identified primarily through the question on Hispanic origin. In 1990, the ancestry group "American" was recognized and tabulated as a unique ethnicity. In 1980, "American" was tabulated but included under the category "Ancestry not specified."

A major improvement in the 1990 census was the use of an automated coding system for ancestry responses. The automated coding system used in the 1990 census greatly reduced the potential for error associated with a clerical review. Specialists with a thorough knowledge of the subject matter reviewed, edited, coded, and resolved inconsistent or incomplete responses.