Census 2000
Survey: Census 2000
Data Source: U.S. Census Bureau
Table: P45. Imputation Of Relationship [3]
Universe: Population not substituted
Excerpt from: Social Explorer, U.S. Census Bureau; 2000 Census of Population and Housing, Summary File 1: Technical Documentation, 2001.
When information is missing or inconsistent, the Census Bureau uses a method called imputation to assign values. Imputation relies on the statistical principle of "homogeneity," or the tendency of households within a small geographic area to be similar in most characteristics. For example, the value of "rented" is likely to be imputed for a housing unit not reporting on owner/renter status in a neighborhood with multiunits or apartments where other respondents reported "rented" on the census questionnaire. In past censuses, when the occupancy status or the number of residents was not known for a housing unit, this information was imputed.

Internet Questionnaire Assistance (IQA)
An operation that allows respondents to use the Census Bureau's Internet site to (1) ask questions and receive answers about the census form, job opportunities, or the purpose of the census and (2) provide responses to the short form.

Interpolation
Interpolation is frequently used in calculating medians or quartiles based on interval data and in approximating standard errors from tables. Linear interpolation is used to estimate values of a function between two known values. Pareto interpolation is an alternative to linear interpolation. In Pareto interpolation, the median is derived by interpolating between the logarithms of the upper and lower income limits of the median category. The Census Bureau uses it in calculating median income within intervals wider than $2,500.
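As a rough illustration of how the two methods differ, the sketch below interpolates a median within a single income category. The function names, the argument layout (cumulative shares of households at the category limits), and the sample figures are assumptions made for this example; they are not taken from Census Bureau software.

```python
import math


def linear_median(lower, upper, share_below_lower, share_below_upper):
    """Linear interpolation of the median inside the category [lower, upper]."""
    fraction = (0.5 - share_below_lower) / (share_below_upper - share_below_lower)
    return lower + fraction * (upper - lower)


def pareto_median(lower, upper, share_below_lower, share_below_upper):
    """Pareto interpolation: interpolate between the logarithms of the limits."""
    share_above_lower = 1.0 - share_below_lower
    share_above_upper = 1.0 - share_below_upper
    fraction = (math.log(share_above_lower) - math.log(0.5)) / (
        math.log(share_above_lower) - math.log(share_above_upper)
    )
    return math.exp(math.log(lower) + fraction * (math.log(upper) - math.log(lower)))


# Hypothetical median category: $20,000 to $30,000, with 40% of households
# below the lower limit and 60% below the upper limit.
linear_result = linear_median(20_000, 30_000, 0.40, 0.60)
pareto_result = pareto_median(20_000, 30_000, 0.40, 0.60)
```

With these sample figures the linear estimate is about $25,000 and the Pareto estimate about $24,000. The Pareto value is lower because it assumes that households become less frequent toward the upper limit of the category.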

Editing of Unacceptable Data
The objective of the processing operation was to produce a set of data that describes the population as accurately and clearly as possible. In a major change from past practice, the information on Census 2000 questionnaires generally was not edited for consistency, completeness, and acceptability during either field data collection or data capture operations. Enumerator-filled questionnaires were reviewed by census crew leaders and local office clerks for adherence to specified procedures. No clerical review of mail return questionnaires was done to ensure that the information on the form could be captured, and households were not contacted, as they had been in previous censuses, to collect data that were missing from census returns.

Most census questionnaires received by mail from respondents, as well as those filled by enumerators, were processed through a new contractor-built image scanning system that used optical mark and character recognition to convert the responses into computer files. The optical character recognition, or OCR, process used several pattern and context checks to estimate accuracy thresholds for each write-in field. The system also used "soft edits" on most interpreted numeric write-in responses to decide whether the field values read by the machine interpretation were acceptable. If the estimated accuracy of the value read fell below the acceptable threshold, or the value was outside the soft edit range, the image of the item was displayed to a keyer, who then entered the response.
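The routing rule described above can be sketched in a few lines. This is only an illustration: the function name, the confidence scale, the threshold of 0.90, and the age range are hypothetical and do not come from the data capture system itself.

```python
def needs_keying(value, confidence, min_confidence, soft_range):
    """Return True when a machine-read numeric field should go to a keyer."""
    low, high = soft_range
    below_threshold = confidence < min_confidence
    outside_soft_edit = not (low <= value <= high)
    return below_threshold or outside_soft_edit


# Hypothetical age field: accept values 0-115 read at 0.90 confidence or better.
AGE_RANGE = (0, 115)
accepted = needs_keying(34, 0.97, 0.90, AGE_RANGE)        # passes both checks
low_confidence = needs_keying(34, 0.62, 0.90, AGE_RANGE)  # fails accuracy check
out_of_range = needs_keying(340, 0.99, 0.90, AGE_RANGE)   # fails soft edit
```

Either failure alone is enough to send the item image to a keyer, which matches the "or" in the description.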

To control the creation of possibly erroneous people from questionnaires completed incorrectly or containing stray marks, an edit on the number of people indicated on each mail return and enumerator-filled questionnaire was implemented as part of the data capture system. Failure of this edit resulted in the review of the questionnaire image at a workstation by an operator, who identified erroneous person records and corrected OCR interpretation errors in the population count field.

At Census Bureau headquarters, the mail response data records were subjected to a computer edit that identified households exhibiting a possible coverage problem and those with more than six household members (six being the maximum number of persons who could be enumerated on a mail questionnaire). Attempts were made to contact these households on the telephone to correct the count inconsistency and to collect the census data for those people for whom there was no room on the questionnaire.

Incomplete or inconsistent information on the questionnaire data records was assigned acceptable values using imputation procedures during the final automated edit of the collected data. Imputations, or computer assignments of acceptable codes in place of unacceptable entries or blanks, are needed most often when an entry for a given item is lacking or when the information reported for a person on that item is inconsistent with other information for that person. This process is known as allocation. As in previous censuses, the general procedure for changing unacceptable entries was to assign an entry for a person that was consistent with entries for persons with similar characteristics. The assignment of acceptable codes in place of blanks or unacceptable entries enhances the usefulness of the data. Allocation rates for census items are made available with the published census data.
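One way to picture allocation is as a donor lookup: a blank item is filled with the value most recently reported by a person with the same matching characteristics. The sketch below is a simplified illustration of that idea, not the Census Bureau's edit specification; the field names, the matching keys, and the `allocated` flag are hypothetical.

```python
def allocate_missing(records, item, match_keys):
    """Fill blank `item` values from the latest donor sharing `match_keys`."""
    last_donor = {}  # matching characteristics -> most recent reported value
    completed = []
    for record in records:
        key = tuple(record[k] for k in match_keys)
        filled = dict(record)
        if record.get(item) is not None:
            last_donor[key] = record[item]   # reported value becomes the donor
            filled["allocated"] = False
        elif key in last_donor:
            filled[item] = last_donor[key]   # assign the donor's value
            filled["allocated"] = True
        else:
            filled["allocated"] = False      # no donor yet; left blank in this sketch
        completed.append(filled)
    return completed


people = [
    {"age_group": "25-34", "sex": "F", "marital_status": "married"},
    {"age_group": "25-34", "sex": "F", "marital_status": None},
    {"age_group": "65+", "sex": "M", "marital_status": None},
]
result = allocate_missing(people, "marital_status", ["age_group", "sex"])
# Share of records whose value was assigned rather than reported.
allocation_rate = sum(r["allocated"] for r in result) / len(result)
```

Keeping a flag on each assigned value is what makes it possible to report an allocation rate alongside the data.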

Another way corrections were made during the computer editing process was through substitution; that is, the assignment of a full set of characteristics for people in a household. When there was an indication that a household was occupied by a specified number of people, but the questionnaire contained no information for the people within the household or the occupants were not listed on the questionnaire, a previously accepted household of the same size was selected as a substitute, and the full set of characteristics for the substitute was duplicated. Housing characteristics are not substituted. Matrix H18, Occupied Housing Units Substituted, represents a count of occupied housing units into which all persons have been substituted.
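Substitution differs from allocation in that the whole set of person records is copied at once. The sketch below illustrates the idea under simplifying assumptions: the record layout, the `substituted` flag, and the lookup of donors by household size are hypothetical, and the actual donor selection rules are not described here.

```python
import copy


def substitute_household(household, accepted_by_size):
    """Copy person records from a previously accepted household of equal size."""
    if household["people"]:        # characteristics were reported; no change
        return household
    donor = accepted_by_size.get(household["size"])
    if donor is None:              # no donor available in this sketch
        return household
    replaced = dict(household)
    replaced["people"] = copy.deepcopy(donor["people"])
    replaced["substituted"] = True
    return replaced


donor = {"size": 2, "tenure": "owned",
         "people": [{"age": 41, "sex": "F"}, {"age": 43, "sex": "M"}]}
empty = {"size": 2, "tenure": "rented", "people": []}
filled = substitute_household(empty, {2: donor})
```

Only the person records are copied. The housing item (`tenure` in this example) keeps its own value, in line with the statement that housing characteristics are not substituted.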