Data Dictionary: Census 2000
Survey: Census 2000
Data Source: U.S. Census Bureau
Table: P44. Imputation Of Age [3]
Universe: Population not substituted
Table Details
Variable Label
P044001
P044002
P044003
Relevant Documentation:
Excerpt from: Social Explorer, U.S. Census Bureau; 2000 Census of Population and Housing, Summary File 1: Technical Documentation, 2001.
 
Imputation
When information is missing or inconsistent, the Census Bureau uses a method called imputation to assign values. Imputation relies on the statistical principle of "homogeneity," or the tendency of households within a small geographic area to be similar in most characteristics. For example, the value of "rented" is likely to be imputed for a housing unit not reporting on owner/renter status in a neighborhood with multiunits or apartments where other respondents reported "rented" on the census questionnaire. In past censuses, when the occupancy status or the number of residents was not known for a housing unit, this information was imputed.
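The homogeneity idea can be illustrated with a small donor-based sketch. This is a simplified illustration only, not the Census Bureau's actual procedure: the record layout (`area`, `tenure`), the `imputed` flag, and the rule "reuse the most recently reported value from the same area" are all assumptions made for the example.

```python
def impute_from_neighbors(units, field):
    """Fill missing `field` values with the last value reported in the same area.

    Hypothetical sketch: `units` is a list of dicts in processing order, each
    with an "area" key. Real census imputation uses more detailed rules.
    """
    last_seen = {}  # area -> most recently reported value for `field`
    result = []
    for unit in units:
        filled = dict(unit, imputed=False)
        value = unit.get(field)
        if value is not None:
            # A reported value becomes the donor for later units in this area.
            last_seen[unit["area"]] = value
        elif unit["area"] in last_seen:
            filled[field] = last_seen[unit["area"]]
            filled["imputed"] = True
        # If no donor exists yet, the value stays missing.
        result.append(filled)
    return result


units = [
    {"area": "A", "tenure": "rented"},
    {"area": "A", "tenure": None},
    {"area": "B", "tenure": "owned"},
    {"area": "B", "tenure": None},
]
imputed_units = impute_from_neighbors(units, "tenure")
```

Keeping an explicit `imputed` flag on each record is what makes it possible to tabulate imputed and non-imputed cases separately, as table P44 does for age.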

Internet Questionnaire Assistance (IQA)
An operation which allows respondents to use the Census Bureau's Internet site to (1) ask questions and receive answers about the census form, job opportunities, or the purpose of the census and (2) provide responses to the short form.

Interpolation
Interpolation frequently is used in calculating medians or quartiles based on interval data and in approximating standard errors from tables. Linear interpolation is used to estimate values of a function between two known values. Pareto interpolation is an alternative to linear interpolation. In Pareto interpolation, the median is derived by interpolating between the logarithms of the upper and lower income limits of the median category. It is used by the Census Bureau in calculating median income within intervals wider than $2,500.
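The two approaches can be compared with a short sketch. The function names and the sample counts are hypothetical, and the Pareto formula below is the common textbook form (the logarithm of the share of cases above a limit is assumed to be linear in the logarithm of that limit); the excerpt does not spell out the exact formula the Census Bureau applies.

```python
import math


def linear_median(lower, upper, count_below, count_in, total):
    """Median by linear interpolation within the median category."""
    share = (total / 2 - count_below) / count_in
    return lower + share * (upper - lower)


def pareto_median(lower, upper, count_below, count_in, total):
    """Median by Pareto interpolation between the logs of the category limits.

    Assumes lower > 0 and that some cases lie above `upper`.
    """
    p_lower = (total - count_below) / total             # share at or above lower
    p_upper = (total - count_below - count_in) / total  # share at or above upper
    weight = math.log(p_lower / 0.5) / math.log(p_lower / p_upper)
    return math.exp(math.log(lower) + weight * (math.log(upper) - math.log(lower)))


# Hypothetical data: 1,000 households, 400 below $30,000,
# and 200 in the $30,000 to $40,000 category that contains the median.
linear_value = linear_median(30000, 40000, 400, 200, 1000)
pareto_value = pareto_median(30000, 40000, 400, 200, 1000)
```

With these made-up counts the Pareto estimate falls below the linear one, which is the expected direction when cases within an interval are concentrated toward its lower limit.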

Excerpt from: Social Explorer, U.S. Census Bureau; 2000 Census of Population and Housing, Summary File 1: Technical Documentation, 2001.
 
Age
The data on age were derived from answers to a question that was asked of all people. The age classification is based on the age of the person in complete years as of April 1, 2000. The age of the person was usually derived from their date of birth information. Their reported age was used only when date of birth information was unavailable.
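The rule described above (age in complete years as of April 1, 2000, with reported age as a fallback) can be sketched as follows. The function name and arguments are illustrative assumptions, not part of the census processing system.

```python
from datetime import date

CENSUS_DAY = date(2000, 4, 1)


def age_on_census_day(birth_date, reported_age=None, reference=CENSUS_DAY):
    """Return age in complete years as of `reference`.

    Date of birth takes priority; the reported age is used only when no
    date of birth is available (hypothetical sketch of the rule in the text).
    """
    if birth_date is not None:
        # Has this year's birthday already occurred on or before the reference date?
        had_birthday = (reference.month, reference.day) >= (birth_date.month, birth_date.day)
        return reference.year - birth_date.year - (0 if had_birthday else 1)
    return reported_age


age_birthday_on_census_day = age_on_census_day(date(1970, 4, 1))
age_birthday_day_after = age_on_census_day(date(1970, 4, 2))
age_from_report = age_on_census_day(None, reported_age=41)
```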

Data on age are used to determine the applicability of some of the sample questions for a person and to classify other characteristics in census tabulations. Age data are needed to interpret most social and economic characteristics used to plan and examine many programs and policies.

Median age
This measure divides the age distribution into two equal parts: one-half of the cases fall below the median value and one-half above it. Median age is computed on the basis of a single-year-of-age distribution.
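A minimal sketch of a median computed from a single-year-of-age distribution follows. It assumes that people of a given age are spread evenly across that year of age, so the median is interpolated linearly within the year; that interpolation rule and the sample counts are assumptions for illustration.

```python
def median_age(counts):
    """Median age from counts indexed by single year of age (index 0 = under 1).

    Assumes ages are spread evenly within each single year of age.
    """
    total = sum(counts)
    half = total / 2
    cumulative = 0
    for age, n in enumerate(counts):
        if n > 0 and cumulative + n >= half:
            # Interpolate within the interval [age, age + 1).
            return age + (half - cumulative) / n
        cumulative += n
    raise ValueError("distribution contains no people")


# Hypothetical counts for ages 0, 1, 2, and 3.
sample_median = median_age([10, 20, 30, 40])
```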

Limitation of the data
The most general limitation for many decades has been the tendency of people to overreport ages or years of birth that end in zero or five. This phenomenon is called "age heaping." In addition, the counts in the 1970 and 1980 censuses for people 100 years old and over were substantially overstated, as were the counts of people aged 69 in 1970 and aged 79 in 1980. Improvements made since then in the questionnaire design and in the allocation procedures have further minimized these problems. The count of people aged 89 in the 1990 census was not overstated.
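One rough way to check a set of reported ages for heaping is to compare the share of ages ending in zero or five with the 20 percent that would be expected if terminal digits were evenly distributed. This is an illustrative diagnostic under that even-distribution assumption, not a measure taken from the census documentation; the function name is hypothetical.

```python
def terminal_digit_heaping(ages):
    """Ratio of the observed share of ages ending in 0 or 5 to the expected 20%.

    A value near 1 suggests little heaping; values well above 1 suggest
    that ages ending in 0 or 5 are overreported.
    """
    if not ages:
        raise ValueError("no ages supplied")
    heaped = sum(1 for age in ages if age % 5 == 0)
    return (heaped / len(ages)) / 0.2


even_ratio = terminal_digit_heaping(list(range(20, 30)))
heaped_ratio = terminal_digit_heaping([30] * 5 + [31, 32, 33, 34, 36])
```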

Review of detailed 1990 census information indicated that respondents tended to provide their age as of the date they completed the questionnaire, not their age as of April 1, 1990. One reason this happened was that respondents were not specifically instructed to provide their age as of April 1, 1990. Another reason was that data collection efforts continued well past the census date. In addition, there may have been a tendency for respondents to round their age up if they were close to having a birthday. It is likely that approximately 10 percent of people in most age groups were actually one year younger. For most single years of age, the misstatements were largely offsetting. The problem is most pronounced at age zero because people lost to age one probably were not fully offset by the inclusion of babies born after April 1, 1990. Also, there may have been more rounding up to age one to avoid reporting age as zero years. (Age in complete months was not collected for infants under age one.)

The reporting of age one year older than true age on April 1, 1990, is likely to have been greater in areas where the census data were collected later in calendar year 1990. The magnitude of this problem was much less in the 1960, 1970, and 1980 censuses where age was typically derived from respondent data on year of birth and quarter of birth.

These shortcomings were minimized in Census 2000 because age was usually calculated from exact date of birth and because respondents were specifically asked to provide their age as of April 1, 2000. (For more information on the design of the age question, see the section below that discusses "Comparability.")

Comparability
Age data have been collected in every census. For the first time since 1950, the 1990 data were not available by quarter year of age. This change was made so that coded information could be obtained for both age and year of birth. In 2000, each individual has both an age and an exact date of birth. In each census since 1940, the age of a person was assigned when it was not reported. In censuses before 1940, with the exception of 1880, people of unknown age were shown as a separate category. Since 1960, assignment of unknown age has been performed by a general procedure described as "imputation." The specific procedures for imputing age have been different in each census. (For more information on imputation, see "Accuracy of the Data.")
For more information on age, please telephone 301-457-2428.

Excerpt from: Social Explorer, U.S. Census Bureau; 2000 Census of Population and Housing, Summary File 1: Technical Documentation, 2001.
 
Editing of Unacceptable Data
The objective of the processing operation was to produce a set of data that describes the population as accurately and clearly as possible. In a major change from past practice, the information on Census 2000 questionnaires generally was not edited for consistency, completeness, and acceptability during field data collection or data capture operations. Enumerator-filled questionnaires were reviewed by census crew leaders and local office clerks for adherence to specified procedures. No clerical review of mail return questionnaires was done to ensure that the information on the form could be data captured, nor were households contacted as in previous censuses to collect data that were missing from census returns.

Most census questionnaires received by mail from respondents as well as those filled by enumerators were processed through a new contractor-built image scanning system that used optical mark and character recognition to convert the responses into computer files. The optical character recognition, or OCR, process used several pattern and context checks to estimate accuracy thresholds for each write-in field. The system also used "soft edits" on most interpreted numeric write-in responses to decide whether the field values read by the machine interpretation were acceptable. If the estimated accuracy of the value read fell below the acceptable threshold or the value was outside of the soft edit range, the image of the item was displayed to a keyer, who then entered the response.
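The routing decision described above can be sketched as a simple check. The class, the field names, the confidence scale, and the numeric limits are all hypothetical; the actual thresholds and soft edit ranges used by the data capture system are not given in this excerpt.

```python
from dataclasses import dataclass


@dataclass
class FieldRead:
    value: int         # numeric value interpreted by OCR
    confidence: float  # estimated accuracy of the read, assumed to be 0.0 to 1.0


def needs_keying(read, min_confidence, soft_range):
    """Return True when the field image should be shown to a keyer.

    A read is sent to a keyer if its estimated accuracy is too low or its
    value falls outside the soft edit range (inclusive bounds assumed).
    """
    low, high = soft_range
    return read.confidence < min_confidence or not (low <= read.value <= high)


# Hypothetical limits for an age write-in field.
AGE_RANGE = (0, 115)
accepted = needs_keying(FieldRead(value=34, confidence=0.98), 0.90, AGE_RANGE)
low_confidence = needs_keying(FieldRead(value=34, confidence=0.60), 0.90, AGE_RANGE)
out_of_range = needs_keying(FieldRead(value=340, confidence=0.99), 0.90, AGE_RANGE)
```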

To control the creation of possibly erroneous people from questionnaires completed incorrectly or containing stray marks, an edit on the number of people indicated on each mail return and enumerator-filled questionnaire was implemented as part of the data capture system. Failure of this edit resulted in the review of the questionnaire image at a workstation by an operator, who identified erroneous person records and corrected OCR interpretation errors in the population count field.

At Census Bureau headquarters, the mail response data records were subjected to a computer edit that identified households exhibiting a possible coverage problem and those with more than six household members (the maximum number of persons who could be enumerated on a mail questionnaire). Attempts were made to contact these households on the telephone to correct the count inconsistency and to collect the census data for those people for whom there was no room on the questionnaire.

Incomplete or inconsistent information on the questionnaire data records was assigned acceptable values using imputation procedures during the final automated edit of the collected data. Imputations, or computer assignments of acceptable codes in place of unacceptable entries or blanks, are needed most often when an entry for a given item is lacking or when the information reported for a person on that item is inconsistent with other information for that person. This process is known as allocation. As in previous censuses, the general procedure for changing unacceptable entries was to assign an entry for a person that was consistent with entries for persons with similar characteristics. The assignment of acceptable codes in place of blanks or unacceptable entries enhances the usefulness of the data. Allocation rates for census items are made available with the published census data.
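An allocation rate for an item can be expressed as the percentage of cases in the universe whose value was allocated. The sketch below assumes the denominator is the table's universe (for table P44, the population not substituted); the function name and the sample counts are hypothetical.

```python
def allocation_rate(allocated_count, universe_count):
    """Percentage of cases in the universe with an allocated value for an item."""
    if universe_count <= 0:
        raise ValueError("universe must contain at least one case")
    if not 0 <= allocated_count <= universe_count:
        raise ValueError("allocated count must lie between 0 and the universe count")
    return 100.0 * allocated_count / universe_count


# Hypothetical counts: 50 people with allocated age out of 1,000 not substituted.
sample_rate = allocation_rate(50, 1000)
```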

Another way corrections were made during the computer editing process was through substitution; that is, the assignment of a full set of characteristics for people in a household. When there was an indication that a household was occupied by a specified number of people, but the questionnaire contained no information for the people within the household or the occupants were not listed on the questionnaire, a previously accepted household of the same size was selected as a substitute, and the full set of characteristics for the substitute was duplicated. Housing characteristics are not substituted. Matrix H18, Occupied Housing Units Substituted, represents a count of occupied housing units into which all persons have been substituted.
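Substitution differs from allocation in that a whole set of person records is copied rather than a single item. A minimal sketch, assuming a hypothetical record layout (`size`, `people`) and assuming the donor is simply the most recently accepted household of the same size:

```python
import copy


def substitute_households(households):
    """Copy person records from the last accepted household of the same size.

    Hypothetical sketch: a household whose "people" entry is None has a known
    size but no person information. Housing characteristics are left untouched.
    """
    last_accepted = {}  # household size -> person records of latest accepted household
    result = []
    for household in households:
        out = dict(household, substituted=False)
        if household["people"] is not None:
            last_accepted[household["size"]] = household["people"]
        elif household["size"] in last_accepted:
            # Duplicate the donor's full set of person characteristics.
            out["people"] = copy.deepcopy(last_accepted[household["size"]])
            out["substituted"] = True
        result.append(out)
    return result


households = [
    {"id": 1, "size": 2, "people": [{"age": 34}, {"age": 36}]},
    {"id": 2, "size": 2, "people": None},
    {"id": 3, "size": 3, "people": None},
]
processed = substitute_households(households)
```

Household 3 stays unresolved in this sketch because no accepted household of size three has been seen yet; how such cases are handled in practice is not described in the excerpt.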