Data Dictionary: Census 2000
Survey: Census 2000
Data Source: U.S. Census Bureau
Table: P98. Imputation Of Age [3]
Universe: Total population
Excerpt from: Social Explorer, U.S. Census Bureau; 2000 Census of Population and Housing, Summary File 3: Technical Documentation, 2002.
Imputation
When information is missing or inconsistent, the Census Bureau uses a method called imputation to assign values. Imputation relies on the statistical principle of "homogeneity," or the tendency of households within a small geographic area to be similar in most characteristics. For example, the value of "rented" is likely to be imputed for a housing unit not reporting on owner/renter status in a neighborhood with multiunits or apartments where other respondents reported "rented" on the census questionnaire. In past censuses, when the occupancy status or the number of residents was not known for a housing unit, this information was imputed.
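
The homogeneity idea can be illustrated with a minimal sketch. It assumes a simple sequential rule: a missing value is filled with the most recently reported value from the same small area. This is not the Census Bureau's actual procedure, which the excerpt does not specify. The function name `impute_from_neighbors` and the field names `block` and `tenure` are hypothetical.

```python
def impute_from_neighbors(records, key="tenure", area_key="block"):
    """Fill missing values of `key` with the last reported value in the same area.

    Illustrative only: the field names and the donor rule are assumptions.
    """
    last_reported = {}  # area -> most recent value actually reported there
    result = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data is not modified
        area = rec[area_key]
        if rec.get(key) is None:
            if area in last_reported:
                rec[key] = last_reported[area]
                rec["imputed"] = True
            else:
                rec["imputed"] = False  # no donor seen yet; value stays missing
        else:
            last_reported[area] = rec[key]
            rec["imputed"] = False
        result.append(rec)
    return result


housing_units = [
    {"block": "A", "tenure": "rented"},
    {"block": "A", "tenure": None},   # takes "rented" from its neighbor
    {"block": "B", "tenure": None},   # no reporting neighbor in block B
]
filled = impute_from_neighbors(housing_units)
```

Keeping an `imputed` flag on every record mirrors the way tables such as P98 separate imputed from non-imputed cases.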

Internet Questionnaire Assistance (IQA)
An operation that allows respondents to use the Census Bureau's Internet site to (1) ask questions and receive answers about the census form, job opportunities, or the purpose of the census and (2) provide responses to the short form.

Interpolation
Interpolation is frequently used in calculating medians or quartiles based on interval data and in approximating standard errors from tables. Linear interpolation is used to estimate values of a function between two known values. Pareto interpolation is an alternative to linear interpolation. In Pareto interpolation, the median is derived by interpolating between the logarithms of the upper and lower income limits of the median category. The Census Bureau uses Pareto interpolation to calculate median income within intervals wider than $2,500.
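
The two approaches can be compared with a short sketch for a median that falls inside a single interval of grouped data. The Pareto version below assumes the standard form in which the logarithm of the share of cases at or above a value is linear in the logarithm of that value. The excerpt states only that the interpolation is between the logarithms of the interval limits, so the exact weighting here is an assumption, and the function and parameter names are hypothetical.

```python
import math


def linear_median(lower, upper, cum_below, count_in, total):
    """Median by linear interpolation inside the interval [lower, upper)."""
    fraction = (total / 2 - cum_below) / count_in
    return lower + fraction * (upper - lower)


def pareto_median(lower, upper, cum_below, count_in, total):
    """Median by Pareto interpolation (assumed log-log form).

    Requires lower > 0 and at least one case at or above `upper`.
    """
    share_at_lower = (total - cum_below) / total             # at or above lower limit
    share_at_upper = (total - cum_below - count_in) / total  # at or above upper limit
    weight = (math.log(share_at_lower) - math.log(0.5)) / (
        math.log(share_at_lower) - math.log(share_at_upper)
    )
    log_median = math.log(lower) + weight * (math.log(upper) - math.log(lower))
    return math.exp(log_median)


# 1,000 households; 400 fall below $30,000 and 200 fall in $30,000 to $35,000.
lin = linear_median(30000, 35000, cum_below=400, count_in=200, total=1000)
par = pareto_median(30000, 35000, cum_below=400, count_in=200, total=1000)
```

In this example the linear result is 32,500, and the Pareto result is lower because the assumed distribution thins out toward the upper limit of the interval.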

Excerpt from: Social Explorer, U.S. Census Bureau; 2000 Census of Population and Housing, Summary File 3: Technical Documentation, 2002.
Data on age, which were collected for all people, were derived from answers to long-form questionnaire Item 4 and short-form questionnaire Item 6. The age classification is based on the age of the person in complete years as of April 1, 2000. A person's age was usually derived from date-of-birth information. Reported age was used only when date-of-birth information was unavailable.
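
The rule described above (age in complete years as of April 1, 2000, with reported age as a fallback) can be sketched as follows. The function name `age_on_census_day` and its parameters are hypothetical; the sketch shows the arithmetic only and does not reproduce the Census Bureau's edit procedures.

```python
from datetime import date

CENSUS_DAY = date(2000, 4, 1)


def age_on_census_day(date_of_birth=None, reported_age=None, reference=CENSUS_DAY):
    """Age in complete years on the reference date.

    Uses date of birth when available; otherwise falls back to reported age.
    """
    if date_of_birth is not None:
        # Has this year's birthday already occurred on or before the reference date?
        had_birthday = (reference.month, reference.day) >= (
            date_of_birth.month,
            date_of_birth.day,
        )
        return reference.year - date_of_birth.year - (0 if had_birthday else 1)
    return reported_age


age_on_census_day(date(1990, 4, 1))    # 10: birthday falls on Census Day
age_on_census_day(date(1990, 4, 2))    # 9: birthday is one day after Census Day
age_on_census_day(reported_age=42)     # 42: no date of birth, reported age is used
```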

Data on age are used to determine the applicability of some of the sample questions for a person and to classify other characteristics in census tabulations. Age data are needed to interpret most social and economic characteristics used to plan and examine many programs and policies. Therefore, age is tabulated by single years of age and by many different groupings, such as 5-year age groups.

Median age
Median age divides the age distribution into two equal parts: one-half of the cases falling below the median age and one-half above the median. Median age is computed on the basis of a single year of age standard distribution (see the "Standard Distributions" section under "Derived Measures"). Median age is rounded to the nearest tenth. (For more information on medians, see "Derived Measures".)
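
A minimal sketch of this computation is shown below. It assumes linear interpolation within the one-year interval that contains the midpoint of the distribution. The excerpt refers to "Derived Measures" for the exact method, so the interpolation rule is an assumption, and the function name `median_age` is hypothetical.

```python
def median_age(counts_by_age):
    """Median age from a single-year-of-age distribution, to the nearest tenth.

    counts_by_age maps age in complete years to the number of people.
    Assumes people are spread evenly within each one-year interval.
    """
    total = sum(counts_by_age.values())
    half = total / 2
    cumulative = 0
    for age in sorted(counts_by_age):
        count = counts_by_age[age]
        if count and cumulative + count >= half:
            # Interpolate inside the interval [age, age + 1).
            return round(age + (half - cumulative) / count, 1)
        cumulative += count
    raise ValueError("distribution has no cases")


median_age({30: 10, 31: 20, 32: 10})  # 31.5
```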

Limitation of the data
The most general limitation for many decades has been the tendency of people to overreport ages or years of birth that end in zero or 5. This phenomenon is called "age heaping." In addition, the counts in the 1970 and 1980 censuses for people 100 years old and over were substantially overstated, as were the counts of people 69 years old in 1970 and 79 years old in 1980. Improvements made since then in the questionnaire design and in the imputation procedures have minimized these problems.
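
One simple way to look for age heaping is to measure the share of reported ages that end in zero or 5. If terminal digits were evenly distributed, that share would be about 20 percent. The sketch below is an illustrative check only, not a measure described in the source, and the function name `terminal_digit_share` is hypothetical.

```python
def terminal_digit_share(ages, digits=(0, 5)):
    """Share of reported ages whose last digit is in `digits`."""
    ages = list(ages)
    if not ages:
        raise ValueError("no ages supplied")
    hits = sum(1 for age in ages if age % 10 in digits)
    return hits / len(ages)


terminal_digit_share(range(20, 60))          # 0.2: no heaping
terminal_digit_share([30, 30, 35, 31, 40])   # 0.8: strong heaping
```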

Review of detailed 1990 census information indicated that respondents tended to provide their age as of the date of completion of the questionnaire, not their age as of April 1, 1990. One reason this happened was that respondents were not specifically instructed to provide their age as of April 1, 1990. Another reason was that data collection efforts continued well past the census date. In addition, there may have been a tendency for respondents to round their age up if they were close to having a birthday. It is likely that approximately 10 percent of people in most age groups were actually 1 year younger than reported. For most single years of age, the misstatements were largely offsetting. The problem is most pronounced at age zero because people lost to age 1 probably were not fully offset by the inclusion of babies born after April 1, 1990. Also, there may have been more rounding up to age 1 to avoid reporting age as zero years. (Age in complete months was not collected for infants under age 1.)

The reporting of age 1 year older than true age on April 1, 1990, is likely to have been greater in areas where the census data were collected later in calendar year 1990. The magnitude of this problem was much smaller in the 1960, 1970, and 1980 censuses, in which age was typically derived from respondent data on year of birth and quarter of birth.

These shortcomings were minimized in Census 2000 because age was usually calculated from exact date of birth and because respondents were specifically asked to provide their age as of April 1, 2000. (For more information on the design of the age question, see the section below that discusses "Comparability.")

Comparability
Age data have been collected in every census. For the first time since 1950, the 1990 data were not available by quarter year of age. This change was made so that coded information could be obtained for both age and year of birth. In 2000, each individual has both an age and an exact date of birth. In each census since 1940, the age of a person was assigned when it was not reported. In censuses before 1940, with the exception of 1880, people of unknown age were shown as a separate category. Since 1960, assignment of unknown age has been performed by a general procedure described as "imputation." The specific procedures for imputing age have been different in each census. (For more information on imputation, see "Accuracy of the Data.")