Data Dictionary - Census 2000 - Summary File 3 (SF 3) - Sample Data

Survey: Census 2000

Data Source:

U.S. Census Bureau

Data set: Summary File 3 (SF3)

Table:

PCT62H. Age By Language Spoken At Home By Ability To Speak English For The Population 5+ Years (Hispanic Or Latino) [22]

Universe: Hispanic or Latino population 5 years and over

Variable	Label
PCT062H001	Total
PCT062H002	  5 to 17 years
PCT062H003	    Speak only English
PCT062H004	    Speak other languages
PCT062H005	      Speak English "very well"
PCT062H006	      Speak English "well"
PCT062H007	      Speak English "not well"
PCT062H008	      Speak English "not at all"
PCT062H009	  18 to 64 years
PCT062H010	    Speak only English
PCT062H011	    Speak other languages
PCT062H012	      Speak English "very well"
PCT062H013	      Speak English "well"
PCT062H014	      Speak English "not well"
PCT062H015	      Speak English "not at all"
PCT062H016	  65 years and over
PCT062H017	    Speak only English
PCT062H018	    Speak other languages
PCT062H019	      Speak English "very well"
PCT062H020	      Speak English "well"
PCT062H021	      Speak English "not well"
PCT062H022	      Speak English "not at all"
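
The order of the variables and the table title suggest a nested layout: the total splits into three age groups, each age group splits into "Speak only English" and "Speak other languages", and the latter splits into four ability levels. The sketch below checks that nesting on one row of counts and derives a "less than very well" figure. The nesting is read off the labels and is an assumption, as are the helper names `var`, `check_nesting`, and `less_than_very_well` and the dict-of-counts input format.

```python
# Header variable number for each age block (from the variable list above).
AGE_BLOCKS = {"5 to 17 years": 2, "18 to 64 years": 9, "65 years and over": 16}


def var(n):
    """Build a variable name such as PCT062H005 from its sequence number."""
    return f"PCT062H{n:03d}"


def check_nesting(row):
    """Return a list of messages for every subtotal that does not add up.

    `row` maps variable names to counts for one geography (assumed format).
    """
    problems = []
    if row[var(1)] != sum(row[var(s)] for s in AGE_BLOCKS.values()):
        problems.append("Total differs from the sum of the age groups")
    for label, s in AGE_BLOCKS.items():
        only_english = row[var(s + 1)]
        other = row[var(s + 2)]
        if row[var(s)] != only_english + other:
            problems.append(f"{label}: age group differs from its two parts")
        # Offsets 3..6 are "very well", "well", "not well", "not at all".
        if other != sum(row[var(s + k)] for k in range(3, 7)):
            problems.append(f"{label}: ability levels do not sum to 'other languages'")
    return problems


def less_than_very_well(row):
    """Sum "well", "not well", and "not at all" across all age groups."""
    return sum(row[var(s + k)] for s in AGE_BLOCKS.values() for k in range(4, 7))
```

Collapsing the three lower ability levels mirrors the "Very well" versus less than "Very well" split that the documentation describes for some tabulations.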

Relevant Documentation:

Excerpt from:	Social Explorer, U.S. Census Bureau; 2000 Census of Population and Housing, Summary File 3: Technical Documentation, 2002.
	Summary File 3 Technical Documentation -> Appendix B. Definitions of Subject Characteristics -> Population Characteristics -> Age

Age

The data on age, which was asked of all people, were derived from answers to the long-form questionnaire Item 4 and short-form questionnaire Item 6. The age classification is based on the age of the person in complete years as of April 1, 2000. The age of the person usually was derived from their date of birth information. Their reported age was used only when date of birth information was unavailable.
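
As a rough illustration of "age in complete years as of April 1, 2000", the sketch below derives an age from a date of birth. It is not the Census Bureau's processing code; the function name `age_in_complete_years` and the constant `CENSUS_DAY` are made up for this example.

```python
from datetime import date

# Reference date named in the text above.
CENSUS_DAY = date(2000, 4, 1)


def age_in_complete_years(birth, reference=CENSUS_DAY):
    """Count full years elapsed between `birth` and `reference`.

    Assumes `birth` is on or before `reference`.
    """
    years = reference.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference.month, reference.day) < (birth.month, birth.day):
        years -= 1
    return years
```

A person born on April 2, 1990, for example, has not yet had a birthday in 2000 by the reference date, so the function counts one year fewer than the difference in calendar years.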

Data on age are used to determine the applicability of some of the sample questions for a person and to classify other characteristics in census tabulations. Age data are needed to interpret most social and economic characteristics used to plan and examine many programs and policies. Therefore, age is tabulated by single years of age and by many different groupings, such as 5-year age groups.

Median age

Median age divides the age distribution into two equal parts: one-half of the cases falling below the median age and one-half above the median. Median age is computed on the basis of a single year of age standard distribution (see the "Standard Distributions" section under "Derived Measures"). Median age is rounded to the nearest tenth. (For more information on medians, see "Derived Measures".)

Limitation of the data

The most general limitation for many decades has been the tendency of people to overreport ages or years of birth that end in zero or 5. This phenomenon is called "age heaping." In addition, the counts in the 1970 and 1980 censuses for people 100 years old and over were substantially overstated. So also were the counts of people 69 years old in 1970 and 79 years old in 1980. Improvements have been made since then in the questionnaire design and in the imputation procedures that have minimized these problems.

Review of detailed 1990 census information indicated that respondents tended to provide their age as of the date of completion of the questionnaire, not their age as of April 1, 1990. One reason this happened was that respondents were not specifically instructed to provide their age as of April 1, 1990. Another reason was that data collection efforts continued well past the census date. In addition, there may have been a tendency for respondents to round their age up if they were close to having a birthday. It is likely that approximately 10 percent of people in most age groups were actually 1 year younger. For most single years of age, the misstatements were largely offsetting. The problem is most pronounced at age zero because people lost to age 1 probably were not fully offset by the inclusion of babies born after April 1, 1990. Also, there may have been more rounding up to age 1 to avoid reporting age as zero years. (Age in complete months was not collected for infants under age 1.)

The reporting of age 1 year older than true age on April 1, 1990, is likely to have been greater in areas where the census data were collected later in calendar year 1990. The magnitude of this problem was much less in the 1960, 1970, and 1980 censuses where age was typically derived from respondent data on year of birth and quarter of birth.

These shortcomings were minimized in Census 2000 because age was usually calculated from exact date of birth and because respondents were specifically asked to provide their age as of April 1, 2000. (For more information on the design of the age question, see the section below that discusses "Comparability.")

Comparability

Age data have been collected in every census. For the first time since 1950, the 1990 data were not available by quarter year of age. This change was made so that coded information could be obtained for both age and year of birth. In 2000, each individual has both an age and an exact date of birth. In each census since 1940, the age of a person was assigned when it was not reported. In censuses before 1940, with the exception of 1880, people of unknown age were shown as a separate category. Since 1960, assignment of unknown age has been performed by a general procedure described as "imputation." The specific procedures for imputing age have been different in each census. (For more information on imputation, see "Accuracy of the Data.")

Excerpt from:	Social Explorer, U.S. Census Bureau; 2000 Census of Population and Housing, Summary File 3: Technical Documentation, 2002.
	Summary File 3 Technical Documentation -> Appendix B. Definitions of Subject Characteristics -> Population Characteristics -> Language Spoken at Home and Ability to Speak English -> Language Spoken at Home

Language Spoken at Home

Data on language spoken at home were derived from answers to long-form questionnaire Items 11a and 11b, which were asked of a sample of the population. Data were edited to include in tabulations only the population 5 years old and over. Questions 11a and 11b referred to languages spoken at home in an effort to measure the current use of languages other than English. People who knew languages other than English but did not use them at home or who only used them elsewhere were excluded. Most people who reported speaking a language other than English at home also speak English. The questions did not permit determination of the primary or dominant language of people who spoke both English and another language. (For more information, see discussion below on "Ability to Speak English.")

Instructions to enumerators and questionnaire assistance center staff stated that a respondent should mark "Yes" in Question 11a if the person sometimes or always spoke a language other than English at home. Also, respondents were instructed not to mark "Yes" if a language other than English was spoken only at school or work, or if speaking another language was limited to a few expressions or slang of the other language. For Question 11b, respondents were instructed to print the name of the non-English language spoken at home. If the person spoke more than one language other than English, the person was to report the language spoken more often or the language learned first.

For people who indicated that they spoke a language other than English at home in Question 11a, but failed to specify the name of the language in Question 11b, the language was assigned based on the language of other speakers in the household, on the language of a person of the same Spanish origin or detailed race group living in the same or a nearby area, or of a person of the same place of birth or ancestry. In all cases where a person was assigned a non-English language, it was assumed that the language was spoken at home. People for whom a language other than English was entered in Question 11b and for whom Question 11a was blank were assumed to speak that other language at home.

The write-in responses listed in Question 11b (specific language spoken) were optically scanned or keyed onto computer files, then coded into more than 380 detailed language categories using an automated coding system. The automated procedure compared write-in responses reported by respondents with entries in a master code list, which initially contained approximately 2,000 language names, and added variants and misspellings found in the 1990 census. Each write-in response was given a numeric code that was associated with one of the detailed categories in the dictionary. If the respondent listed more than one non-English language, only the first was coded.

The write-in responses represented the names people used for languages they speak. They may not match the names or categories used by linguists. The sets of categories used are sometimes geographic and sometimes linguistic. The following table provides an illustration of the content of the classification schemes used to present language data.

Four and Thirty-Nine Group Classifications of Census 2000 Languages Spoken at Home With Illustrative Examples
Four-Group Classification	Thirty-Nine-Group Classification	Examples
Spanish	Spanish and Spanish creole	Spanish, Ladino
Other Indo-European languages	French	French, Cajun, Patois
	French Creole	Haitian Creole
	Italian
	Portuguese and Portuguese creole
	German
	Yiddish
	Other West Germanic languages	Dutch, Pennsylvania Dutch, Afrikaans
	Scandinavian languages	Danish, Norwegian, Swedish
	Greek
	Russian
	Polish
	Serbo-Croatian	Serbo-Croatian, Croatian, Serbian
	Other Slavic languages	Czech, Slovak, Ukrainian
	Armenian
	Persian
	Gujarati
	Hindi
	Urdu
	Other Indic languages	Bengali, Marathi, Punjabi, Romany
	Other Indo-European languages	Albanian, Gaelic, Lithuanian, Rumanian
Asian and Pacific Island languages	Chinese	Cantonese, Formosan, Mandarin
	Japanese
	Korean
	Mon-Khmer, Cambodian
	Miao, Hmong
	Thai
	Laotian
	Vietnamese
	Other Asian languages	Dravidian languages (Malayalam, Telugu, Tamil), Turkish
	Tagalog
	Other Pacific Island languages	Chamorro, Hawaiian, Ilocano, Indonesian, Samoan
All other languages	Navajo
	Other Native North American languages	Apache, Cherokee, Choctaw, Dakota, Keres, Pima, Yupik
	Hungarian
	Arabic
	Hebrew
	African languages	Amharic, Ibo, Twi, Yoruba, Bantu, Swahili, Somali
	Other and unspecified languages	Syriac, Finnish, Other languages of the Americas, not reported

Household language

In households where one or more people (5 years old and over) speak a language other than English, the household language assigned to all household members is the non-English language spoken by the first person with a non-English language in the following order: householder, spouse, parent, sibling, child, grandchild, in-laws, other relatives, stepchild, unmarried partner, housemate or roommate, and other nonrelatives. Thus, a person who speaks only English may have a non-English household language assigned to him/her in tabulations of individuals by household language.
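
The precedence rule above can be sketched as a lookup over household members. This is an illustration only: the record layout (dicts with `relationship`, `age`, and `language` keys), the relationship strings, the tie-breaking by list order, and the `None` result for households with no non-English speaker are all assumptions that the text does not specify.

```python
# Relationship precedence, in the order given in the text above.
RELATIONSHIP_ORDER = [
    "householder", "spouse", "parent", "sibling", "child", "grandchild",
    "in-law", "other relative", "stepchild", "unmarried partner",
    "housemate or roommate", "other nonrelative",
]
RANK = {name: i for i, name in enumerate(RELATIONSHIP_ORDER)}


def household_language(members):
    """Return the non-English language assigned to the whole household.

    Each member is a dict with 'relationship', 'age', and 'language'
    (None when the person speaks only English). Returns None when no
    member aged 5 or over speaks a non-English language (assumed behavior).
    """
    speakers = [
        m for m in members
        if m["age"] >= 5 and m["language"] is not None
    ]
    if not speakers:
        return None
    # min() keeps the first member on ties, so list order breaks ties.
    first = min(speakers, key=lambda m: RANK[m["relationship"]])
    return first["language"]
```

Because the result applies to every member, a householder who speaks only English still receives the spouse's language in this sketch, which matches the last sentence of the paragraph above.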

Language density

Language density is a household measure of the number of household members who speak a language other than English at home in three categories: none, some, and all speak another language.
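
A minimal sketch of the three-way measure follows. The function name `language_density` is hypothetical, and the text does not say which members are counted (for example, whether children under 5 are excluded), so the caller is assumed to pass only the members in scope.

```python
from typing import Sequence


def language_density(speaks_other: Sequence[bool]) -> str:
    """Classify a household as 'none', 'some', or 'all'.

    `speaks_other` holds one flag per in-scope household member: True if
    that member speaks a language other than English at home.
    """
    count = sum(speaks_other)
    if count == 0:
        return "none"
    if count == len(speaks_other):
        return "all"
    return "some"
```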

Limitation of the data

Some people who speak a language other than English at home may have first learned that language at school. However, these people would be expected to indicate that they spoke English "Very well." People who speak a language other than English, but do not do so at home, should have been reported as not speaking a language other than English at home.

The extreme detail in which language names were coded may give a false impression of the linguistic precision of these data. The names used by speakers of a language to identify it may reflect ethnic, geographic, or political affiliations and do not necessarily respect linguistic distinctions. The categories shown in the tabulations were chosen on a number of criteria, such as information about the number of speakers of each language that might be expected in a sample of the U.S. population.

Comparability

Information on language has been collected in every census since 1890, except 1950. The comparability of data among censuses is limited by changes in question wording, by the subpopulations to whom the question was addressed, and by the detail that was published. The same question on language was asked in 1980, 1990, and Census 2000. This question on the current language spoken at home replaced the questions asked in prior censuses on mother tongue; that is, the language other than English spoken in the person's home when he or she was a child; one's first language; or the language spoken before immigrating to the United States. The censuses of 1910-1940, 1960, and 1970 included questions on mother tongue.

A change in coding procedures from 1980 to 1990 improved accuracy of coding and may have affected the number of people reported in some of the 380-plus categories. In 1980, coding clerks supplied numeric codes for the written entries on each questionnaire using a 2,000-name reference list. In 1990, written entries were keyed, then transcribed to a computer file and matched to a computer dictionary that began with the 2,000-name list. The name list was expanded as unmatched entries were referred to headquarters specialists for resolution. In Census 2000, the written entries were transcribed by "optical character recognition" (OCR), or manually keyed when the computer could not read the entry. Then all language entries were copied to a separate computer file and matched to a master code list. The code list is the master file developed from all unique language entries on the 1990 census, and included over 55,000 entries. The computerized matching ensured that identical alphabetic entries received the same code. Unmatched entries were referred to headquarters specialists for coding. In 2000, entries were reported in about 350 of the 380 categories.

Excerpt from:	Social Explorer, U.S. Census Bureau; 2000 Census of Population and Housing, Summary File 3: Technical Documentation, 2002.
	Summary File 3 Technical Documentation -> Appendix B. Definitions of Subject Characteristics -> Population Characteristics -> Language Spoken at Home and Ability to Speak English -> Ability to Speak English

Ability to Speak English

Data on ability to speak English were derived from the answers to long-form questionnaire Item 11c, which was asked of a sample of the population. Respondents who reported that they spoke a language other than English in long-form questionnaire Item 11a were asked to indicate their ability to speak English in one of the following categories: "Very well," "Well," "Not well," or "Not at all."

The data on ability to speak English represent the person's own perception about his or her own ability or, because census questionnaires are usually completed by one household member, the responses may represent the perception of another household member. Respondents were not instructed on how to interpret the response categories in Question 11c.

People who reported that they spoke a language other than English at home, but whose ability to speak English was not reported, were assigned the English-language ability of a randomly selected person of the same age, Hispanic origin, nativity and year of entry, and language group.

Linguistic isolation

A household in which no person 14 years old and over speaks only English and no person 14 years old and over who speaks a language other than English speaks English "Very well" is classified as "linguistically isolated." In other words, a household in which all members 14 years old and over speak a non-English language and also speak English less than "Very well" (have difficulty with English) is "linguistically isolated." All the members of a linguistically isolated household are tabulated as linguistically isolated, including members under 14 years old who may speak only English.
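
The classification rule above can be expressed as a short predicate. This sketch follows the first sentence of the definition literally; the record layout (dicts with `age`, `speaks_only_english`, and `english_ability` keys) and the function name are assumptions for illustration. A household with no member aged 14 or over passes the test vacuously here, a case the text does not address.

```python
def is_linguistically_isolated(members):
    """Return True if the household meets the definition quoted above.

    Each member is a dict with 'age', 'speaks_only_english' (bool), and
    'english_ability' ("Very well", "Well", "Not well", "Not at all",
    or None for people who speak only English).
    """
    for m in members:
        if m["age"] < 14:
            continue  # members under 14 do not affect the classification
        if m["speaks_only_english"]:
            return False
        if m["english_ability"] == "Very well":
            return False
    return True
```

When the function returns True, every member of the household is tabulated as linguistically isolated, including any child under 14 who speaks only English.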

Comparability

The current question on ability to speak English was asked for the first time in 1980. From 1890 to 1910, "Able to speak English, yes/no" was asked along with two literacy questions. In tabulations from 1980, the categories "Very well" and "Well" were combined. Data from other surveys suggested a major difference between the category "Very well" and the remaining categories. In some tabulations showing ability to speak English, people who reported that they spoke English "Very well" are presented separately from people who reported their ability to speak English as less than "Very well."

Excerpt from:	Social Explorer, U.S. Census Bureau; 2000 Census of Population and Housing, Summary File 3: Technical Documentation, 2002.
	Summary File 3 Technical Documentation -> Appendix B. Definitions of Subject Characteristics -> Population Characteristics -> Hispanic or Latino

Hispanic or Latino

The data on the Hispanic or Latino population, which was asked of all people, were derived from answers to long-form questionnaire Item 5, and short-form questionnaire Item 7. The terms "Spanish," "Hispanic origin," and "Latino" are used interchangeably. Some respondents identify with all three terms, while others may identify with only one of these three specific terms. Hispanics or Latinos who identify with the terms "Spanish," "Hispanic," or "Latino" are those who classify themselves in one of the specific Hispanic or Latino categories listed on the questionnaire - "Mexican," "Puerto Rican," or "Cuban" - as well as those who indicate that they are "other Spanish, Hispanic, or Latino." People who do not identify with one of the specific origins listed on the questionnaire but indicate that they are "other Spanish, Hispanic, or Latino" are those whose origins are from Spain, the Spanish-speaking countries of Central or South America, the Dominican Republic, or people identifying themselves generally as Spanish, Spanish-American, Hispanic, Hispano, Latino, and so on. All write-in responses to the "other Spanish/Hispanic/Latino" category were coded.

Origin can be viewed as the heritage, nationality group, lineage, or country of birth of the person or the person's parents or ancestors before their arrival in the United States. People who identify their origin as Spanish, Hispanic, or Latino may be of any race.

Some tabulations are shown by the origin of the householder. In all cases where the origin of households, families, or occupied housing units is classified as Spanish, Hispanic, or Latino, the origin of the householder is used. (For more information, see the discussion of householder under "Household Type and Relationship.")

If an individual could not provide a Hispanic origin response, their origin was assigned using specific rules of precedence of household relationship. For example, if origin was missing for a natural-born daughter in the household, then either the origin of the householder, another natural-born child, or the spouse of the householder was assigned. If Hispanic origin was not reported for anyone in the household, the origin of a householder in a previously processed household with the same race was assigned. This procedure is a variation of the general imputation procedures described in "Accuracy of the Data," and is similar to those used in 1990, except that for Census 2000, race and Spanish surnames were used to assist in assigning an origin. (For more information, see the "Comparability" section below.)

Comparability

There are two important changes to the Hispanic origin question for Census 2000. First, the sequence of the race and Hispanic origin questions for Census 2000 differs from that in 1990; in 1990, the race question preceded the Hispanic origin question. Testing prior to Census 2000 indicated that response to the Hispanic origin question could be improved by placing it before the race question without affecting the response to the race question. Second, there is an instruction preceding the Hispanic origin question indicating that respondents should answer both the Hispanic origin and the race questions. This instruction was added to give emphasis to the distinct concepts of the Hispanic origin and race questions and to emphasize the need for both pieces of information.

Furthermore, there has been a change in the processing of the Hispanic origin and race responses. In 1990, the Hispanic origin question and the race question had separate edits; therefore, although information may have been present on the questionnaire, it was not fully utilized due to the discrete nature of the edits. However, for Census 2000, there was a joint race and Hispanic origin edit, which, for example, made use of race responses in the Hispanic origin question to impute a race if none was given.
