Data Dictionary - Census 2000 - Summary File 3 (SF 3) - Sample Data

Survey: Census 2000

Data Source: U.S. Census Bureau

Data set: Summary File 3 (SF3)

Table: P109. Imputation Of Language Spoken At Home For The Population 5+ Years [7]

Universe: Population 5 years and over

Table Details

Variable	Label
P109001	Total
P109002	Speak only English
P109003	Speak other languages
P109004	Imputed
P109005	Language status imputed
P109006	Language status not imputed
P109007	Not imputed

Relevant Documentation:

Excerpt from:	Social Explorer, U.S. Census Bureau; 2000 Census of Population and Housing, Summary File 3: Technical Documentation, 2002.
	Summary File 3 Technical Documentation -> Appendix C. Data Collection and Processing Procedures -> Glossary -> Imputation

Imputation

When information is missing or inconsistent, the Census Bureau uses a method called imputation to assign values. Imputation relies on the statistical principle of "homogeneity," or the tendency of households within a small geographic area to be similar in most characteristics. For example, the value of "rented" is likely to be imputed for a housing unit not reporting on owner/renter status in a neighborhood with multiunits or apartments where other respondents reported "rented" on the census questionnaire. In past censuses, when the occupancy status or the number of residents was not known for a housing unit, this information was imputed.
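
The homogeneity idea can be illustrated with a small sketch. It fills a missing owner/renter value with the most common value reported in the same small area and flags the record as imputed. This is a deliberately simplified most-common-value rule, not the Census Bureau's actual procedure, which this excerpt does not describe; the field names `block`, `tenure`, and `imputed` are hypothetical.

```python
from collections import Counter

def impute_tenure(units):
    """Fill missing 'tenure' from the most common reported value in the same block.

    units: list of dicts with 'block' and 'tenure' (None when unreported).
    Returns new dicts with an added 'imputed' flag; inputs are not modified.
    """
    # Tally reported values per block (hypothetical small geographic area).
    reported = {}
    for u in units:
        if u["tenure"] is not None:
            reported.setdefault(u["block"], Counter())[u["tenure"]] += 1

    result = []
    for u in units:
        filled = dict(u)
        filled["imputed"] = False
        if u["tenure"] is None and u["block"] in reported:
            # Assumption: borrow the modal value of reporting neighbors.
            filled["tenure"] = reported[u["block"]].most_common(1)[0][0]
            filled["imputed"] = True
        result.append(filled)
    return result

units = [
    {"block": "A", "tenure": "rented"},
    {"block": "A", "tenure": "rented"},
    {"block": "A", "tenure": "owned"},
    {"block": "A", "tenure": None},   # missing; neighbors mostly rent
    {"block": "B", "tenure": None},   # no reporting neighbors, left missing
]
print(impute_tenure(units)[3])
# {'block': 'A', 'tenure': 'rented', 'imputed': True}
```

Keeping an explicit flag on each record is what makes it possible to tabulate imputed and non-imputed responses separately, as table P109 does for language spoken at home.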

Internet Questionnaire Assistance (IQA)

An operation which allows respondents to use the Census Bureau's Internet site to (1) ask questions and receive answers about the census form, job opportunities, or the purpose of the census and (2) provide responses to the short form.

Interpolation

Interpolation frequently is used in calculating medians or quartiles based on interval data and in approximating standard errors from tables. Linear interpolation is used to estimate values of a function between two known values. Pareto interpolation is an alternative to linear interpolation. In Pareto interpolation, the median is derived by interpolating between the logarithms of the upper and lower income limits of the median category. It is used by the Census Bureau in calculating median income within intervals wider than $2,500.
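
The two approaches can be compared in a short sketch. The glossary entry does not give formulas, so the code assumes one common formulation: linear interpolation places the median proportionally within the interval, while Pareto interpolation does the same on the logarithm of income against the logarithm of the share of units at or above each limit. The function and parameter names are hypothetical.

```python
import math

def linear_median(lower, upper, cum_below, count_in, total):
    """Median by linear interpolation inside the interval [lower, upper).

    cum_below: units below `lower`; count_in: units inside the interval.
    """
    frac = (total / 2 - cum_below) / count_in
    return lower + frac * (upper - lower)

def pareto_median(lower, upper, cum_below, count_in, total):
    """Median by Pareto interpolation (assumed formulation, see lead-in).

    Requires lower > 0 and some units at or above `upper`.
    """
    p_lower = (total - cum_below) / total             # share at or above lower limit
    p_upper = (total - cum_below - count_in) / total  # share at or above upper limit
    frac = (math.log(p_lower) - math.log(0.5)) / (math.log(p_lower) - math.log(p_upper))
    return math.exp(math.log(lower) + frac * (math.log(upper) - math.log(lower)))

# 100 households: 40 below $30,000 and 20 in the $30,000-$40,000 interval.
print(linear_median(30000, 40000, 40, 20, 100))   # 35000.0
print(pareto_median(30000, 40000, 40, 20, 100))   # lies below the linear estimate
```

In this example the Pareto estimate falls below the linear one, because it assumes units thin out toward the top of the interval rather than being spread evenly across it.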

Excerpt from:	Social Explorer, U.S. Census Bureau; 2000 Census of Population and Housing, Summary File 3: Technical Documentation, 2002.
	Summary File 3 Technical Documentation -> Appendix B. Definitions of Subject Characteristics -> Population Characteristics -> Language Spoken at Home and Ability to Speak English -> Language Spoken at Home

Language Spoken at Home

Data on language spoken at home were derived from answers to long-form questionnaire Items 11a and 11b, which were asked of a sample of the population. Data were edited to include in tabulations only the population 5 years old and over. Questions 11a and 11b referred to languages spoken at home in an effort to measure the current use of languages other than English. People who knew languages other than English but did not use them at home or who only used them elsewhere were excluded. Most people who reported speaking a language other than English at home also speak English. The questions did not permit determination of the primary or dominant language of people who spoke both English and another language. (For more information, see discussion below on "Ability to Speak English.")

Instructions to enumerators and questionnaire assistance center staff stated that a respondent should mark "Yes" in Question 11a if the person sometimes or always spoke a language other than English at home. Also, respondents were instructed not to mark "Yes" if a language other than English was spoken only at school or work, or if speaking another language was limited to a few expressions or slang of the other language. For Question 11b, respondents were instructed to print the name of the non-English language spoken at home. If the person spoke more than one language other than English, the person was to report the language spoken more often or the language learned first.

For people who indicated that they spoke a language other than English at home in Question 11a, but failed to specify the name of the language in Question 11b, the language was assigned based on the language of other speakers in the household, on the language of a person of the same Spanish origin or detailed race group living in the same or a nearby area, or of a person of the same place of birth or ancestry. In all cases where a person was assigned a non-English language, it was assumed that the language was spoken at home. People for whom a language other than English was entered in Question 11b, and for whom Question 11a was blank, were assumed to speak that other language at home.

The write-in responses listed in Question 11b (specific language spoken) were optically scanned or keyed onto computer files, then coded into more than 380 detailed language categories using an automated coding system. The automated procedure compared write-in responses reported by respondents with entries in a master code list, which initially contained approximately 2,000 language names, and added variants and misspellings found in the 1990 census. Each write-in response was given a numeric code that was associated with one of the detailed categories in the dictionary. If the respondent listed more than one non-English language, only the first was coded.
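
The matching step described above can be sketched as a dictionary lookup. This is only an illustration: the entries and numeric codes in `MASTER_CODE_LIST` are invented placeholders, not actual Census codes, and the rule for splitting a response that lists several languages is an assumption.

```python
import re

# Hypothetical master code list. Variants and misspellings map to the same
# code as the standard name. Names and codes are placeholders.
MASTER_CODE_LIST = {
    "spanish": 1,
    "espanol": 1,      # variant spelling
    "french": 2,
    "vietnamese": 3,
    "vietnamize": 3,   # misspelling
}

def code_write_in(response):
    """Return the code for the first language listed, or None if unmatched."""
    text = response.lower()
    # Only the first listed language is coded; assume common separators.
    first = re.split(r"[,/;&]|\band\b", text, maxsplit=1)[0]
    key = " ".join(first.split())  # trim and collapse whitespace
    return MASTER_CODE_LIST.get(key)

print(code_write_in("  Espanol, French "))   # 1
print(code_write_in("French and Spanish"))   # 2
print(code_write_in("xyz"))                  # None
```

A `None` result stands for an entry that has no match in the list and would need separate handling.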

The write-in responses represented the names people used for languages they speak. They may not match the names or categories used by linguists. The sets of categories used are sometimes geographic and sometimes linguistic. The following table provides an illustration of the content of the classification schemes used to present language data.

Four and Thirty-Nine Group Classifications of Census 2000 Languages Spoken at Home With Illustrative Examples
Four-Group Classification	Thirty-Nine-Group Classification	Examples
Spanish	Spanish and Spanish creole	Spanish, Ladino
Other Indo-European languages	French	French, Cajun, Patois
	French Creole	Haitian Creole
	Italian
	Portuguese and Portuguese creole
	German
	Yiddish
	Other West Germanic languages	Dutch, Pennsylvania Dutch, Afrikaans
	Scandinavian languages	Danish, Norwegian, Swedish
	Greek
	Russian
	Polish
	Serbo-Croatian	Serbo-Croatian, Croatian, Serbian
	Other Slavic languages	Czech, Slovak, Ukrainian
	Armenian
	Persian
	Gujarati
	Hindi
	Urdu
	Other Indic languages	Bengali, Marathi, Punjabi, Romany
	Other Indo-European languages	Albanian, Gaelic, Lithuanian, Rumanian
Asian and Pacific Island languages	Chinese	Cantonese, Formosan, Mandarin
	Japanese
	Korean
	Mon-Khmer, Cambodian
	Miao, Hmong
	Thai
	Laotian
	Vietnamese
	Other Asian languages	Dravidian languages (Malayalam, Telugu, Tamil), Turkish
	Tagalog
	Other Pacific Island languages	Chamorro, Hawaiian, Ilocano, Indonesian, Samoan
All other languages	Navajo
	Other Native North American languages	Apache, Cherokee, Choctaw, Dakota, Keres, Pima, Yupik
	Hungarian
	Arabic
	Hebrew
	African languages	Amharic, Ibo, Twi, Yoruba, Bantu, Swahili, Somali
	Other and unspecified languages	Syriac, Finnish, Other languages of the Americas, not reported

Household language

In households where one or more people (5 years old and over) speak a language other than English, the household language assigned to all household members is the non-English language spoken by the first person with a non-English language in the following order: householder, spouse, parent, sibling, child, grandchild, in-laws, other relatives, stepchild, unmarried partner, housemate or roommate, and other nonrelatives. Thus, a person who speaks only English may have a non-English household language assigned to him/her in tabulations of individuals by household language.
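
The assignment rule can be sketched as a priority lookup over household members. The record layout (`relationship`, `age`, `language`, with `None` meaning the person speaks only English) is a hypothetical structure chosen for illustration; the relationship order is the one listed above.

```python
# Priority order from the definition above.
RELATIONSHIP_ORDER = [
    "householder", "spouse", "parent", "sibling", "child", "grandchild",
    "in-laws", "other relatives", "stepchild", "unmarried partner",
    "housemate or roommate", "other nonrelatives",
]

def household_language(members):
    """Return the household language, or None if nobody 5+ speaks a non-English language."""
    rank = {rel: i for i, rel in enumerate(RELATIONSHIP_ORDER)}
    speakers = [m for m in members if m["age"] >= 5 and m["language"] is not None]
    if not speakers:
        return None
    # First speaker in priority order; ties keep the order of the input list.
    first = min(speakers, key=lambda m: rank[m["relationship"]])
    return first["language"]

household = [
    {"relationship": "householder", "age": 40, "language": None},     # only English
    {"relationship": "child", "age": 12, "language": "Spanish"},
    {"relationship": "spouse", "age": 38, "language": "Korean"},
    {"relationship": "child", "age": 3, "language": "Spanish"},       # under 5, ignored
]
print(household_language(household))  # Korean
```

In this example the English-only householder is tabulated with a household language of Korean, which is the situation the last sentence of the definition describes.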

Language density

Language density is a household measure of the number of household members who speak a language other than English at home in three categories: none, some, and all speak another language.
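
A minimal sketch of the three categories follows. It assumes the measure counts only household members 5 years old and over, matching the universe of the language question; the definition above does not state this explicitly. The record fields are hypothetical, with `language` set to `None` for a person who speaks only English.

```python
def language_density(members):
    """Classify a household as 'none', 'some', or 'all' speaking another language."""
    # Assumption: only members 5 years and over are counted.
    eligible = [m for m in members if m["age"] >= 5]
    speakers = sum(1 for m in eligible if m["language"] is not None)
    if speakers == 0:
        return "none"
    if speakers == len(eligible):
        return "all"
    return "some"

household = [
    {"age": 40, "language": None},
    {"age": 38, "language": "Korean"},
    {"age": 3, "language": "Korean"},   # under 5, not counted
]
print(language_density(household))  # some
```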

Limitation of the data

Some people who speak a language other than English at home may have first learned that language at school. However, these people would be expected to indicate that they spoke English "Very well." People who speak a language other than English, but do not do so at home, should have been reported as not speaking a language other than English at home.

The extreme detail in which language names were coded may give a false impression of the linguistic precision of these data. The names used by speakers of a language to identify it may reflect ethnic, geographic, or political affiliations and do not necessarily respect linguistic distinctions. The categories shown in the tabulations were chosen on a number of criteria, such as information about the number of speakers of each language that might be expected in a sample of the U.S. population.

Comparability

Information on language has been collected in every census since 1890, except 1950. The comparability of data among censuses is limited by changes in question wording, by the subpopulations to whom the question was addressed, and by the detail that was published. The same question on language was asked in 1980, 1990, and Census 2000. This question on the current language spoken at home replaced the questions asked in prior censuses on mother tongue; that is, the language other than English spoken in the person's home when he or she was a child; one's first language; or the language spoken before immigrating to the United States. The censuses of 1910-1940, 1960, and 1970 included questions on mother tongue.

A change in coding procedures from 1980 to 1990 improved accuracy of coding and may have affected the number of people reported in some of the 380-plus categories. In 1980, coding clerks supplied numeric codes for the written entries on each questionnaire using a 2,000-name reference list. In 1990, written entries were keyed, then transcribed to a computer file and matched to a computer dictionary that began with the 2,000-name list. The name list was expanded as unmatched entries were referred to headquarters specialists for resolution. In Census 2000, the written entries were transcribed by "optical character recognition" (OCR), or manually keyed when the computer could not read the entry. Then all language entries were copied to a separate computer file and matched to a master code list. The code list is the master file developed from all unique language entries on the 1990 census, and included over 55,000 entries. The computerized matching ensured that identical alphabetic entries received the same code. Unmatched entries were referred to headquarters specialists for coding. In 2000, entries were reported in about 350 of the 380 categories.
