Chapter 7. Operational Overview and Accuracy of the Data
7.1. Master Address File Development
As in Census 2000, the base for the address list for the 2010 Census was the address list from the previous census. Various updates were made to the address list during the intervening decade. The primary source of new addresses after Census 2000 was the Delivery Sequence File (DSF) from the U.S. Postal Service (USPS). The U.S. Census Bureau acquired this file of all mailing addresses in the United States and updated the Master Address File (MAF) twice a year (March and October) until February 2010. Addresses must be associated with a block to be included in the census. The process of associating these addresses with a block is called geocoding. For the Census Bureau, the file where geographic information is contained is the Topologically Integrated Geographic Encoding and Referencing System (TIGER®). During this decade, the MAF and TIGER® have been integrated into the MAF/TIGER® database (MTdb).
For the processing of the DSF records, city-style addresses (containing a house number and street name) of residential units were applied to the MAF in those blocks that, in general terms, had been found to have a majority of city-style addresses in order that duplication would not result from the addition of these addresses. Another file from the USPS, the Locatable Address Conversion System (LACS), contained linkages for addresses that had been changed. Use of this file by the Census Bureau allowed for old and new addresses to be linked in the MTdb. This was especially desirable for linking the non-city-style addresses that had been converted to city-style addresses.
Various field operations, such as census tests and the update of rural area addresses for the American Community Survey, led to localized updates or updates in specific types of areas. In particular, updating addresses with post office box type addresses in the rural area was meant to balance the updating of the MTdb with city-style addresses from the DSF.
The first large-scale update of addresses for the decennial census was the Local Update of Census Addresses (LUCA) program. In this program, governmental units (GUs) were allowed to participate in updating the address list in three different ways. One option allowed for review of the Census Bureau's address list, while the other options allowed only for the GUs to submit a list of addresses to the Census Bureau. Under any of these options, new addresses submitted by the GUs were included on the subsequent address list. These addresses could be for housing units (HUs) or for group quarters (GQs). The submitted addresses were then included in the universe for validation in the next major address list development operation, Address Canvassing, which occurred between April and July 2009.
Address Canvassing was conducted in all areas of the United States and Puerto Rico except in the areas that were designated for Remote Alaska or Remote Update Enumerate in the census. Address Canvassing was a dependent check of the list of addresses, as well as of the maps. The Address Canvassing operation was performed using automation, which allowed for the integration of address and map updates, as well as the imposition of rules on what constitutes a minimum allowable address and the collection of particular geographic fields. The allowable actions on the addresses in Address Canvassing were validate, nonresidential, delete, duplicate, address correction, and an action for designating possible GQs. Adds to the list were also allowed. All deletes and duplicates were validated during the next phase of the operation, called delete verification. The results from Address Canvassing were incorporated into the MTdb. One of the first uses of these results was for creating feedback to the GUs participating in LUCA. The results from processing the Address Canvassing updates were also used for the creation of the initial census address list starting in July 2009, the initial Universe Control and Management (UC&M) file. Printing of the questionnaires used this address list.
The results of Address Canvassing also contributed to the universe for the next operation, which was Group Quarters Validation (GQV). The procedures for creating the address list of GQs were significantly different for the 2010 Census as opposed to previous censuses. In order to reduce duplication and geographic data errors, the address lists of HUs and GQs were integrated in the MTdb. The list of potential GQs going into GQV was the accumulation of GQ addresses from Census 2000, GQ addresses acquired from various sources, and addresses listed as Other Living Quarters (potential GQs) in Address Canvassing. Cases designated as GQs through other sources were intended to be sent to GQV regardless of the Address Canvassing status. However, it was discovered during processing of the LUCA updates that GUs often listed apartment buildings or commercial units as GQs. The schedule for completing GQV, in October 2009, was one of the riskiest in the census. Leaving these units on the list of units to be checked in GQV would have put timely completion of the operation, and thus the entire census schedule, at risk. Therefore, from the LUCA updates making their way to GQV, only the units with facility names that included special key names known to be associated with group quarters were designated for follow-up in GQV unless they were also designated as possible GQs in Address Canvassing. The addresses in GQV could receive an action of GQ (along with the type of GQ), HU, nonresidential, vacant, transient (meaning the location was connected to a geographic area that should be enumerated in the Enumeration at Transitory Locations operation), delete, or duplicate.
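The screening rule for LUCA-sourced GQ records can be sketched as follows. This is only an illustration of the logic described above, not the Census Bureau's actual implementation: the function name, the source labels, and the key names in `GQ_KEY_NAMES` are all hypothetical, since the actual list of key names is not given here.

```python
# Hypothetical key names; the real list used in 2010 is not given in this chapter.
GQ_KEY_NAMES = ("dormitory", "nursing", "correctional", "shelter")


def needs_gqv_followup(source, facility_name, possible_gq_in_address_canvassing):
    """Return True if a potential GQ record should be followed up in GQV.

    source: label for where the GQ designation came from (assumed values,
        e.g. "LUCA", "Census2000").
    facility_name: facility name reported for the address.
    possible_gq_in_address_canvassing: True if Address Canvassing marked
        the address as a possible GQ.
    """
    if source != "LUCA":
        # Non-LUCA cases were intended to go to GQV regardless of status.
        return True
    if possible_gq_in_address_canvassing:
        return True
    # LUCA-only cases are kept only when the facility name contains a key name.
    name = facility_name.lower()
    return any(key in name for key in GQ_KEY_NAMES)
```

Under these assumptions, a LUCA record named "Maple Court Apartments" that was not flagged in Address Canvassing would be dropped from the GQV workload, while one named "State Correctional Facility" would be kept.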
The results from GQV were processed in November 2009. Updates were made to the initial UC&M, resulting in the enumeration UC&M, or the full census universe. Units that were marked as housing units in GQV were designated as adds to the initial UC&M universe. Other adds to this universe resulted from DSF updates that had occurred between the creation of the Address List for Address Canvassing and the creation of the enumeration UC&M. There were three DSFs contributing new adds to this UC&M file. A supplemental printing of addressed questionnaires resulted from this updated file. These questionnaires were either added to the mail stream at the point that questionnaires were delivered by the USPS or sent to the Local Census Offices in those areas where the added addresses occurred in Update/Leave areas.
Update/Leave (U/L) is an operation in which questionnaires are hand-delivered due to potential problems with postal delivery of addresses. The presence of staff in the field for this delivery allows the simultaneous updating of the address lists and maps. Addresses on the address list in U/L areas received the actions of verify, correction, nonresidential, delete, or duplicate. Maps could also receive updates. The operation occurred between March 1 and April 2, 2010. There were approximately 12 million housing units in stateside U/L areas, and Puerto Rico (about 1.6 million addresses) was entirely U/L. There was no check on the deletes and duplicates designated in this operation because the operation was performed on paper and there was a timing issue with processing. The status of nonresponding units that were in the enumeration universe was checked in the later Nonresponse Follow-Up operation.
Nonresponse Follow-Up (NRFU) is the operation in which nonresponding households from both mailout/mailback and U/L areas are followed up and enumerated, if possible. Other options within NRFU are to mark the unit vacant (either a regular vacant or seasonal), delete, or duplicate. It is also possible to add units and perform the enumeration on them. The maps also may be updated. The status of regular vacants and deletes is checked in the subsequent operation, the Vacant/Delete Check.
No additional updates were made to the enumeration universe in the UC&M before the start of NRFU. However, there were still operations and processes adding addresses to the MTdb, or in some cases, adding geographic data that allowed the addresses to be included in the census. These were: 1) LUCA Appeals; 2) New Construction; 3) HU Address Review; 4) Count Review; 5) Spring 2010 DSF; 6) newly geocoded addresses; 7) addresses resulting from a follow-up of INFO-COMMs (standardized forms used to document problems, issues, and unusual situations or to ask questions about procedures and other work-related matters by field staff) submitted during Address Canvassing; and 8) U/L added addresses.
LUCA Appeals was a process where a GU that participated in LUCA submitted challenges to the outputs from Address Canvassing. The challenged addresses were reviewed, and those that were approved were accepted back onto the census list.
New Construction was an effort similar to LUCA in which the participating GU submitted addresses that represented newly constructed and livable housing.
The HU Address Review was a headquarters review of addresses coming from a variety of sources. In general, these were people who reported situations, such as large apartment buildings missing from the census universe, where the report made it to headquarters personnel. Staff in the Geography Division researched these address submissions to determine if they were truly missing from the census address list and, if so, why. When it was found that addresses should be included on the census address list and did not duplicate other addresses already on the census address list, they were submitted for processing in the same format as files from the New Construction and Count Review programs.
Count Review was another effort undertaken with governmental representatives with an eye toward identifying housing that was missing from the list.
The Spring 2010 DSF contained mailable addresses as of February 2010. Residential addresses appearing on this list that were not already included in the MTdb were assumed to represent mostly newly constructed units. For the next source of addresses on this update, there were a few million addresses residing on the MTdb that had not yet been geocoded. It was assumed that these addresses would be added in Address Canvassing if they truly existed.
However, by the time Address Canvassing was completed, only about half of these addresses had been added in the operation, and there were some concerns about the coverage of the list. The Census Bureau therefore undertook an independent effort in early 2010 to find geocodes for these units after first checking that the addresses were not represented on the list in another form.
The next addition of addresses resulted from Address Canvassing INFO-COMMs. The design of the automated instrument did not allow for units to be added during the Quality Control (QC) phase of the operation if the assignment had passed QC. Nevertheless, in some areas QC staff found large numbers of missing addresses that were not picked up within the QC sample. They filled out an INFO-COMM to apprise the local office staff of the situation. In early 2010, there was an effort to identify which of these cases were really missing units and where they should be added to the list. An input file that mimicked the inputs of the other operations adding addresses was created for these units to be added to the census list in this process, as well.
The final source of new addresses listed here was U/L adds. These are units that did not appear on the list used for U/L but that were identified as valid in the field. Questionnaires were delivered to these units, and updates were made to the MTdb based on the results of this operation. However, because the operation was not automated, the processing of U/L actions could not be completed in time to update the NRFU universe. Thus, in order to perform enumeration on U/L adds for households that did not return the form, the units needed to be included in the subsequent updated universe.
These addresses just described made it into the enumeration universe for the first time for the Vacant/Delete Check (VDC). Not all of these addresses required a visit for enumeration during the VDC. In particular, if a householder at an address added during U/L mailed in the form in time, no additional visit was required. In addition, an operation dubbed the Late Add Mailing resulted in many of these addresses being mailed forms earlier than enumeration would have occurred. In particular, the LUCA Appeals addresses, the addresses from the most recent DSF that were geocoded to a block, and the addresses that were newly geocoded during the geocoding research were placed on a file for which questionnaires were printed and mailed. A unique processing ID was printed on each of these questionnaires, which enabled the questionnaires to be linked up to the census ID that was used when the units were added to the VDC universe. This also allowed information about receipt of a questionnaire to be passed back to the universe file so that the unit could be removed from the universe of follow-up cases. Therefore, the final list of units requiring follow-up and potential enumeration in VDC consisted of the regular vacants and deletes as designated in NRFU and the new units added from the seven sources listed above for which no questionnaire was received by the time of universe creation. The universe of addresses printed within the address registers consisted of all units that appeared in the enumeration universe plus the units added from the seven sources listed. One category of new address that did not appear on the VDC listing pages, due to timing, was the units that were added during NRFU.
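The construction of the VDC follow-up list can be sketched as a simple filter. This is a minimal illustration of the rule stated above; the `Unit` record, its field names, and the status labels are assumptions made for the example and do not reflect the actual UC&M file layout.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    census_id: str
    nrfu_status: str = ""                 # assumed labels: "occupied", "vacant_regular",
                                          # "vacant_seasonal", "delete"
    late_add: bool = False                # added from one of the late address sources
    questionnaire_received: bool = False  # a return was checked in before universe creation


def vdc_followup_universe(units):
    """Return census IDs of units that need a VDC field visit."""
    followup = []
    for unit in units:
        if unit.late_add:
            # Late adds need a visit only if no questionnaire has been received.
            if not unit.questionnaire_received:
                followup.append(unit.census_id)
        elif unit.nrfu_status in ("vacant_regular", "delete"):
            # Regular vacants and deletes from NRFU are rechecked; seasonal vacants are not.
            followup.append(unit.census_id)
    return followup
```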
There was one final check of particular addresses in the field. This operation is called Field Verification, and it was performed in the 2010 Census much as it was in Census 2000. Only specific addresses within the entire universe of addresses were acted on during this status check. Units designated for follow-up in this operation received a status of valid (or verify), delete, or duplicate. The addresses in the universe for this check in the 2010 Census included two categories of cases. The first was a check of new addresses that resulted from Be Counted forms or calls to the Telephone Questionnaire Assistance line that did not have a related census ID. These addresses needed to be verified in the field before they were added to the census. They must be associated with a particular block before they could be sent for field work. The second category of cases was units that potentially needed to be removed from the universe based on the identification of duplicated persons in those units. Person duplication occurs for many reasons, one of which is duplication of housing units on the address list. Units linked by person matching that are within a close geographic area have been found in testing to be highly associated with housing-level problems. However, there are other situations that can lead to such person duplication that are not housing unit duplication, so these units identified as potential duplicates needed to be field checked before they could be safely removed from the census address list. For those units that were designated as duplicates, an indicator of which unit on the list that unit duplicated was collected.
The descriptions above cover the vast majority of housing units in the United States and in all of Puerto Rico. However, there are some particularly remote or problematic areas that were designated for other types of enumeration. The first of these to start was Remote Alaska. In this operation, enumeration was scheduled to occur in fishing and hunting villages before the ice broke and the villagers scattered from their winter homes. Other areas in Alaska that were remote but where the population is stationary were designated for Remote Update/Enumerate. The methodology is the same for these two operations. The incoming address list was based on what was there during Census 2000. Updates were made to the address list and maps, and the households were enumerated at the same time. An area of Maine was also designated for Remote Update/Enumerate.
Some areas of the country (covering about 1.5 million addresses) were designated for Update/Enumerate. In these areas, Address Canvassing was completed, but it was felt that enumeration by Mailout/Mailback or U/L would have been problematic. These areas could be seasonal housing, federally designated tribal areas, or areas with particularly low predicted response rates based on various demographic factors. In Update/Enumerate, as in Remote Update/Enumerate and Remote Alaska, updates were made to the address list and maps at the same time that enumeration was completed. In general, this is the last operation that occurs in these areas, although it is possible for units in the Field Verification universe to be in these areas.
A list of nonstandard housing, such as college dormitories and group homes, was tracked in the MTdb in conjunction with the housing unit list. The list of these GQs was compiled from various sources, including the Census 2000 list of GQs, LUCA participants, the Federal-State Cooperative for Population Estimates, and Address Canvassing. The GQV of these addresses occurred in October 2009, as described above. The units that remained GQs after this check were included in Group Quarters Advance Visit (GQAV), and then Group Quarters Enumeration (GQE). In GQE, individual census questionnaires (meaning individual questionnaires for each person) were distributed at the GQs and collected by the field staff. A count of persons associated with a particular GQ resulted from this operation.
7.3. Service-Based Enumeration and Enumeration at Transitory Locations
Service-Based Enumeration (SBE) was designed to account for the enumeration of persons without a usual residence who use service facilities (i.e., shelters, soup kitchens, and mobile food vans). In the 2010 Census, 3 days (March 29-31) were designated for these enumeration activities. Different types of facilities were designated for different days. Only persons using the service facility on the interview day were enumerated at that location. It was possible for people to be counted in more than one location due to use of different facilities on subsequent days. SBE person records are unduplicated to minimize this duplication.
People experiencing homelessness could also complete a Be Counted form and check the box indicating this status. To the extent that such persons could be associated with a state and county based on the information provided on the form, they will be counted at a Group Quarters within that state and county.
Certain areas were designated for Enumeration at Transitory Locations (ETL). These included RV parks and marinas where people were living as of Census Day if people living in these locations had no other permanent place to stay. The locations where ETL took place were designated in GQAV. When people were enumerated in ETL, the particular location was considered a housing unit.
The Census Bureau has modified some data in this data release to protect confidentiality. Title 13 U.S. Code, Section 9, prohibits the Census Bureau from publishing results in which an individual's data can be identified.
The Census Bureau's internal Disclosure Review Board monitors the disclosure review process and sets the confidentiality rules for all data releases. A checklist approach is used to ensure that all potential risks are considered and addressed. A list of possible concerns is created and the Disclosure Review Board makes sure that the appropriate steps are taken to assure the confidentiality of the data.
Title 13 of the U.S. Code authorizes the Census Bureau to conduct surveys and censuses and mandates that any information obtained from private individuals and establishments remains confidential. Section 9 of Title 13 prohibits the Census Bureau from releasing "any publication whereby the data furnished by any particular establishment or individual under this title can be identified." Section 214 of Title 13, as modified by the Federal Sentencing Reform Act, imposes a fine of not more than $250,000 and/or imprisonment of not more than 5 years for publication or communication in violation of Section 9.
Disclosure avoidance is the process of disguising data to protect confidentiality. A disclosure of data occurs when someone can use published statistical information to identify an individual who provided information under a pledge of confidentiality. Using disclosure avoidance, the Census Bureau modifies or removes all of the characteristics that put confidential information at risk for disclosure. Although it may appear that a table shows information about a specific individual, the Census Bureau has taken steps (such as data swapping) to disguise the original data while making sure the results are useful.
Data swapping is a method of disclosure avoidance designed to protect confidentiality in tables of frequency data (the number or percentage of the population with certain characteristics). Data swapping is done by editing the source data or exchanging records for a sample of cases. A sample of households is selected and matched on a set of selected key variables with households in neighboring geographic areas (geographic areas with a small population) that have similar characteristics (same number of adults, same number of children, etc.). Because the swap often occurs within a geographic area with a small population, there is no effect on the marginal totals for the geographic area with a small population or for totals that include data from multiple geographic areas with small populations. Because of data swapping, users should not assume that tables with cells having a value of one or two reveal information about specific individuals.
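The idea behind data swapping can be sketched in a few lines. This is a toy illustration only, not the Census Bureau's swapping procedure: the record layout, the `neighbors` mapping, the sampling scheme, and the choice of key variables (number of adults and number of children) are all assumptions made for the example.

```python
import random


def swap_households(households, neighbors, sample_size, seed=0):
    """Swap the geography of sampled households with matching neighbors.

    households: list of dicts with keys "id", "area", "adults", "children".
    neighbors: dict mapping an area to the set of its neighboring areas.
    Returns a new list; the input records are not modified.
    """
    rng = random.Random(seed)
    result = [dict(h) for h in households]
    used = set()  # indices of records that have already been swapped
    for i in rng.sample(range(len(result)), min(sample_size, len(result))):
        if i in used:
            continue
        h = result[i]
        key = (h["adults"], h["children"])
        for j, other in enumerate(result):
            if (j != i and j not in used
                    and other["area"] in neighbors.get(h["area"], ())
                    and (other["adults"], other["children"]) == key):
                # Exchange the two records' geography; all other data stay put.
                h["area"], other["area"] = other["area"], h["area"]
                used.update((i, j))
                break
    return result
```

Because swapped partners agree on the key variables, the number of households in each area with a given combination of those variables is the same before and after the swap, which is why the marginal totals described above are unaffected.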
In any large-scale statistical operation, such as the 2010 Census, human- and computer-related errors occur. These errors are commonly referred to as nonsampling errors. Such errors include not enumerating every household or every person in the population, not obtaining all required information from the respondents, obtaining incorrect or inconsistent information, and recording information incorrectly. In addition, errors can occur during the field review of the enumerators' work, during clerical handling of the census questionnaires, or during the electronic processing of the questionnaires.
While it is impossible to completely eliminate nonsampling error from an operation as large and complex as the decennial census, the Census Bureau attempts to control the sources of such error during the collection and processing operations. Described below are the primary sources of nonsampling error and the programs instituted to control this error in the 2010 Census. The success of these programs, however, was contingent upon how well the instructions actually were carried out during the census.
Nonresponse to particular questions on the census questionnaire or the failure to obtain any information for a housing unit allows for the introduction of bias into the data because the characteristics of the nonrespondents have not been observed and may differ from those reported by respondents. As a result, any imputation procedure using respondent data may not completely reflect these differences either at the elemental level (individual person or housing unit) or on the average. Some protection against the introduction of large biases is afforded by minimizing nonresponse. Characteristics for the nonresponses were imputed by using reported data for a person or housing unit with similar characteristics.
The person answering the mail questionnaire for a household or responding to the questions posed by an enumerator could serve as a source of error, although the question wording was extensively tested in several experimental studies prior to the census. The mail respondent may overlook or misunderstand a question or answer a question in a way that cannot be interpreted correctly by the data capture system. The enumerator may also misinterpret or otherwise incorrectly record information given by a respondent or may fail to collect some of the information for a person or household. To control problems such as these with the field enumeration, the work of enumerators was monitored carefully. Field staff were prepared for their tasks by using standardized training packages that included hands-on experience in using census materials. A sample of the households interviewed by each enumerator was reinterviewed to control for the possibility of fabricated data being submitted by enumerators.
The many phases involved in processing the census data represent potential sources for the introduction of nonsampling error. The processing of the census questionnaires completed by enumerators included field review by the crew leader, check-in, and transmittal of completed questionnaires. No field reviews were done on the mail return questionnaires for this census.
Error may also be introduced by the misinterpretation of data by the data capture system or the failure to capture all the information that the respondents or enumerators provided on the forms. Write-in entries go through coding operations, which may also be a source of processing error in the data. Many of the various field, coding, and computer operations undergo a number of quality control checks to help ensure their accurate application.
To reduce various types of nonsampling errors, a number of techniques were implemented during the planning, development of the mailing address list, data collection, and data processing activities. Quality assurance methods were used throughout the data collection and processing phases of the census to improve the quality of the data. A reinterview program was implemented to minimize the errors in the data collection phase for enumerator-filled questionnaires.
Several coverage improvement programs were implemented during the development of the census address list and census enumeration and processing to minimize undercoverage of the population and housing units. These programs were developed based on experience from previous decennial censuses and results from the 2010 Census testing cycle.
Be Counted questionnaires, unaddressed forms requesting all questionnaire items plus a few additional items, were available in public locations for people who believed they were not otherwise counted.
An introductory letter was sent to all mailout/mailback addresses and many addresses in update/leave areas prior to the mailing of the census form. A reminder postcard was also sent to these addresses.
A replacement questionnaire was sent to nonresponding addresses in selected areas.
Bilingual English/Spanish questionnaires were sent to all addresses in selected areas.
Forms in Spanish, Chinese (simplified), Korean, Russian, and Vietnamese were mailed to those who requested them and Language Assistance Guides were available in 59 languages.
A well-publicized toll-free phone number was available to answer questions about the forms, and responses could be taken over the phone.
Under the LUCA program, local officials had the opportunity to address specific concerns about the accuracy and completeness of the address list.
A Coverage Followup (CFU) telephone interview operation was implemented with the express purpose of improving within-household coverage. Cases were telephoned when there was a discrepancy between the number in the count of persons box and the number of persons with data. A household-level undercoverage question was added to the questionnaire, and person-level overcoverage questions were also added. Certain categories of households checking these boxes were also selected for CFU for roster clarification. In addition, large households (those with more than six household members) were selected for inclusion in CFU for the purpose of collecting full demographic data for persons beyond the first six.
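The CFU selection criteria described in the last item can be sketched as a small rule set. This is a simplified illustration under stated assumptions: the function and reason labels are hypothetical, every household checking a coverage box is treated as selected (the text says only certain categories were), and a household reporting more than six members is labeled as a large household rather than as a count discrepancy.

```python
FORM_CAPACITY = 6  # the questionnaire had room for full data on six persons


def cfu_reasons(count_box, persons_with_data, coverage_box_checked=False):
    """Return the list of reasons a return would be selected for CFU.

    count_box: number written in the count of persons box.
    persons_with_data: number of persons with data on the form.
    coverage_box_checked: True if an undercoverage or overcoverage box was checked.
    """
    reasons = []
    if count_box > FORM_CAPACITY:
        # Follow up to collect data for persons beyond the first six.
        reasons.append("large_household")
    elif count_box != persons_with_data:
        reasons.append("count_discrepancy")
    if coverage_box_checked:
        reasons.append("coverage_question")
    return reasons
```

An empty list means the return would not be sent to CFU under these simplified rules.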
With multiple ways for people to initiate or complete their enumeration, as well as the field follow-up operations, it was very likely that some households would be enumerated more than once. A special computer process was implemented to control the extent of this type of nonsampling error by resolving situations where more than one form was received from an address. The process consisted of several steps. Addresses that had more than one viable return were analyzed. Household data from one form were chosen as the household data to use in subsequent census processing. There are situations in which persons can then be added to the household roster if they are not already represented there. These are the cases in which a Be Counted form for a partial household was submitted for the same address, and when an enumeration operation discovers a person who should be counted at a different address (a Usual Home Elsewhere) from the address being enumerated.
The objective of the processing operation was to produce a set of data that describes the population as accurately and clearly as possible. As with Census 2000, information on 2010 Census questionnaires generally was not edited for consistency, completeness, and acceptability during field data collection nor during data capture operations. Enumerator-filled questionnaires were reviewed by census crew leaders and local office clerks for adherence to specified procedures. No clerical review of mail return questionnaires was done to ensure that the information on the form could be data captured, nor were households contacted to collect data that were missing from census returns as in previous censuses.
Most census questionnaires received by mail from respondents as well as those filled by enumerators were processed through a new contractor-built image scanning system that used optical mark and character recognition to convert the responses into computer files. The optical character recognition, or OCR, process used several pattern and context checks to estimate accuracy thresholds for each write-in field. The system also used edits on interpreted write-in responses to decide whether the field values read by the machine interpretation were acceptable. If the value read had a lower than acceptable accuracy threshold or was outside of the edit range, the image of the item was displayed to a keyer, who then entered the response.
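The decision to route a write-in field to a keyer can be sketched for a numeric field such as age. This is a hedged illustration of the rule described above, not the contractor system's logic: the function name, the confidence scale, the default threshold of 0.9, and the edit range of 0 to 125 are all assumed values.

```python
def needs_keying(confidence, value, threshold=0.9, valid_range=(0, 125)):
    """Return True if the field image should be shown to a keyer.

    confidence: machine-estimated accuracy for the interpreted value (0 to 1).
    value: the interpreted write-in response, as text.
    threshold: minimum acceptable confidence (assumed value).
    valid_range: inclusive edit range for the field (assumed value).
    """
    if confidence < threshold:
        return True
    try:
        number = int(value)
    except ValueError:
        # The interpreted text is not a number, so it fails the edit.
        return True
    low, high = valid_range
    return not (low <= number <= high)
```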
To control the creation of possibly erroneous persons from questionnaires completed incorrectly or containing stray marks, an edit on the number of persons indicated on each mail return and enumerator-filled questionnaire was implemented as part of the data capture system. In addition, a new edit identified questionnaires with information written outside of the response boxes. Detection of either of these conditions by the edits subsystem resulted in the review of the questionnaire image at a workstation by an operator who ensured that the person data were captured fully and correctly.
At Census Bureau headquarters, the data records were subjected to a computer edit that identified households exhibiting a possible coverage problem and those with more than six household members. Attempts were made to contact these households on the telephone to correct the count inconsistency and to collect the census data for those people for whom there was no room on the questionnaire.
Once census processing is completed, each address included in the census data collection has to be classified as a nonexistent unit, a vacant unit, or an occupied housing unit. Records that are classified as an occupied unit also need a reported number of residents. This information is necessary to have a complete count of the population and housing units in the United States as of Census Day. Because of the complexity of census operations, there are records that do not have such information by the end of the follow-up activities and data processing. To fill in this missing information, the Census Bureau conducted count imputation, which assigns a unit status and household size to records without such information. This process also included assigning household size to occupied units without household size information. Count imputation processing did not include group quarters.
In count imputation, all the records in the enumeration universe within a designated geographical area were partitioned into small groups based on certain characteristics. For each small group, a probability distribution of unit status and household size was created from the records that had this information. The distribution was then used to impute the missing unit status and/or household size.
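The count imputation step can be sketched as drawing from each group's empirical distribution. This is a minimal illustration under assumed names: the record layout, the `group` key, and the `(status, size)` tuple are hypothetical, and the grouping characteristics themselves are not specified here. Drawing uniformly from the list of observed values in a group is equivalent to sampling from that group's observed distribution.

```python
import random
from collections import defaultdict


def impute_counts(records, seed=0):
    """Fill in missing (unit status, household size) values group by group.

    records: list of dicts with keys "group" and "status_size", where
        "status_size" is a tuple such as ("occupied", 3) or ("vacant", 0),
        or None when the information is missing.
    Returns a new list; imputed records carry "imputed": True.
    """
    rng = random.Random(seed)
    observed = defaultdict(list)
    for rec in records:
        if rec["status_size"] is not None:
            observed[rec["group"]].append(rec["status_size"])

    result = []
    for rec in records:
        rec = dict(rec)  # copy so the input is left untouched
        donors = observed.get(rec["group"])
        if rec["status_size"] is None and donors:
            # A random draw from the observed values reproduces their distribution.
            rec["status_size"] = rng.choice(donors)
            rec["imputed"] = True
        result.append(rec)
    return result
```

In this sketch, a record whose group has no observed values is left unresolved; how such cases were handled in production is not described here.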
The final automated edit and imputation processes determined the final values of questionnaire data items for records with missing or invalid values in collected data. Imputations, which were needed most often when an entry for a given item was missing, included three general procedures known as assignments, allocations, and substitution. Assignments and allocations were imputations of characteristic items on an item-by-item basis, whereas the substitution process imputed data for up to six persons in a household at one time. Each of these procedures ensured the completeness and consistency of the data by providing acceptable codes for missing or unacceptable entries.
The first step in the edit process was to assign acceptable codes in place of unacceptable entries or blanks when acceptable data were found for that same person. When one characteristic item reported for a person was inconsistent with other information provided for that same person, acceptable codes or values that were consistent with one item of reported information were assigned. The edit procedures also assigned race or Hispanic origin from a matched person record in Census 2000 or in the American Community Survey (2000-2009) when these fields were missing. These assignment steps strove to ensure consistency across characteristic data.
The next step in the edit process, known as allocation, was to impute responses for missing person or housing-unit characteristic data. The general procedure for changing unacceptable entries through allocation was to derive an entry for a person (or housing unit) that was consistent with entries for another person (or housing unit) with similar characteristics. Allocation rates for census items were made available with the published census data.
Another way corrections were made during the edit and imputation process was through substitution; that is, the replication of a full set of characteristics for people in a household. When there was an indication that a household was occupied by a specified number of people but the questionnaire record contained no information for the people within the household or the occupants were not listed on the questionnaire, a previously accepted household of the same size was selected as a substitute. The full set of characteristics of the substitute was duplicated. Counts of substituted persons and the occupied housing units containing substituted persons were made available with the published census data.
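Substitution can be sketched as a lookup for a donor household of the same size. This is an illustration only: the function name and record layout are hypothetical, and choosing the most recently accepted household is an assumption, since the text says only that a previously accepted household of the same size was selected.

```python
import copy


def substitute_household(target_size, accepted_households):
    """Return a copy of the person records from a donor household, or None.

    target_size: number of people indicated for the household with no person data.
    accepted_households: previously accepted households, in processing order,
        each a dict with a "persons" list.
    """
    # Assumption: search backward so the most recently accepted donor is used.
    for donor in reversed(accepted_households):
        if len(donor["persons"]) == target_size:
            # Duplicate the full set of characteristics without touching the donor.
            return copy.deepcopy(donor["persons"])
    return None
```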