Documentation: | ACS 2007 (3-Year Estimates) |
Publisher: U.S. Census Bureau
Survey: ACS 2007 (3-Year Estimates)
Document: | ACS 2007-3yr Summary File: Technical Documentation |
citation: | Social Explorer; U.S. Census Bureau; 2005-2007 American Community Survey 3-Year Summary File: Technical Documentation. |
"How to Use the ACS Summary File" is intended to be used as a guide for data users to the American Community Survey (ACS) summary file and the documentation. New users should review this chapter before using the ACS Summary File.
All ACS Summary Files (ACS-SF) are available in .zip format, a compressed format that reduces file size. Users can download the .zip files via File Transfer Protocol (FTP) from the American Community Survey FTP site at: http://www2.census.gov/acs2007_3yr/summaryfile/
The ACS-SF files must be unzipped before they can be read.
The FTP site has 53 directories for the data, which include the United States, the 50 states, the District of Columbia, Puerto Rico, and example SAS programs. Each directory contains a geography file for the selected area, and the .zip files are organized by geographic area. For convenience, users who want the .zip files for all the geographies in a particular directory can download all_< state|dc|pr|us >.zip, which contains all the files in that directory, instead of downloading each file individually.
After unzipping an ACS-SF, three ASCII (plain text) files are available: one for the estimates, one for the margins of error, and one for the standard errors. Each ACS-SF includes detailed tables for all of the geographic areas published in 2007 and a geographic header record file in fixed-field format; the geographic header record file's data portion includes a geographic link in comma-delimited format. For more information on the files and their purposes, refer to the README file in the main directory of the FTP site: http://www2.census.gov/acs2007_3yr/summaryfile/readme2007.pdf
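The unzip step described above can be sketched in Python with the standard zipfile module. This is only an illustration: the archive is built in memory as a stand-in, and its member names and contents are hypothetical, chosen to follow the file naming convention rather than copied from the FTP site.

```python
import io
import zipfile


def text_members(archive_bytes: bytes) -> list[str]:
    """Return the sorted names of the .txt members inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return sorted(n for n in archive.namelist() if n.lower().endswith(".txt"))


def read_lines(archive_bytes: bytes, member: str) -> list[str]:
    """Decode one archive member as ASCII text and return its lines."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.read(member).decode("ascii").splitlines()


def make_demo_archive() -> bytes:
    """Build a tiny in-memory stand-in archive (hypothetical names and contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for kind in "ems":  # estimates, margins of error, standard errors
            archive.writestr(f"{kind}20073ak0003000.txt", f"ACSSF,2007{kind}3,ak\n")
    return buffer.getvalue()


demo = make_demo_archive()
print(text_members(demo))
print(read_lines(demo, "e20073ak0003000.txt"))
```

For a real download, the bytes would come from the downloaded .zip file on disk instead of make_demo_archive().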
The .zip file names for all of the files follow a predefined structure. The naming convention enables a user or a computer program to determine the contents of a file from its name. Figures 2.1 and 2.2 display an overview of the file naming conventions.
The ACS-SF file naming convention for the data summary file names will be tyyyypggssssiterr.zip:
- t = type of data (e = estimate, m = margin of error, s = standard error)
- yyyy = reference year for the data, e.g., 2005
- p = period covered by the file (1 = 1-year, 3 = 3-year, 5 = 5-year)
- gg = geographic area (state or US) covered by the file; 'PR' for the Puerto Rico Community Survey
- ssss = file sequence number (valid range is 0001 through 9999)
This sequence number will be used in a manner similar to the approach taken for the SF3. Each sequence number will correspond to a series of detailed tables. Each record of this file will be for a unique geographic area published by the ACS. All the data cells contained in the detailed tables for that geographic area would be on this record. There will be a file for the estimates and a separate one for the standard errors.
- iter = '000', placeholder for future values for iteration
- r = data revision indicator, only present if needed ('a', 'b', etc.)
The ACS-SF file naming convention for the geographic header files will be tyyyypgg.txt:
- t = type of data (g = geographic header file)
- yyyy = reference year for the data, e.g., 2005
- p = period covered by the file (1 = 1-year, 3 = 3-year, 5 = 5-year)
- gg = geographic area (state or US) covered by the file; 'PR' for the Puerto Rico Community Survey
Figure 2.1 Zip File Naming Convention Example
Figure 2.2 Geoheader Summary File Naming Convention Example
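Because the names follow a fixed pattern, a program can split them into their parts. The following Python sketch does this with regular expressions. The function and key names are illustrative, and accepting a .txt extension for data files (alongside .zip) is an assumption based on the unzipped file names used elsewhere in this chapter.

```python
import re

# tyyyypggssssiterr: type, year, period, geography, sequence, iteration, revision
DATA_FILE = re.compile(
    r"(?P<type>[ems])(?P<year>\d{4})(?P<period>[135])(?P<geo>[a-z]{2})"
    r"(?P<sequence>\d{4})(?P<iteration>\d{3})(?P<revision>[a-z]?)\.(?:zip|txt)"
)
# tyyyypgg.txt with t fixed to "g"
GEO_FILE = re.compile(r"g(?P<year>\d{4})(?P<period>[135])(?P<geo>[a-z]{2})\.txt")

TYPES = {"e": "estimate", "m": "margin of error", "s": "standard error"}


def parse_name(name: str) -> dict:
    """Split an ACS-SF file name into its named components."""
    lowered = name.lower()
    match = GEO_FILE.fullmatch(lowered)
    if match:
        return {"kind": "geographic header", **match.groupdict()}
    match = DATA_FILE.fullmatch(lowered)
    if match:
        parts = match.groupdict()
        parts["kind"] = TYPES[parts.pop("type")]
        return parts
    raise ValueError(f"not an ACS-SF file name: {name!r}")


data_parts = parse_name("e20073ak0003000.txt")
geo_parts = parse_name("g20073ak.txt")
print(data_parts)
print(geo_parts)
```

An empty "revision" value means the name carries no revision indicator.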
The geographic header record layout is shown below (Table 2.1). The summary level column indicates which summary levels carry each geographic area code: the row for a particular area code lists the summary levels that contain it. For example, to uniquely identify a county (summary level 050) you need both the state code and the county code. There are 9 Brown counties in the U.S., so identifying one of them uniquely requires the state as well.
Note: The Geographic Summary Levels designated as "Reserved for future use" will be used for the ACS period estimate data products. These reference names will not be defined in the ACS Geographic Terms and Concepts section until they are used.
Table 2.1 Geographic Header Record Layout for the ACS Summary File
Data Dictionary Reference Name | Description | Field Size | Starting Position | Geographic Summary Levels For Single-Year Tables |
---|---|---|---|---|
RECORD CODES | ||||
FILEID | Always equal to ACS Summary File identification | 6 | 1 | All Summary Levels |
STUSAB | State Postal Abbreviation | 2 | 7 | All Summary Levels |
SUMLEVEL | Summary Level | 3 | 9 | All Summary Levels |
COMPONENT | Geographic Component | 2 | 12 | All Summary Levels |
LOGRECNO | Logical Record Number | 7 | 14 | All Summary Levels |
GEOGRAPHIC AREA CODES | ||||
US | US | 1 | 21 | 010 |
REGION | Census Region | 1 | 22 | 020 |
DIVISION | Census Division | 1 | 23 | 030 |
STATECE | State (Census Code) | 2 | 24 | 040, 050, 060, 160, 230, 500, 795, 950, 960, 970 |
STATE | State (FIPS Code) | 2 | 26 | 040, 050, 060, 160, 230, 500, 795, 950, 960, 970 |
COUNTY | County of current residence | 3 | 28 | 050, 060 |
COUSUB | County Subdivision (FIPS) | 5 | 31 | 060 |
PLACE | Place (FIPS Code) | 5 | 36 | 160, 312, 352 |
TRACT | Census Tract | 6 | 41 | Reserved for future use |
BLKGRP | Block Group | 1 | 47 | Reserved for future use |
CONCIT | Consolidated City | 5 | 48 | Reserved for future use |
AIANHH | American Indian Area/Alaska Native Area/ Hawaiian Home Land (Census) | 4 | 53 | 250 |
AIANHHFP | American Indian Area/Alaska Native Area/ Hawaiian Home Land (FIPS) | 5 | 57 | 250 |
AIHHTLI | American Indian Trust Land/ Hawaiian Home Land Indicator | 1 | 62 | Reserved for future use |
AITSCE | American Indian Tribal Subdivision (Census) | 3 | 63 | Reserved for future use |
AITS | American Indian Tribal Subdivision (FIPS) | 5 | 66 | Reserved for future use |
ANRC | Alaska Native Regional Corporation (FIPS) | 5 | 71 | 230 |
CBSA | Metropolitan and Micropolitan Statistical Area | 5 | 76 | 310, 312, 314, 332, P10, P11, P12 |
CSA | Combined Statistical Area | 3 | 81 | 330, P09 |
METDIV | Metropolitan Division | 5 | 84 | 314 |
MACC | Metropolitan Area Central City | 1 | 89 | Reserved for future use |
MEMI | Metropolitan/Micropolitan Indicator Flag | 1 | 90 | 010, 020, 030, 040 |
NECTA | New England City and Town Area | 5 | 91 | 350, 352, 355, P14, P15, P16 |
CNECTA | New England City and Town Combined Statistical Area | 3 | 96 | 335 |
NECTADIV | New England City and Town Area Division | 5 | 99 | 355 |
UA | Urban Area | 5 | 104 | 400 |
UACP | Urban Area Central Place | 5 | 109 | Reserved for future use |
CDCURR | Current Congressional District *** | 2 | 114 | 500 |
SLDU | State Legislative District Upper | 3 | 116 | Reserved for future use |
SLDL | State Legislative District Lower | 3 | 119 | Reserved for future use |
VTD | Voting District | 6 | 122 | Reserved for future use |
ZCTA3 | ZIP Code Tabulation Area (3-digit) | 3 | 128 | Reserved for future use |
ZCTA5 | ZIP Code Tabulation Area (5-digit) | 5 | 131 | Reserved for future use |
SUBMCD | Subbarrio (FIPS) | 2 | 136 | Reserved for future use |
SDELM | School District (Elementary) | 5 | 138 | 950 |
SDSEC | School District (Secondary) | 5 | 143 | 960 |
SDUNI | School District (Unified) | 5 | 148 | 970 |
UR | Urban/Rural | 1 | 153 | 010, 020, 030, 040 |
PCI | Principal City Indicator | 1 | 154 | 312, 352 |
TAZ | Traffic Analysis Zone | 6 | 155 | Reserved for future use |
UGA | Urban Growth Area | 5 | 161 | Reserved for future use |
PUMA5 | Public Use Microdata Area - 5% File | 5 | 166 | 795, 901 |
PUMA1 | Public Use Microdata Area - 1% File | 5 | 171 | Reserved for future use |
GEOID | Geographic Identifier | 40 | 176 | All Summary Levels |
NAME | Area Name | 200 | 216 | All Summary Levels |
RESERVED | For Future needs | 50 | 418 | Reserved for future use |
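The field sizes and starting positions in Table 2.1 are enough to slice a fixed-field geographic header record. The Python sketch below copies a handful of fields from the table (starting positions are 1-based) and reads them from a demo record. The demo record is built in code with illustrative values and is not taken from a published file.

```python
# (starting position, field size) for selected fields, copied from Table 2.1
GEO_FIELDS = {
    "FILEID": (1, 6),
    "STUSAB": (7, 2),
    "SUMLEVEL": (9, 3),
    "COMPONENT": (12, 2),
    "LOGRECNO": (14, 7),
    "STATE": (26, 2),
    "COUNTY": (28, 3),
    "GEOID": (176, 40),
    "NAME": (216, 200),
}


def parse_geo_record(line: str) -> dict:
    """Slice one fixed-field record into a dict of stripped field values."""
    return {
        name: line[start - 1:start - 1 + size].strip()
        for name, (start, size) in GEO_FIELDS.items()
    }


def make_demo_record(values: dict) -> str:
    """Build a blank-padded record holding the given values (demo helper)."""
    buffer = [" "] * 415  # long enough to reach the end of the NAME field
    for name, text in values.items():
        start, size = GEO_FIELDS[name]
        buffer[start - 1:start - 1 + size] = text.ljust(size)[:size]
    return "".join(buffer)


# Illustrative values only; a real record comes from a g*.txt file.
demo_line = make_demo_record({
    "FILEID": "ACSSF", "STUSAB": "AK", "SUMLEVEL": "040",
    "COMPONENT": "00", "LOGRECNO": "0000001", "STATE": "02", "NAME": "Alaska",
})
record = parse_geo_record(demo_line)
print(record["SUMLEVEL"], record["LOGRECNO"], record["NAME"])
```

For a county record (summary level 050), the pair of STATE and COUNTY values would serve as the unique key.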
Detailed information on ACS summary levels can be found in Chapter 4. Chapter 4 identifies each geographic level and provides the code for the level. Figure 2.3 provides an example of the various geographic hierarchies used in the American Community Survey. Take some time to review this chart to become familiar with the different hierarchies.
Figure 2.3 Hierarchy of ACS Geographic Entities
Let's say you want to create Table B08406, "Sex of workers by means of transportation to work for workplace geography," for the state of Alaska from the files on the FTP site:
http://www2.census.gov/acs2007_3yr/summaryfile/
Which files do you need? How do you read the files?
You will need three files:
1. The data dictionary (merge_5_6_final.xls)
2. The data file (e20073ak0003000.txt)
3. The geography file (g20073ak.txt)
Start with the data dictionary, merge_5_6_final.xls. Under the "Tblid" column, look for the value "B08406". You will see that the "Sequence Number" is "0003". This means that the data you are looking for are in the file "e20073ak0003000.txt". How do we know this is the right file? We know this from the name of the file: the "e" stands for estimate, 2007 is the year, 3 is the period (3-year estimates), "ak" is the state (Alaska), and "0003" is the sequence number (which contains the data for Table B08406). See the "File Naming Conventions" section in Chapter 2.
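The lookup just described (table ID to sequence number to file name) can be sketched in Python. The small dictionary stands in for the data dictionary spreadsheet and holds only the one entry quoted in this chapter. The function names and the default .txt extension are illustrative assumptions.

```python
# Stand-in for the "Tblid" -> "Sequence Number" columns of the data dictionary;
# only the entry quoted in this chapter is included.
TABLE_TO_SEQUENCE = {"B08406": "0003"}


def data_file_name(kind, year, period, geo, sequence,
                   iteration="000", revision="", ext="txt"):
    """Assemble a data file name following the tyyyypggssssiterr pattern."""
    if kind not in ("e", "m", "s"):
        raise ValueError("kind must be 'e', 'm', or 's'")
    return f"{kind}{year}{period}{geo.lower()}{int(sequence):04d}{iteration}{revision}.{ext}"


def geo_file_name(year, period, geo):
    """Assemble a geographic header file name following the tyyyypgg.txt pattern."""
    return f"g{year}{period}{geo.lower()}.txt"


sequence = TABLE_TO_SEQUENCE["B08406"]
estimate_file = data_file_name("e", 2007, 3, "AK", sequence)
geography_file = geo_file_name(2007, 3, "AK")
print(estimate_file, geography_file)
```

Passing "m" or "s" as the first argument gives the matching margin-of-error or standard-error file name for the same sequence.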
Then use the geography file for Alaska to determine the location within the state to which the data refer. The appropriate file is g20073ak.txt, where "g" means "geography", 2007 is the year, 3 is the period (in this case, 3-year estimates), and "ak" is the state. (For each state, the geography file name contains the lower-case postal abbreviation of the state.) When you open the data file, e20073ak0003000.txt, you will see the following comma-delimited fields on the first line:
ACSSF,2007e3,ak,000,0003,0000001,335039,264535,219908,44627,35706, ...
The first six fields - from "ACSSF" to "0000001" - are identifiers.
A. The first field tells you that this is an ACS Summary File;
B. The second tells you that these data are three-year estimates for the year 2007 (notice the "e" before "2007" and the "3" at the end);
C. The third tells you the state ("ak" is Alaska);
D. The fourth is an iteration number;
E. The fifth is the all-important sequence number;
F. The last is a logical record number (LOGRECNO). The LOGRECNO identifies the location within a state.
The geography file, g20073ak.txt, describes the LOGRECNO. Each LOGRECNO specifies a location pertaining to the state. For example, a LOGRECNO of "0000001" means the state of Alaska; a LOGRECNO of "0000002" means just the urban areas in Alaska; a LOGRECNO of "0000003" refers to just rural areas in Alaska. Notice that each state has its own geography file.
The other fields in the data file, from the seventh on, are data values. Each field corresponds to the value of the "line number" variable in the data dictionary. So field number seven (the 335039 value, after the sixth comma) corresponds to line number one, which is "Total". Field number eight (the 264535 value, after the seventh comma) refers to line number two, which is "Car, Truck, or Van." Field number nine (the 219908 value) corresponds to line number three, which is "Drove alone." This continues all the way up to line number 51, at which point Table B08406 ends.
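The field layout described above can be sketched in Python: the first six fields are identifiers and the rest are data values indexed by line number. The sample record is the visible part of the line quoted in this chapter (the source truncates it). The identifier key names are illustrative, and treating "." or an empty field as a missing value is an assumption.

```python
import csv

# Illustrative names for the six identifier fields, in file order
ID_FIELDS = ("fileid", "filetype", "stusab", "chariter", "sequence", "logrecno")


def parse_value(text):
    """Convert one data cell to a number; '.' or empty is treated as missing."""
    text = text.strip()
    if text in ("", "."):
        return None
    return float(text) if "." in text else int(text)


def split_record(line):
    """Return (identifiers dict, list of data values) for one data record."""
    fields = next(csv.reader([line]))
    identifiers = dict(zip(ID_FIELDS, fields[:6]))
    values = [parse_value(cell) for cell in fields[6:]]
    return identifiers, values


def cell(values, line_number):
    """Data value for a 1-based line number (field seven is line number one)."""
    return values[line_number - 1]


# Visible part of the sample record quoted in this chapter
sample = "ACSSF,2007e3,ak,000,0003,0000001,335039,264535,219908,44627,35706"
ids, values = split_record(sample)
print(ids["sequence"], ids["logrecno"])
print("Total:", cell(values, 1))
print("Car, truck, or van:", cell(values, 2))
print("Drove alone:", cell(values, 3))
```

This sketch follows the example in assuming the table starts at field seven. A sequence file can hold a series of tables, so other tables in the same file would start at a later field.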
Were you to read this file into a computer program using software such as SAS, the first ten fields of the first records in e20073ak0003000.txt could be laid out as follows:
Figure 2.4 Linking Geographic Header File to the Data Files
File Identification | File Type | State/U.S.- Abbreviation (USPS) | Character Iteration | Sequence Number | Logical Record Number | Total: | Car, truck, or van: | Drove alone | Carpooled: |
---|---|---|---|---|---|---|---|---|---|
ACSSF | 2007e3 | ak | 000 | 0003 | 0000001 | 335039 | 264535 | 219908 | 44627 |
ACSSF | 2007e3 | ak | 000 | 0003 | 0000010 | 149452 | 134141 | 113696 | |
ACSSF | 2007e3 | ak | 000 | 0003 | 0000011 | 47314 | 39757 | 33343 | |
ACSSF | 2007e3 | ak | 000 | 0003 | 0000012 | . | . | . | |
ACSSF | 2007e3 | ak | 000 | 0003 | 0000013 | 149452 | 134141 | 113696 | |
ACSSF | 2007e3 | ak | 000 | 0003 | 0000017 | 49452 | 134141 | 113696 | |
Each ACS-SF consists of 460 physical files: one geographic header file and 459 data files, of which 153 contain estimates, 153 contain standard errors, and 153 contain margins of error. The large size of the tables made it necessary to divide the files into smaller segments. Figure 2.5 displays an overview of the contents of an ACS Summary File. Additional information on file/table segmentation can be found in Chapter 5: List of Tables.
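The file count can be checked with a short Python sketch that lists the names one would expect for a single geography. It assumes that the 153 segments are numbered consecutively from 0001 to 0153 and that unzipped files carry a .txt extension. Neither detail is stated here, so treat the output as illustrative.

```python
def expected_files(year, period, geo, n_sequences=153):
    """List the file names expected for one geography (assumed numbering)."""
    names = [f"g{year}{period}{geo}.txt"]  # one geographic header file
    for kind in "ems":  # estimates, margins of error, standard errors
        for sequence in range(1, n_sequences + 1):
            names.append(f"{kind}{year}{period}{geo}{sequence:04d}000.txt")
    return names


files = expected_files(2007, 3, "ak")
print(len(files))            # 1 + 3 * 153
print(files[0], files[1], files[-1])
```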
A unique logical record number is assigned to each geographic entity and appears in every file that holds records for that entity. This is done so that all records for a specific entity can be linked together across files. Besides the logical record number, other identifying fields are also carried over from the geographic header file to the table files. These are file identification, state/U.S. abbreviation, and characteristic. For SAS users, a SAS program for importing the data files is provided in the main directory of the ACS Summary File FTP site at:
http://www2.census.gov/acs2007_3yr/summaryfile/
Figure 2.5 Contents in an ACS Summary File Overview
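Linking records across files by logical record number can be sketched in Python as a dictionary join. Because each state has its own geography file, the sketch keys on the pair of state abbreviation and LOGRECNO. The two estimates are the totals shown in this chapter's example; the area name for the second record and the margin-of-error value are made-up placeholders, not published figures.

```python
def link_by_logrecno(geo_names, estimates, margins):
    """Join geography, estimate, and margin records on (state, LOGRECNO)."""
    rows = []
    for key in sorted(estimates):
        state, logrecno = key
        rows.append({
            "state": state,
            "logrecno": logrecno,
            "name": geo_names.get(key, "(not in geography file)"),
            "estimate": estimates[key],
            "margin": margins.get(key),  # None when no matching record exists
        })
    return rows


# Demo inputs keyed by (state abbreviation, LOGRECNO)
geo_names = {
    ("ak", "0000001"): "Alaska",
    ("ak", "0000010"): "Hypothetical area name",  # placeholder label
}
estimates = {("ak", "0000001"): 335039, ("ak", "0000010"): 149452}
margins = {("ak", "0000001"): 999}  # made-up placeholder, not a published value

linked = link_by_logrecno(geo_names, estimates, margins)
for row in linked:
    print(row)
```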