2018 ACS 1-year and 2014-2018 ACS 5-year Data Releases: Technical Documentation
Worked Examples for Approximating Margins of Error
This document describes how to approximate standard errors and margins of error when aggregating American Community Survey (ACS) estimates by combining geographies, characteristics, or both. Previously this information was provided in the ACS and PRCS Accuracy of the Data documentation.
The ACS publishes margins of error at the 90 percent confidence level along with the estimates. The margin of error is derived from the variance. In most cases, the variance is calculated using a replicate-based methodology known as successive difference replication (SDR) that takes into account the sample design and estimation procedures.
The SDR formula is:
$$\mathrm{Variance}(X_0) = \frac{4}{80} \sum_{r=1}^{80} (X_r - X_0)^2$$
Here, X0 is the estimate calculated using the production weight and Xr is the estimate calculated using the rth of the 80 replicate weights. The standard error is the square root of the variance. The 90 percent margin of error is 1.645 times the standard error.
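The SDR calculation described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the 4/R factor with R = 80 replicates is taken from general ACS practice rather than from this document.

```python
import math

def sdr_variance(x0, replicate_estimates):
    """SDR variance: (4 / R) * sum((x_r - x0)^2).

    Assumption: R is the number of replicate estimates supplied
    (80 for the ACS), so the factor is 4/80 in the ACS case.
    """
    r = len(replicate_estimates)
    return 4.0 / r * sum((xr - x0) ** 2 for xr in replicate_estimates)

def moe_90(x0, replicate_estimates):
    """90 percent margin of error: 1.645 times the standard error."""
    standard_error = math.sqrt(sdr_variance(x0, replicate_estimates))
    return 1.645 * standard_error

# Toy input (not ACS data): every replicate estimate is one unit above x0,
# so the variance is (4/80) * 80 * 1 = 4 and the standard error is 2.
toy_replicates = [101.0] * 80
print(sdr_variance(100.0, toy_replicates))  # 4.0
```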
Exceptions to this procedure include:
1. The estimate of the number or proportion of people, households, families, or housing units in a geographic area with a specific characteristic is zero. A special procedure is used to estimate the standard error.
2. There are either no sample observations available to compute an estimate or standard error of a median, an aggregate, a proportion, or some other ratio, or there are too few sample observations to compute a stable estimate of the standard error. The estimate is represented in the tables by "-" and the margin of error by "**" (two asterisks).
3. The estimate of a median falls in the lower open-ended interval or upper open-ended interval of a distribution. If the median occurs in the lowest interval, then a "-" follows the estimate, and if the median occurs in the upper interval, then a "+" follows the estimate. In both cases, the margin of error is represented in the tables by "***" (three asterisks).
The following sections provide the equations that are used to approximate the margin of error when aggregating ACS estimates across geographies or characteristics. Note that these methods are approximations and do not include the covariance.
The ACS uses a 90 percent confidence level for the margin of error (MOE). Some older estimates may show the 90 percent confidence bounds instead.
The margin of error is the larger of the two differences between the estimate and its upper and lower confidence bounds. Most tables on American FactFinder (AFF) containing 2005 or later ACS data display the MOE. Use the MOE to calculate the standard error (SE), dropping the "+/-" from the displayed value first, as:
$$SE = \frac{MOE}{Z}$$
Here, Z = 1.645 for ACS data published in 2006 to the present. Users of 2005 and earlier ACS data should use Z = 1.65.
If confidence bounds are provided instead (as with most ACS data products for 2004 and earlier), calculate the margin of error first before calculating the standard error:
$$MOE = \max(\text{upper bound} - \text{estimate},\ \text{estimate} - \text{lower bound})$$
Note that the formulas provided here use the standard error. However, mathematically, the MOE may be substituted in for the SE.
Example 1 - Calculating the Standard Error from the Margin of Error
The estimated number of males, never married is 47,194,876 as found on summary table B12001 (Sex by Marital Status for the Population 15 Years and Over) for the United States for 2017. The margin of error is 89,037. Recall that:
$$SE = \frac{MOE}{Z}$$
Calculating the standard error using the margin of error, we have:
$$SE(47{,}194{,}876) = \frac{89{,}037}{1.645} = 54{,}126$$
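The Example 1 arithmetic can be checked with a short Python sketch. The helper names are illustrative and not part of any Census Bureau tool; `moe_from_bounds` covers the case where only confidence bounds are published.

```python
Z_90 = 1.645  # use 1.65 for ACS data from 2005 and earlier

def se_from_moe(moe, z=Z_90):
    """Convert a published margin of error to a standard error."""
    return moe / z

def moe_from_bounds(estimate, lower, upper):
    """Margin of error as the larger distance from the estimate to a bound."""
    return max(upper - estimate, estimate - lower)

# Example 1: males, never married (table B12001, United States).
se_males = se_from_moe(89_037)
print(round(se_males))  # 54126
```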
Estimates of standard errors displayed in tables are for individual estimates. Additional calculations are required to estimate the standard errors for sums of or the differences between two or more sample estimates.
The standard error of the sum of two sample estimates is the square root of the sum of the two individual standard errors squared plus a covariance term. That is, for standard errors SE(X1) and SE(X2) of estimates X1 and X2:
$$SE(X_1 + X_2) = \sqrt{SE(X_1)^2 + SE(X_2)^2 + 2\,\mathrm{cov}(X_1, X_2)} \qquad (1)$$
The covariance measures the interactions between two estimates. Currently the covariance terms are not available. The other approximation formulas provided in this document also do not take into account any covariance.
For calculating the SE for sums or differences of estimates, data users should therefore use the following approximation:
$$SE(X_1 \pm X_2) = \sqrt{SE(X_1)^2 + SE(X_2)^2} \qquad (2)$$
However, it should be noted that this approximation will underestimate or overestimate the standard error if the two estimates interact in either a positive or a negative way.
The approximation formula (2) can be expanded to more than two estimates by adding the individual squared standard errors inside the radical. As the number of estimates involved in the sum or difference increases, the results of the approximation become increasingly different from the standard error derived directly from the ACS microdata. Care should be taken to work with as few estimates as possible. If any of the estimates involved in the sum are controlled, the approximate standard error can differ even more from the directly derived standard error.
Later in this document, examples are provided to demonstrate issues associated with approximating the standard errors when summing large numbers of estimates together.
Example 2 - Calculating the Standard Error of a Sum or a Difference
We are interested in the total number of people who have never been married. From Example 1, we know the number of males, never married is 47,194,876. From summary table B12001 we have the number of females, never married is 41,142,530 with a margin of error of 84,363. Therefore, the estimated number of people who have never been married is 47,194,876 + 41,142,530 = 88,337,406.
To calculate the approximate standard error of this sum, we need the standard errors of the two estimates in the sum. We calculated the standard error for the number of males never married in Example 1 as 54,126. The standard error for the number of females never married is calculated using the margin of error:
$$SE(41{,}142{,}530) = \frac{84{,}363}{1.645} = 51{,}284$$
Using formula (2) for the approximate standard error of a sum or difference we have:
$$SE(88{,}337{,}406) = \sqrt{54{,}126^2 + 51{,}284^2} = 74{,}563$$
Caution: This method will underestimate or overestimate the standard error if the two estimates interact in either a positive or a negative way.
To calculate the lower and upper bounds of the 90 percent confidence interval around 88,337,406 using the standard error, simply multiply 74,563 by 1.645, then add and subtract the product from 88,337,406. Thus the 90 percent confidence interval for this estimate is [88,337,406 - 1.645(74,563)] to [88,337,406 + 1.645(74,563)] or 88,214,750 to 88,460,062.
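As a cross-check of Example 2, the following Python sketch applies the no-covariance approximation for a sum and then builds the 90 percent confidence interval. The function names are illustrative only, and the sketch inherits the caution above: it ignores any covariance between the estimates.

```python
import math

def se_of_sum(*standard_errors):
    """Approximate SE of a sum or difference, ignoring covariance."""
    return math.sqrt(sum(se ** 2 for se in standard_errors))

def confidence_interval(estimate, se, z=1.645):
    """Lower and upper confidence bounds around an estimate."""
    moe = z * se
    return estimate - moe, estimate + moe

# Example 2: never-married males plus never-married females.
se_total = round(se_of_sum(54_126, 51_284))
low, high = confidence_interval(88_337_406, se_total)
print(se_total, round(low), round(high))  # 74563 88214750 88460062
```

Because `se_of_sum` accepts any number of standard errors, the same helper covers the extension of the approximation to more than two estimates.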
For a proportion (or percent), that is, a ratio where the numerator is a subset of the denominator, a slightly different estimator is used. If P = X / Y, then the standard error of this proportion is approximated as:
$$SE(P) = \frac{1}{Y} \sqrt{SE(X)^2 - \frac{X^2}{Y^2}\, SE(Y)^2} \qquad (4)$$
If Q = 100% * P (P is the proportion and Q is its corresponding percent), then SE(Q) = 100% * SE(P).
Note the difference between the formulas to approximate the standard error for proportions (4) and ratios (3): the plus sign under the radical in the ratio formula is replaced with a minus sign in the proportion formula. If the value under the radical is negative, use the ratio standard error formula (3) instead.
Example 3 - Calculating the Standard Error of a Proportion/Percent
We are interested in the number of females who have never been married as a percentage of the number of people who have never been married. The number of females, never married is 41,142,530 and the number of people who have never been married is 88,337,406. To calculate the approximate standard error of this percent, we need the standard errors of the two estimates in the percent.
From Example 2, we know that the approximate standard error for the number of females never married is 51,284 and the approximate standard error for the number of people never married is 74,563.
The estimate is:
$$\frac{41{,}142{,}530}{88{,}337{,}406} \times 100\% = 46.57\%$$
Therefore, using formula (4) for the approximate standard error of a proportion or percent, we have:
$$SE(46.57\%) = 100\% \times \frac{1}{88{,}337{,}406} \sqrt{51{,}284^2 - 0.4657^2 \times 74{,}563^2} = 0.04\%$$
To calculate the lower and upper bounds of the 90 percent confidence interval around 46.57 using the standard error, simply multiply 0.04 by 1.645, then add and subtract the product from 46.57. Thus the 90 percent confidence interval for this estimate is:
[46.57 - 1.645(0.04)] to [46.57 + 1.645(0.04)], or 46.50% to 46.64%.
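The proportion approximation, including the fallback to the ratio form when the value under the radical is negative, can be sketched in Python as follows. The function name is hypothetical; the inputs are the Example 3 figures.

```python
import math

def se_of_proportion(x, y, se_x, se_y):
    """Approximate SE of P = X / Y where X is a subset of Y.

    Falls back to the ratio form (plus sign under the radical)
    when the proportion form would require the square root of
    a negative number.
    """
    p_squared = (x / y) ** 2
    radicand = se_x ** 2 - p_squared * se_y ** 2
    if radicand < 0:
        radicand = se_x ** 2 + p_squared * se_y ** 2
    return math.sqrt(radicand) / y

# Example 3: never-married females as a share of all never-married people.
x, y = 41_142_530, 88_337_406
percent = 100 * x / y
se_percent = 100 * se_of_proportion(x, y, 51_284, 74_563)
print(round(percent, 2), round(se_percent, 2))  # 46.57 0.04
```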
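Example 4 - Calculating the Standard Error of a Ratio
We are interested in the ratio of the number of never-married males to the number of never-married females. From Examples 1 and 2, we know that the estimate for the number of never-married males is 47,194,876 with a standard error of 54,126, and the estimate for the number of never-married females is 41,142,530 with a standard error of 51,284.
The estimate of the ratio is:
$$\frac{47{,}194{,}876}{41{,}142{,}530} = 1.147$$
Formula (3) approximates the standard error of a ratio X / Y, where the numerator is not a subset of the denominator, as:
$$SE\left(\frac{X}{Y}\right) = \frac{1}{Y} \sqrt{SE(X)^2 + \frac{X^2}{Y^2}\, SE(Y)^2} \qquad (3)$$
Using formula (3) for the approximate standard error of this ratio, we have:
$$SE(1.147) = \frac{1}{41{,}142{,}530} \sqrt{54{,}126^2 + 1.147^2 \times 51{,}284^2} = 0.002$$
The 90 percent margin of error for this estimate would be 0.002 multiplied by 1.645, or about 0.003. The lower and upper 90 percent confidence bounds would then be [1.147 - 1.645(0.002)] to [1.147 + 1.645(0.002)], or 1.144 and 1.150.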
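A minimal Python sketch of the ratio calculation follows, using the never-married male and female figures from Examples 1 and 2. The function name is illustrative. Unlike the hand calculation, which rounds the standard error to 0.002 before computing the bounds, the sketch carries the unrounded value through; the bounds agree to three decimal places.

```python
import math

def se_of_ratio(x, y, se_x, se_y):
    """Approximate SE of X / Y when X is not a subset of Y."""
    return math.sqrt(se_x ** 2 + (x / y) ** 2 * se_y ** 2) / y

x, se_x = 47_194_876, 54_126  # never-married males
y, se_y = 41_142_530, 51_284  # never-married females

ratio = x / y
se_ratio = se_of_ratio(x, y, se_x, se_y)
moe = 1.645 * se_ratio
print(f"{ratio:.3f} {se_ratio:.3f}")              # 1.147 0.002
print(f"{ratio - moe:.3f} to {ratio + moe:.3f}")  # 1.144 to 1.150
```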
The statistic of interest may be a percentage change from one time period to another, where a more current estimate is compared to an older estimate, for example, the percent change from a 2015 estimate to a 2017 estimate. If the current estimate = X and the earlier estimate = Y, then the standard error for the percent change is approximated as:
$$SE\left(\frac{X - Y}{Y}\right) = SE\left(\frac{X}{Y}\right) = \frac{1}{Y} \sqrt{SE(X)^2 + \frac{X^2}{Y^2}\, SE(Y)^2} \qquad (5)$$
As a caveat, this formula does not take into account the correlation between estimates from overlapping time periods.
Example 5 - Calculating the Standard Error of a Product
We are interested in the number of single unit detached owner-occupied housing units. The number of owner-occupied housing units is 75,022,569 with a margin of error of 227,992, as found in subject table S2504 (Physical Housing Characteristics for Occupied Housing Units) for 2017, and the percent of single unit detached owner-occupied housing units (called "1, detached" in the subject table) is 82.5% (0.825) with a margin of error of 0.1 percentage points (0.001).
Therefore, the number of 1-unit detached owner-occupied housing units is:
$$75{,}022{,}569 \times 0.825 = 61{,}893{,}619$$
Calculating the standard error for the estimates using the margin of error we have:
$$SE(75{,}022{,}569) = \frac{227{,}992}{1.645} = 138{,}597$$
$$SE(0.825) = \frac{0.001}{1.645} = 0.0006079$$
Formula (6) approximates the standard error of a product of two estimates A and B as:
$$SE(A \times B) = \sqrt{A^2\, SE(B)^2 + B^2\, SE(A)^2} \qquad (6)$$
Using formula (6), the approximate standard error for the number of 1-unit detached owner-occupied housing units is:
$$SE(61{,}893{,}619) = \sqrt{75{,}022{,}569^2 \times 0.0006079^2 + 0.825^2 \times 138{,}597^2} = 123{,}102$$
To calculate the lower and upper bounds of the 90 percent confidence interval around 61,893,619 using the standard error, simply multiply 123,102 by 1.645, then add and subtract the product from 61,893,619. Thus the 90 percent confidence interval for this estimate is [61,893,619 - 1.645(123,102)] to [61,893,619 + 1.645(123,102)] or 61,691,116 to 62,096,122.
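The product calculation for the housing unit example can be reproduced with the Python sketch below. It assumes the usual no-covariance approximation for a product, SE(A x B) = sqrt(A^2 SE(B)^2 + B^2 SE(A)^2); the function name is illustrative.

```python
import math

def se_of_product(a, b, se_a, se_b):
    """Approximate SE of the product A * B, ignoring covariance."""
    return math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)

# Owner-occupied housing units and the share that are "1, detached".
a, moe_a = 75_022_569, 227_992
b, moe_b = 0.825, 0.001

se_a = moe_a / 1.645
se_b = moe_b / 1.645

estimate = a * b
se_estimate = se_of_product(a, b, se_a, se_b)
print(round(estimate), round(se_estimate))  # 61893619 123102
```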
Users may conduct a statistical test to see if the difference between an ACS estimate and any other chosen estimate is statistically significant at a given confidence level. "Statistically significant" means that it is not likely that the difference between estimates is due to random chance.
To perform statistical significance testing, first calculate a Z statistic from the two estimates (Est1 and Est2) and their respective standard errors (SE1 and SE2):
$$Z = \frac{Est_1 - Est_2}{\sqrt{(SE_1)^2 + (SE_2)^2}}$$
If Z > 1.645 or Z < -1.645, then the difference can be said to be statistically significant at the 90 percent confidence level. [1]
Any pair of estimates can be compared using this method, including ACS estimates from the current year, ACS estimates from a previous year, 2010 Census counts, estimates from other Census Bureau surveys, and estimates from other sources. Not all estimates have sampling error (2010 Census counts do not, for example), but when possible, standard errors should be used to produce the most accurate test result.
[1] The ACS Accuracy of the Data document in 2005 used a Z statistic of +/-1.65. Data users should use +/-1.65 for estimates published in 2005 or earlier.
Issues with Using Overlapping Confidence Intervals for Statistical Testing
Users are also cautioned to not rely on looking at whether confidence intervals for two estimates overlap in order to determine statistical significance. There are circumstances where comparing confidence intervals will not give the correct test result. If two confidence intervals do not overlap, then the estimates will be significantly different (i.e. the significance test will always agree). However, if two confidence intervals do overlap, then the estimates may or may not be significantly different. The Z calculation shown above is recommended in all cases.
The following example illustrates why using the overlapping confidence bounds rule of thumb as a substitute for a statistical test is not recommended.
Let: X1 = 6.0 with SE1 = 0.5 and X2 = 5.0 with SE2 = 0.2.
The Lower Bound for X1 = 6.0 - 0.5 * 1.645 = 5.2 while the Upper Bound for X2 = 5.0 + 0.2 * 1.645 = 5.3. The confidence bounds overlap, so the rule of thumb would indicate that the estimates are not significantly different at the 90% level.
However, if we apply the statistical significance test we obtain:
$$Z = \frac{6.0 - 5.0}{\sqrt{0.5^2 + 0.2^2}} = 1.857$$
Z = 1.857 > 1.645, which means that the difference is significant (at the 90% level).
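The contrast between the overlap rule of thumb and the Z test can be made concrete with a small Python sketch. The function names are illustrative; the inputs are the X1 and X2 values from the example above.

```python
import math

def z_statistic(est1, est2, se1, se2):
    """Z statistic for the difference between two estimates."""
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)

def is_significant(z, critical=1.645):
    """Two-sided test at the 90 percent confidence level."""
    return abs(z) > critical

def intervals_overlap(est1, se1, est2, se2, z=1.645):
    """True if the two 90 percent confidence intervals overlap."""
    low1, high1 = est1 - z * se1, est1 + z * se1
    low2, high2 = est2 - z * se2, est2 + z * se2
    return low1 <= high2 and low2 <= high1

z = z_statistic(6.0, 5.0, 0.5, 0.2)
print(round(z, 3))                         # 1.857
print(intervals_overlap(6.0, 0.5, 5.0, 0.2))  # True
print(is_significant(z))                   # True
```

The intervals overlap even though the test finds a significant difference, which is why the overlap check is not a substitute for the Z calculation.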
All statistical testing in ACS data products is based on the 90 percent confidence level. Users should understand that all testing was done using unrounded estimates and standard errors, and it may not be possible to replicate test results using the rounded estimates and margins of error as published.
Example A: Suppose we are interested in the total number of males with income below the poverty level in the past 12 months for the state of Wyoming. We want to find the estimate using both state and PUMA level estimates. Part of the collapsed table C17001 is displayed in Table A below.
First, sum the three state-level male age group estimates for Wyoming:
The approximation for the standard error for the summed state-level age groups is:
Next, sum the four PUMA estimates for males:
The approximation for the standard error of the summed PUMA level estimates is:
Finally, we will sum up all three age groups for all four PUMAs to obtain a third estimate of males:
The approximated standard error for the summed age-group PUMA level estimates:
We also know that the standard error using the published MOE is 3,309 /1.645 = 2,011.6.
In this instance, all of the approximations underestimate the published standard error and should be used with caution.
Example B: Suppose we wish to estimate the total number of males at the national level using age and citizenship status. The relevant data from table B05003 is displayed in Table B below.
The estimate and its MOE that we are interested in are actually published. However, if they were not available in the tables, we could calculate them. To find the estimate for the number of males, we would sum the number of males under 18 and over 18:
The approximated standard error is:
Another method would be to add up the estimates for the three subcategories (Native, and the two subcategories for Foreign Born: Naturalized U.S. Citizen, and Not a U.S. Citizen), for males under and over 18 years of age.
From these six estimates we find:
With an approximated standard error of:
We know that the standard error using the published margin of error is 27,279 / 1.645 = 16,583.0.
At a quick glance, we can see that the ratio of the standard error from the first method to the published-based standard error is 1.24, an overestimate of roughly 24%, whereas the second method yields a ratio of 4.07, an overestimate of 307%. This is an example of what could happen to the approximate SE when the sum involves a controlled estimate. In this case, the controlled estimate is sex by age.
Example C: Suppose we are interested in the total number of people aged 65 or older. Table C shows some of the estimates at the national level from table B01001 (the estimates in gray were derived for the purpose of this example only).
To begin we find the total number of people aged 65 and over by adding the totals for males and females:
An alternate method would be to sum males and females for each age category. We could then use the MOEs for the age category estimates to approximate the standard error for the total number of people over 65.
With this method, we calculate the number of people aged 65 or older to be 39,506,648. We approximate the standard error as:
For this example, the estimate and its MOE are published in table B09017. As such, we know that the total number of people aged 65 or older is 39,506,648 with a margin of error of 20,689.
Therefore the published-based standard error is:
$$SE(39{,}506{,}648) = \frac{20{,}689}{1.645} = 12{,}577$$
The approximated standard error, calculated using six derived age group estimates, is roughly 3.6 times larger than the published-based standard error.
There are two additional ways to approximate the standard error for people aged 65 and over. The first is to find the published MOEs for males aged 65 and older and for females aged 65 and older separately and combine them to find the approximate standard error of the total. The second is to use all twelve of the published estimates together (all estimates from the male age categories and female age categories) to create the SE for people aged 65 and older. In this particular example, all three ways give the same approximation for the SE. This result differs from that found in Example A.
Example D: This example gives an alternative to the methodology of Example C. Here, we derive the estimate and its corresponding SE by summing the estimates for the ages less than 65 years old and subtracting them from the estimate for the total population. Due to the large number of estimates, Table D does not show all of the age groups. Again, the estimates in the shaded part of the table were derived for the purposes of this example and cannot be found in table B01001.
To find an estimate for the number of people age 65 and older, subtract the population between the ages of zero and 64 years old from the total population:
Number of people aged 65 and older:
The SE approximation uses the same methodology as in Example C. First, sum male and female estimates across each age category, then approximate the MOEs:
The SE for the total number of people aged 65 and older is:
Again, as in Example C, the estimate and its MOE are published in table B09017. The total number of people aged 65 or older is 39,506,648 with a margin of error of 20,689. Therefore the standard error is:
$$SE(39{,}506{,}648) = \frac{20{,}689}{1.645} = 12{,}577$$
The approximated standard error using the thirteen derived age group estimates yields a standard error roughly 8.2 times larger than the actual SE.
Data users can mitigate the problems shown in examples A through D by utilizing a collapsed version of a detailed table or the less detailed annual Supplemental Tables. Using these tables, if available, may reduce the number of estimates used in the approximation. These issues may also be avoided by creating estimates and SEs using the Public Use Microdata Sample (PUMS) or by requesting a custom tabulation, a fee-based service offered under certain conditions by the Census Bureau. For more information regarding custom tabulations, visit: https://www.census.gov/programs-surveys/acs/data/custom-tables.html.