Here’s one for the methodology nerds.
Jeremy Nalewaik is a staff economist at the Federal Reserve who specialises in macroeconomic statistics. His name first came to our attention earlier this year when we read this post at Freakonomics by Justin Wolfers, who used Nalewaik’s work to show that Gross Domestic Product and Gross Domestic Income had diverged markedly in recent years.
You might remember from your Macro 101 days that although GDP and GDI are conceptually the same, they use different source data and therefore spit out different numbers. Normally the separation between the two shouldn’t be too dramatic, but the last few years haven’t been normal — and we hadn’t realised the extent to which the two had drifted apart until we read the Wolfers post.
Why does this matter? Since the start of the recession, GDI has proved the more accurate depiction of US economic performance, according to Nalewaik’s work. As better data have become available and the Bureau of Economic Analysis (which calculates both) has accordingly revised its earlier estimates, it is GDP that has been adjusted in the direction of GDI rather than the other way round.
Have a look at this chart from Nalewaik’s recent paper:
Neither measure was perfect, but early GDI estimates were much closer than GDP to later revisions of both measures. Perhaps more tellingly, GDI started signaling an economic slowdown in the middle of 2007 even as GDP kept climbing. Early GDI estimates also turned out to better reflect the severity of the recession.
To give the most glaring example, the initial GDP estimate for the fourth quarter of 2008 showed that the economy contracted at an annualised rate of 3.8 per cent. It was released on January 30, 2009, about three weeks before Obama’s first stimulus bill passed. That number was repeatedly adjusted down in later revisions, and in July of this year the BEA revised it all the way down to a contraction of 8.9 per cent.
There were many reasons for the changes (fuller explanations here and here), and it’s also the case that, in the long run, initial GDP estimates in the US have a reasonably good track record compared with those of other countries.
But the point is that at a critical moment in the recession — a new president, politicians bickering over the scale of the response, financial markets nervous about the future of the banking system — there was a canyon-sized gap between economic reality and the measure that is generally accepted by markets, economists, and policymakers as the most meaningful reflection of it.
Had everyone simultaneously been monitoring GDI with the same focus, would anything be different now? We’ll leave that as an intellectual exercise for our readers and won’t get into counterfactuals, which are often futile anyway. But thinking about all this was enough to pique our curiosity and make us want to find out more.
So we got in touch with Nalewaik and asked him to explain GDI in more detail, and without the economist-y jargon you’ll find in his papers. He agreed to respond to our questions via email, and what follows is a transcript (lightly edited) of the lengthy back-and-forth we’ve had with him over the past couple of weeks.
FT Alphaville: For our readers who aren’t familiar with your research or with the methodology that goes into calculating US economic output, explain the basic differences between GDP and GDI.
Jeremy Nalewaik: GDP calculates the value of economic output essentially by adding up all the spending in the economy, while GDI calculates the value of output by adding up all the income generated by the economy’s activity. The data on spending and income come from a variety of different sources, and if the data were perfect, the two approaches would produce the same value for economic output.
Unfortunately, measurement errors affect much of the data, so the two approaches often give very different answers. My work suggests that the measurement errors in GDP have tended to be worse than the measurement errors in GDI, especially over the past couple of decades. So GDI, despite its relative obscurity, has been the more accurate measure of output growth more often than not.
When you say “measurement errors”, you mean that the raw data are flawed, right? Can you give us some examples?
Yes – the data are flawed. For example, the data released in real time are typically based on samples of businesses. The data from the samples may fluctuate in ways that have nothing to do with what is going on in the economy as a whole. And the data released in real time can sometimes have what statisticians call “non-sampling errors” — for example, they can be slow to pick up the effect of firms starting up and closing down.
You’ve also argued that GDI offers a better reflection of business cycle fluctuations, and that initial estimates of GDI turn out to be more accurate than those of GDP. Take us through the evidence and analysis that led you to these conclusions.
Speaking broadly, two types of evidence suggest that the initial estimates of GDI are typically better than the initial estimates of GDP.
First, a variety of business cycle indicators that should be highly correlated with output growth (including the Institute for Supply Management surveys, the change in unemployment, some financial market variables, and even GDP growth forecasts themselves) have actually been more highly correlated with GDI growth than with GDP growth in recent decades.
That last item on the list is worth emphasizing: economists forecasting GDP growth have produced median forecasts that have tended to be more highly correlated with GDI growth than with GDP growth, the variable they are trying to forecast. It is also notable that GDI growth tends to predict next quarter’s GDP growth better than GDP growth itself does. All this suggests initial GDI growth is picking up some real fluctuations in the economy that are being missed by the initial GDP growth estimates.
Second, initial GDI growth estimates have tended to predict revisions (typically years later) to initial GDP growth estimates, especially since the mid-1990s. So, if initial GDI growth is above initial GDP growth, GDP growth tends to revise up, and if initial GDI growth is below initial GDP growth, GDP growth tends to revise down.
For an example of the latter, just look to the recent recession. In March 2009, the Bureau of Economic Analysis announced that real GDP declined 0.8 percent from the fourth quarter of 2007 to the fourth quarter of 2008, while its GDI calculations showed a much more substantial decline of 2.1 percent. The latest Bureau of Economic Analysis estimates show declines over that time period of about 3 percent, using either measure.
So, while neither measure initially captured the full severity of the downturn in 2008, the picture painted by the initial GDI estimates was quite a bit closer to the revised figures (which incorporate more complete data and are generally assumed to be more accurate).
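To make that arithmetic concrete, here is a minimal sketch using only the figures quoted in the answer above. The function names are ours, invented for illustration, and the revised figure is the rounded "about 3 percent", so the gaps are approximate. The direction rule is simply the pattern Nalewaik describes, not a formula taken from his papers.

```python
# Change in real output, Q4 2007 to Q4 2008, in percent (figures quoted above).
initial_gdp = -0.8  # BEA estimate announced in March 2009
initial_gdi = -2.1  # BEA estimate announced in March 2009
revised = -3.0      # "about 3 percent" on either measure; rounded, so approximate


def predicted_gdp_revision(gdp: float, gdi: float) -> str:
    """Direction in which GDP growth would tend to be revised, per the pattern
    described above: toward the initial GDI estimate. (Hypothetical helper.)"""
    if gdi > gdp:
        return "up"
    if gdi < gdp:
        return "down"
    return "none"


def revision_gap(initial: float, later: float) -> float:
    """Absolute distance, in percentage points, between an initial and a later estimate."""
    return abs(later - initial)


gdp_gap = revision_gap(initial_gdp, revised)
gdi_gap = revision_gap(initial_gdi, revised)

print("GDP expected to revise:", predicted_gdp_revision(initial_gdp, initial_gdi))
print(f"initial GDP missed the revised figure by {gdp_gap:.1f} points")
print(f"initial GDI missed the revised figure by {gdi_gap:.1f} points")
```

On these numbers the initial GDP estimate was roughly 2.2 points off the revised figure and the initial GDI estimate roughly 0.9 points off, with GDP revising in the direction GDI had pointed.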
What is GDI telling us about the economy now vs what GDP is telling us?
GDI paints a less-bleak picture of the economy recently. In the first quarter this year, the latest real GDP growth estimates show annualized growth of 0.4 percent, while the latest real GDI growth estimates show 2.4 percent growth. Other economic indicators like the Institute for Supply Management surveys and the change in the unemployment rate were looking quite healthy over the first few months of this year, suggesting GDI was more accurate in the first quarter. In the second quarter this year, GDP growth is currently estimated at 1.0 percent while GDI growth is 1.5 percent, so GDI again looks better, but the difference is less sharp.
Surely neither approach is flawless, though. What are some of the disadvantages of GDI?
My research suggests that GDI, on average, provides a more accurate measure of output growth than GDP, but nothing I have done suggests that it is better all the time. At times, GDP has provided a more accurate signal than GDI. One major disadvantage of GDI is that the first GDI estimate is released a month or two after the first GDP estimate.
You write that there are times that GDP has provided a more accurate signal. What are those times, roughly? And more importantly, are there identifiable circumstances when we can predict that one measure will be more accurate than the other?
One example was 2006, when GDI initially showed stronger growth than GDP, but GDI ended up revising down toward GDP. However, this was atypical: since the mid-1990s, GDP has tended to revise toward GDI, not the other way around. It could go either way in any given year, but GDI is more accurate more often than not. When the two estimates diverge, it is often helpful to look at other indicators, such as those mentioned previously.
You included the use of GDI in your latest paper on stall speeds and recessions. Describe how the conclusions in the study differed when using GDP vs GDI.
The stall speeds paper looks at both GDP and GDI. The signals are cleaner, and the models tend to work better, with GDI than with GDP. This was just one of a long list of things I have tried over the years that work better with GDI than GDP.
What do you mean when you say that the signals are cleaner and the models work better?
For example, using the same data as in the stall speed paper, from the second quarter of 1947 to the fourth quarter of 2007, 64 percent of GDI growth observations below 1 percent occurred in one of the four quarters prior to a recession, while only 48 percent of GDP growth observations below 1 percent occurred in one of the four quarters prior to a recession.
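For readers who want to see how a share like that might be computed, here is a rough sketch. It is not the paper's code: the function name, the toy growth series and the recession date are all made up for illustration. How quarters inside recessions are treated is also an assumption; this sketch simply counts every quarter it is given.

```python
from typing import Sequence


def share_before_recession(
    growth: Sequence[float],
    recession_starts: Sequence[int],
    threshold: float = 1.0,
    window: int = 4,
) -> float:
    """Share of below-threshold growth observations that fall within `window`
    quarters before a recession start. `recession_starts` holds the index of
    the first quarter of each recession. (Hypothetical helper.)"""
    pre_recession = set()
    for start in recession_starts:
        pre_recession.update(range(max(0, start - window), start))

    low = [i for i, g in enumerate(growth) if g < threshold]
    if not low:
        return float("nan")  # no weak quarters, so the share is undefined
    hits = sum(1 for i in low if i in pre_recession)
    return hits / len(low)


# Toy data: eight quarters of annualized growth, with a recession assumed
# to begin in the quarter right after the sample ends (index 8).
toy_growth = [3.2, 0.6, 2.8, 3.1, 0.9, 2.0, 0.4, 0.7]
toy_share = share_before_recession(toy_growth, recession_starts=[8])
print(f"share of weak quarters preceding a recession: {toy_share:.0%}")
```

In the toy series four quarters fall below 1 percent and three of them sit in the four quarters before the assumed recession, so the share is 75 percent. Running the same calculation on actual GDP and GDI series is what would produce figures comparable to the 48 and 64 percent cited above.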
Do you sense that there is growing support within the economics profession for GDI to be considered alongside GDP as the primary measure of macroeconomic growth, or even to one day supplant it?
Macroeconomic data users, such as economists and financial journalists, have become increasingly aware of the magnitude and importance of the measurement errors in GDP growth. The experience over the latest recession has really increased the interest level here. So, it is natural that an alternative measure would start to get some consideration, and I think many economists have been impressed by the evidence favorable to GDI.
For example, the Business Cycle Dating Committee of the National Bureau of Economic Research now uses GDI as well as GDP, alongside various monthly indicators, in determining the official dates of business cycle peaks and troughs. They use GDP and GDI judgmentally, and also as a 50-50 weighted average, similar to the proposal I put forth in the Brookings paper for using such a 50-50 average as the official measure of U.S. economic output. I think this is a reasonable way to proceed, and there is ample precedent. For example, Australia has featured an average of different output measures since 1991.
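As a quick illustration of what a 50-50 average looks like, here is a small sketch applied to the first- and second-quarter growth figures quoted earlier in the interview. The function name and the choice to average growth rates directly (rather than levels of output) are our assumptions, not details taken from the Brookings paper.

```python
def blended_growth(gdp_growth: float, gdi_growth: float, gdi_weight: float = 0.5) -> float:
    """Weighted average of GDP and GDI growth rates, in percent.
    A weight of 0.5 gives the 50-50 average. (Hypothetical helper.)"""
    if not 0.0 <= gdi_weight <= 1.0:
        raise ValueError("gdi_weight must be between 0 and 1")
    return (1.0 - gdi_weight) * gdp_growth + gdi_weight * gdi_growth


# Annualized real growth, in percent, as quoted earlier in the interview.
quarters = {
    "Q1": {"gdp": 0.4, "gdi": 2.4},
    "Q2": {"gdp": 1.0, "gdi": 1.5},
}

for name, q in quarters.items():
    print(f"{name}: 50-50 average growth {blended_growth(q['gdp'], q['gdi']):.2f} percent")
```

On those figures the blended measure comes out at roughly 1.4 percent for the first quarter and 1.25 percent for the second, sitting between the two headline numbers in each case.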
FT Alphaville was recently having a broader discussion about the status of macroeconomic data-gathering in the US with a fellow blogger, Felix Salmon, and he made the point to us that it’s been in secular decline for the last few decades. Do you agree? Is this something that’s come up in your work on output measures?
Some evidence suggests that the measurement errors in GDP growth have become worse in recent years. This may have to do with the increasing importance of services in the economy in recent decades, a sector where the GDP source data have historically been spotty. This is because, historically, the U.S. Census Bureau has not collected spending data for many types of services on a regular basis.
Despite budget constraints, the statistical agencies have mounted a major effort to improve their measurement of services GDP. However, even as they make progress, it is important to keep in mind that there will always be measurement errors of some kind or another in the GDP and GDI source data, so taking some sort of weighted average, as I proposed in the Brookings paper, would be the soundest approach.