Or, why national accounting issues threatened to send an econo-blogger into a nihilistic vortex of swirling epistemological doubts.
Last week, introducing our interview with the Federal Reserve’s Jeremy Nalewaik about the flaws in measuring economic output, we said that counterfactuals are often a futile exercise.
But not always. Here’s what we said:
…the initial GDP estimate for the fourth quarter of 2008 showed that the economy contracted by 3.8 per cent. It was released on January 30, 2009 — about three weeks before Obama’s first stimulus bill passed. That number was continually adjusted down in later revisions, and in July of this year the BEA revised it all the way down to a contraction of 8.9 per cent.
Felix Salmon described this as “an epic and very expensive fail”, as Congress would have pushed for a larger stimulus had it known the true severity of the economic contraction.
To which Megan McArdle responded:
Indeed, to the extent that you think confidence matters, the excessively low GDP estimates might have helped rather than hurt; 9% would have scared the hell out of everyone, but I doubt it would have moved substantially more money out of congress. A hundred billion, maybe–barely measurable impact, really. But the Democrats weren’t going to be the first party to spend a trillion dollars in one package–not least because the GOP wasn’t going to let them.
So while I think that we should pay a lot of attention to having the best statistics we can, I doubt it mattered in this case. The economy was in free fall, Congress was pretty well aware of that, and they spent $800 billion.
It’s a smart take, but we have two responses.
The first is to simply acknowledge that she could be right. (Ezra Klein makes the same point in his long WaPo article from this weekend.) However sharp the decline in GDP that quarter, it’s quite possible that any stimulus with a price tag north of $1 trillion would have remained politically unpalatable. Christina Romer told the President that the economy had a $2 trillion hole and recommended a $1.2 trillion stimulus. We got $800bn. Would a projected hole of $3 trillion have pushed out the boundaries of political feasibility? There’s no way to know, but it’s reasonable to think not.
(And by the way, we’re not here to rehash arguments about whether or not the stimulus was a good idea: right now we’re just interested in whether the outcomes might have been different, not better or worse.)
A foggy look back
But our second response is that perhaps this is too narrow a lens through which to look back on these events, especially if our focus is on the failure of macroeconomic statistics.
Here’s a chart that Nalewaik recently sent us — a more recent version of the one we used in our original post about the relative merits of GDP and GDI, and which he plans to include in an updated version of this Brookings paper:
Look first at GDP (the red lines). As late as December 2008, GDP was showing that the economy had continued to grow through the second quarter of that year. Now look at the final revisions.
The data didn’t just fail in that crucial fourth quarter of 2008. They failed to warn that the economy had grown more slowly throughout 2007, had stopped growing in the fourth quarter of 2007, and had begun a steep decline by the third quarter of 2008. And that steep decline, hardly captured at all in the estimates released in December 2008, was from a lower base than was believed at the time.
Now turn to GDI (the black lines). There are meaningful problems with the source data of GDI and other reasons to be sceptical, which we’ll explain in a minute. But a look at the responses section of Nalewaik’s Brookings paper reveals that mainstream economists at least think highly enough of it as a complementary measure to give it serious consideration.
And even looking at the earlier estimates, GDI detected the economic deceleration far earlier — and the later, revised estimate spots it as early as the first quarter of 2007.
To get some sense of the gap between reality and what we knew at the time, we had a quick look back at some of the FOMC statements from the period. The contrast between what the Fed thought was happening and what we later discovered was really happening is stark.
The statement released on 31 October 2007, though expressing concern about growth moderating because of the housing correction, opens with: “Economic growth was solid in the third quarter, and strains in financial markets have eased somewhat on balance.” The statement from early December 2007 expresses almost polite concern that growth could slow, and seems to think that the reduction in the Federal funds rate to 4.25 per cent might actually be enough to “help promote moderate growth over time”.
Less than two months later, at the end of January 2008, alarm bells suddenly started ringing and the Fed made a surprise 75bps cut in response to pressures in short-term funding markets. By April 2008 the Fed funds rate would be down to 2 per cent.
But that doesn’t mean the full scale of the problem was finally, or fully, appreciated. As late as July 2008, Ben Bernanke actually believed that the rate reductions and a measly $100bn from President Bush’s stimulus bill targeted at consumer incomes might be enough to get the economy moving again. In the August 2008 statement, the FOMC even chose to hold the federal funds rate at 2 per cent. Throughout much of this period the Fed continued to cite inflationary pressures (driven by the commodities spike) as an equal or sometimes even greater worry than the possibility of a contraction.
We chose to look back at the FOMC statements only because it makes for an easy history lesson. We know GDP isn’t the only relevant indicator and the Fed isn’t the only relevant policymaker. (Although it has felt like that at times.) But this quick tour does show that the magnitude of what was happening had mostly escaped those responsible for monitoring economic growth.
The issue, we suppose, is whether we’d have ended up with different policy or market outcomes if everyone had recognised, much earlier than we actually did, that the economy had begun to decelerate significantly.
But we actually think that the complications involved in thinking about this are probably too big to contemplate. Maybe the original Bush stimulus package would have been bigger. Maybe monetary policy would have been more aggressive much earlier (this seems especially likely to us, given Bernanke’s dual expertise in the Great Depression and Japan). Maybe markets would have freaked out even earlier and banks would have come under pressure sooner. Maybe the economy would have become the focal issue of the 2008 election much earlier (it wasn’t trivial, of course, but if we recall correctly it didn’t really intensify until after Lehman). This is before we even get into the international dimensions. Maybe inflation expectations would have plummeted — and maybe actual inflation would have declined earlier as demand for commodities fell. Maybe the very knowledge that things had slowed down earlier would have produced a confidence shock that would have precipitated the kind of Q4 2008 fall that came later.
In other words, for all we know, the world in January 2009 would have been so drastically different from the one we ended up with that it makes little sense to think of it in terms of just one quarter’s reading.
And all we’re trying to say, really, is that having more timely, more accurate information will likely produce better outcomes. Try being an ER surgeon dealing with a heart attack when the ECG results won’t arrive until two weeks later.
What really bakes our noodle, though, is to think that such a data failure could well happen again, or could be happening now. Because, eventually, everyone who traffics in this kind of real-time data confronts the reality that no matter how flawed the number, a) the media will report it, b) market participants, human and non-human alike, will trade on it, and c) policymakers will act on it.
Which is also why it’s unhelpful to fall into the above-mentioned vortex. We all have to work with something — and surely our readers in particular wouldn’t want us to abandon the markets entirely to the pure technical guys. Another reason is that it would mean our having to quit and get a proper job, and we’re not qualified for much else.
We won’t get much into the ways journalists could do a better job of following all this, as most of it seems rather obvious. Be more sceptical; examine longer-term secular trends alongside the minutiae from real-time releases; use Bayesian probabilistic thinking (Felix’s idea); look at deltas rather than absolute levels where appropriate; and the like.
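To make the Bayesian suggestion concrete, here’s a toy sketch: treat the advance GDP print as a noisy signal of true growth and combine it with a prior formed from other indicators. Every number below (the prior, the assumed standard error of the advance estimate) is made up purely for illustration; it isn’t how any statistical agency actually does things.

```python
# Illustrative Bayesian update: treat an advance GDP print as a noisy
# signal of true growth rather than as the truth itself.
# All numbers here are assumptions for the sake of the sketch.

def bayes_update(prior_mean, prior_var, obs, obs_var):
    """Normal-normal update: combine a prior belief about true growth
    with a noisy observation (the advance estimate)."""
    precision = 1.0 / prior_var + 1.0 / obs_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var

# Suppose other indicators (claims, surveys, GDI) pointed to Q4 2008
# growth of about -6% with wide uncertainty, and the advance GDP print
# came in at -3.8%. Assume the advance print carries a standard error
# of roughly 1.5 points (a made-up figure).
mean, var = bayes_update(prior_mean=-6.0, prior_var=4.0, obs=-3.8, obs_var=1.5**2)
print(f"posterior: {mean:.1f}% +/- {var**0.5:.1f}")
```

The posterior lands between the prior and the print, and is more certain than either input alone — which is the whole point of not trading on the headline number as if it were gospel.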
Otherwise, are there identifiable circumstances in which real-time growth data are more likely to fail? Have they improved recently? What can be done? Again, we don’t know — we’d like to look into it, but if our stealthy readers know of a place to start, please get in touch.
Back to GDI, and the issue of inputs vs ex post data-mining
Nalewaik’s argument for GDI is based on its merits relative to GDP, not its individual excellence as a measurement. He’s happy to admit as much, which is why he makes the case for a weighted average of the two.
And in his response to Nalewaik’s paper, BEA director Steven Landefeld pointed to a few specific examples of the problems with it. We didn’t see this until after we posted our interview, but it’s worth providing a brief sceptical account.
One problem is that GDI includes certain kinds of data that are vulnerable to all manner of accounting trickery — especially for corporate profits, interest and rental income — and which often get revised later. It also relies on extrapolations of trends even more than initial GDP estimates do. Another is that while GDP may understate cyclical swings, there’s a chance that GDI overstates them — suggesting that maybe GDI better captured the severity of the last recession simply because it tends to exaggerate things anyway.
We’d also note that Nalewaik made a case to us that GDI was better correlated with other indicators — unemployment, the ISM surveys, et cetera. Given that these have so many problems of their own, that’s a less than overwhelming argument.
None of this is any reason at all to dismiss Nalewaik’s work. For one thing, he could be right! For another, Landefeld writes that it has spurred the BEA to reconsider some of its methods. And this part of Landefeld’s remark was somewhat heartening and also, we suspect, an admission of GDP’s biggest flaw:
Some of the measurement concerns raised in this paper about the ability of GDP and GDI to fully capture changes in the economy over the business cycle are in the process of being resolved, thanks to new quarterly source data on services from the Census Bureau and more comprehensive monthly data on wages and salaries from the Bureau of Labor Statistics.
So to the extent that Nalewaik’s work leads to incremental progress, well, that’s better than none. All we can tell from reading through the paper is that both the source data and the results have errors — and that those errors were kind of a big deal during precisely those years when we needed good information.
But the more fundamental point here is that Felix was probably correct when he wrote that neither of these measures is all that reliable — or at least that’s how it seems to us.
What we have is a situation where the quality of real-time data can only be assessed with a huge amount of hindsight — the inputs are bad enough that only by scrutinising the outputs years later do we have some sense of whether a report is meaningful. Ideally, of course, you’d see vast improvements in the collection of the source data itself — or the identification of better source data.
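To illustrate what “scrutinising the outputs years later” might look like in the simplest possible case, here’s a toy revision analysis comparing advance prints with later figures. Only the Q4 2008 pair (-3.8 versus -8.9) comes from the numbers quoted earlier in this post; the other quarters are placeholders, not actual BEA vintages:

```python
# Toy revision analysis: compare each quarter's advance growth
# estimate with a later revised figure. Only the 2008Q4 pair
# (-3.8 vs -8.9) is taken from the post; the other quarters are
# placeholder numbers, not actual BEA vintages.

advance = {"2008Q1": 0.6, "2008Q2": 1.9, "2008Q3": -0.3, "2008Q4": -3.8}
latest = {"2008Q1": -1.8, "2008Q2": 1.3, "2008Q3": -3.7, "2008Q4": -8.9}

# Revision = later estimate minus advance estimate, per quarter
revisions = {q: latest[q] - advance[q] for q in advance}
mean_abs_revision = sum(abs(r) for r in revisions.values()) / len(revisions)

print(revisions)
print(f"mean absolute revision: {mean_abs_revision:.2f} points")
```

Even this crude summary statistic only becomes computable years after the fact, which is precisely the problem: the diagnostic arrives long after the decisions it would have informed.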
Maybe it will happen. We don’t know.
But right now, work such as Nalewaik’s is probably the best we can do — and let’s hope that looking back in this way will turn out to have been a precondition of finding new ways to move forward.