climate models as opinions

Gavin Schmidt, a climatologist at NASA's Goddard Institute for Space Studies, writes on computational climate models:

In some respects they all act in very similar ways — for instance, when you put in more carbon dioxide, which is a greenhouse gas, it increases the opacity of the atmosphere and it warms up the surface. That is a universal feature of these models and it is universal because it is based on very, very fundamental physics that you don't actually need a climate model to work out. But when it comes to aspects which are slightly more relevant – I mean, nobody lives in the global mean atmosphere, nobody has the global mean temperature as an important part of their expectations – things change. When it comes to something like rainfall in the American Southwest or rainfall in the Sahel or the monsoon system in India, it turns out that those different assumptions that we made in building those models (the slightly different decisions about what was important and what wasn't important) have a very important effect on the sensitivity of very complex elements of the climate.

Some models suggest very strongly that the American Southwest will dry in a warming world; some models suggest that the Sahel will dry in a warming world. But other models suggest the exact opposite. Now, let's just imagine that the models have an equal pedigree in terms of the scientists who have worked on them and in terms of the papers that have been published — it's not quite true but it's a good working assumption. With these two models, you have two estimates — one says it's going to get wetter and one says it's going to get drier. What do you do? Is there anything that you can say at all? That is a really difficult question.

There are a couple of other issues that come up. It turns out that if you take the average of these 20 models, that average is a better model than any one of the 20 models. It has a better prediction of the seasonal cycle of rainfall; it has a better prediction of surface air temperatures; it has a better prediction of cloudiness. That is a little bit odd because these aren't random. You can't rely on the central limit theorem to demonstrate that that must be the case, because these aren't random samples. They are not 20 random samples of the space of all possible climate models. They have been tuned and they have been calibrated and they have been worked on for many years — everybody is trying to get the right answer.
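
A toy simulation makes this plausible. Nothing below comes from an actual climate model; I am simply assuming that each model's error has a component shared with the other models (a common bias) and a component of its own, and then comparing the ensemble mean against its members:

```python
# Toy illustration of why a multi-model mean can beat every individual model.
# Assumption (not from the post): each "model" sees the true field plus an error
# that is partly shared across models and partly its own.
import numpy as np

rng = np.random.default_rng(0)

n_models, n_points = 20, 1000           # 20 models, 1000 "observable" quantities
truth = rng.normal(0.0, 1.0, n_points)  # the unknown true values

shared_error = rng.normal(0.0, 0.3, n_points)            # bias common to all models
own_error = rng.normal(0.0, 1.0, (n_models, n_points))   # each model's own error
models = truth + shared_error + own_error


def rmse(x):
    return np.sqrt(np.mean((x - truth) ** 2))


individual_rmse = [rmse(m) for m in models]
ensemble_rmse = rmse(models.mean(axis=0))

print(f"best single model RMSE: {min(individual_rmse):.3f}")
print(f"ensemble-mean RMSE:     {ensemble_rmse:.3f}")
# Averaging cancels the independent errors but not the shared bias, so the mean
# is typically better than any member without thereby becoming the "true" model.
```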

Let us assume that there exists a "true" deterministic computational model in the space of all possible climate models. This true model may be computationally very demanding and therefore outside the reach of current technology. Nevertheless, researchers will continue to tweak their models and try to develop "better" ones.

I put the word "better" in quotation marks because it is practically impossible to develop a metric that will measure how close a certain model is to the true one. Why? Firstly, we cannot observe the true model directly. Secondly, the success of a model's predictions may be due to sheer luck and therefore does not necessarily tell us how close the model is to the true one. (This is similar to the idea that there are always going to be some ignorant traders who consistently get the direction of the stock market right.) Thirdly, climate models have long time horizons, so their forecasts cannot be thoroughly checked against new data within a practically viable period of time.

Schmidt states that the average model tends to perform better than each individual model does. Ignoring the above-mentioned complications, we can interpret this as follows: the average model is closer to the true one than any individual model is.

The possibility that the average expert opinion may be superior to individual opinions was recognized quite early at the RAND Corporation. During the 1950s the so-called Delphi Method was developed there to assess the effects of technology on future warfare. Here is a short description from Wikipedia:

"The Delphi method is a systematic, interactive forecasting method which relies on a panel of independent experts. The carefully selected experts answer questionnaires in two or more rounds. After each round, a facilitator provides an anonymous summary of the experts’ forecasts from the previous round as well as the reasons they provided for their judgments. Thus, experts are encouraged to revise their earlier answers in light of the replies of other members of their panel. It is believed that during this process the range of the answers will decrease and the group will converge towards the "correct" answer. Finally, the process is stopped after a pre-defined stop criterion (e.g. number of rounds, achievement of consensus, stability of results) and the mean or median scores of the final rounds determine the results."

The process sounds remarkably similar to the dynamics that govern scientific progress. Academicians read each other's articles, which get filtered through peer-review processes, and revise their own models in the light of the new information. They also interact in seminars and conferences where they exchange unpublished ideas.

But there are important differences as well. For instance, the anonymity assumption does not hold in academia. Researchers never hide their identities. This has two important consequences: 1) ideas of academicians with greater credentials may have more influence on others, and 2) unconventional and non-mainstream ideas may never be uttered due to concerns about reputation. Moreover, the average opinion often has very little scientific value. Especially in the hard sciences, the most successful approximations to truth are not the anonymous ones. (These fields are replete with results that are named after their originators.)

On the other hand, academic fields with extremely complex subject matters (e.g. climatology and economics) often admit the co-existence of different points of view about the same phenomenon. When the truth is complex, no single approach is likely to yield the ultimate explanation. In such cases, the average expert opinion may have some scientific value.

It is well known that the average public opinion can outperform individual predictions, including those made by experts. For example, prediction markets are exceptionally good at forecasting election outcomes and Oscar nominations.

(When it comes to making predictions in politics and economics, the experts perform no better than the average citizen does. Does this mean that we should stop wasting public resources on the production of such "experts"?)

How can you ever expect a large group of uninformed agents, who are occasionally swayed by herd mentality, to generate ideas that can beat even the "best" experts'? Well... The secret lies in two buzzwords: diversity and independence. When individuals contemplate anonymously on their own and are not worried about the consequences of their ideas, the resulting spectrum of opinions tends to be broader than what would appear in a non-anonymous environment that is open to instant interactions. In prediction markets, the ugly face of group dynamics does not kick in because people are not allowed to influence each other. That way, reputation and imitation are prevented from distorting the average estimate.
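
A rough simulation illustrates the point. The crowd size, the noise levels and the "herding" fraction below are arbitrary assumptions chosen purely for illustration:

```python
# Toy sketch of "diversity and independence": independent guesses average out well,
# while imitation (correlated guesses) leaves a shared error that averaging cannot remove.
import numpy as np

rng = np.random.default_rng(2)
truth = 50.0
n_crowd, n_trials = 200, 2000


def crowd_error(herding):
    """RMS error of the crowd average when a fraction of each guess is a shared rumor."""
    errs = []
    for _ in range(n_trials):
        rumor = rng.normal(0.0, 10.0)             # the signal everyone imitates
        private = rng.normal(0.0, 10.0, n_crowd)  # each person's own guess error
        guesses = truth + herding * rumor + (1 - herding) * private
        errs.append(np.mean(guesses) - truth)
    return np.sqrt(np.mean(np.square(errs)))


expert_error = 3.0  # assumed RMS error of a single competent expert (hypothetical)
print(f"independent crowd (herding=0.0): {crowd_error(0.0):.2f}")
print(f"herding crowd     (herding=0.5): {crowd_error(0.5):.2f}")
print(f"single 'expert'   (assumed):     {expert_error:.2f}")
# With independence the crowd's average error is tiny; with herding it is dominated
# by the shared rumor and can easily be worse than a competent individual.
```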

We have the two prerequisites for a well-functioning climate prediction market:

1) Independence. Each climate-modelling team develops its ideas partly in secrecy. This increases the statistical independence of their efforts. Teams do not operate anonymously; however, this lack of anonymity should not be a big obstacle. Reputational concerns are not that great in climatology. The phenomenon in question is so complex and the time horizon of the predictions is so long that no "sane" model can ever be outright refuted.

2) Diversity. Climatologists are probably making use of all the tools at their disposal. Hence the spectrum of technologically available models is explored fully.

Note that I have used the words "model" and "idea" interchangeably. This was done on purpose. Both models and ideas spit out a bunch of predictions against which their validity is tested. Assume that the source of the predictions is hidden inside a black box and you only get to see the predictions. Can you infer which predictions were generated by computational models? You can make a guess, but you can never be sure. It is sort of like John Searle's Chinese Room:

"Searle's thought experiment begins with this hypothetical premise: suppose that artificial intelligence research has succeeded in constructing a computer that behaves as if it understands Chinese. It takes Chinese characters as input and, by following the instructions of a computer program, produces other Chinese characters, which it presents as output. Suppose, says Searle, that this computer performs its task so convincingly that it comfortably passes the Turing test: it convinces a human Chinese speaker that the program is itself a human Chinese speaker. To all of the questions that the human asks, it makes appropriate responses, such that any Chinese speaker would be convinced that he or she is talking to another Chinese-speaking human being."

Can you conclude that the artificial intelligence understands Chinese?

What about a bunch of random guesses that happen to overlap with the predictions of a sophisticated climate model? Do those guesses have any value at all? Is there more to a model than just an input-output scheme? Does the internal mechanism of the scheme matter? Faced with some of the most complicated systems in the universe, do "sophisticated" macroeconomic and climate models really differ substantially from mere guesses?

(For those who are interested, here is a paper that compares the pros and cons of the Delphi Method and the prediction markets.)

The wisdom of crowds and the success of collective intelligence have created a lot of headaches for auction participants. Regardless of what is being auctioned (e.g. a company, an oil well, a ton of tuna fish), it turns out that the average bid usually corresponds to the correct value. Of course, the winner is not the average bidder. Hence he is cursed from the beginning: the fact that he has won means that he has overpaid by bidding more than the average estimate.

Here is a short description of the Winner's Curse from Wikipedia:

"In a common value auction, the auctioned item is of roughly equal value to all bidders, but the bidders don't know the item's market value when they bid. Each player independently estimates the value of the item before bidding. The winner of an auction is, of course, the bidder who submits the highest bid. Since the auctioned item is worth roughly the same to all bidders, they are distinguished only by their respective estimates. The winner, then, is the bidder making the highest estimate. If we assume that the average bid is accurate, then the highest bidder overestimates the item's value. Thus, the auction's winner is likely to overpay."

Can we apply this idea to climatology? Some models receive more policy and public attention than others. The media, for example, usually focuses on the impact value of the news it reports. Hence it is naturally biased towards picking and publishing papers that make attention-grabbing and exciting predictions. We can imagine this whole process as an auction. The research teams submit their models as bids. Models with unexciting predictions are considered not newsworthy. (Even academic journals seem to be biased against such papers.) The winner of this auction is going to be the model that predicts the most dire scenario. What will be the Winner's Curse? It will be the reputational cost incurred in the long run for creating a model that was far off the truth. Of course, the verification process will take a very long time. By the end of it, all the academicians involved in the production of the winning model will probably be dead. Hence, when discounted to today, the cost of the reputational loss is not that much. Consequently, the participants in this auction will care very little about the Winner's Curse (i.e. there will not be any shift in their bidding behavior).
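
To see how weak the deterrent is, here is a back-of-the-envelope discounting exercise with entirely hypothetical numbers:

```python
# Back-of-the-envelope discounting of a reputational cost incurred far in the future.
# The cost, horizon, and discount rate below are hypothetical, purely illustrative.
reputational_cost = 100.0   # in arbitrary "reputation units", paid when the model is refuted
years_until_verified = 50
discount_rate = 0.05

present_value = reputational_cost / (1 + discount_rate) ** years_until_verified
print(f"present value of the future reputational loss: {present_value:.1f}")
# Roughly 8.7 out of 100: a penalty that arrives decades from now barely
# registers in today's bidding behavior.
```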

What is considered by the media as attention-grabbing seems to be changing. Climate models predicting dire scenarios have become so commonplace that important newspapers are now providing space to sceptical views of these models.

P.S. The validity of a proposition rests partly on the nature of the methodology used. How about the validity of the methodology itself? The usefulness of the Delphi Method cannot be established by an application of the Delphi Method...