In my last post, I explained how the IPCC attempts to use climate models to show humans have caused the recent global warming. Models are useful for testing scientific ideas, but they are not proof an idea is correct unless they successfully and accurately predict future events. See the story of Arthur Eddington’s test of Einstein’s theory of relativity here. In the computer modeling world, a world I worked in for 42 years, choosing the one model that best matches observations is normal best practice. I have not seen a good explanation for why CMIP5 and CMIP6 produce ensemble model means. It seems to be a political solution to a scientific problem. This is addressed in AR6 in Chapter 1,[1] where they refer to averaging multiple models, without considering their accuracy or mutual independence, as “model democracy.” It is unclear if they are being sarcastic.

Figure 1. John Christy’s comparison of the CMIP6 models, their ensemble mean (red and yellow boxes), and two observation datasets in green. The light green is a weather balloon dataset, and the dark green is a weather reanalysis dataset. Each dataset is for the tropical mid-troposphere from 300 to 200 hPa, roughly 10 to 12 km in altitude, the so-called tropical model “hot spot.” Source: Dr. John Christy. The graph is adapted from a presentation Christy gave to the Irish Climate Science Forum on January 22, 2021.

Figure 1 shows the CMIP6 (IPCC, 2021) or the IPCC AR6 models, their mean in yellow and red boxes, and observations in green. In this region of the tropical troposphere, often called the climate model “hot spot,” climate models have always overestimated warming.

AR6 discusses weighting the models according to their performance and their dependence upon other models, since many models share code and logic, but the authors could not find a robust method for determining the weights. In the end, they classified the models based first on observations prior to 2014 and second on their modeled ECS (Equilibrium Climate Sensitivity to a doubling of CO2) and TCR (the Transient Climate Response to a doubling of CO2),[2] as discussed in AR6, Chapters 1 and 4.[3] These latter two values, as computed by the ensemble mean and ensemble members, were compared to ECS and TCR values determined independently of the models. The AR6 modeling process resulted in higher projected future warming than the already hot AR5. In AR6 Chapter 4 they admit that much of the increase was due to the higher ECS and TCR values used in the AR6 assessment.

The IPCC, in AR4, AR5, and AR6, often conflates models and the real world, so constraining its model results to an independently predetermined range of climate sensitivity is especially worrisome. Models are the primary source for ECS and TCR, which are model-based estimates of climate sensitivity to CO2. They are artificial model constructs that cannot be measured in the real world; they can only be approximated. This makes the technique partially circular. Further, models are used to forecast future temperatures. Since the models run hot compared to observed warming, and have done so for over 30 years, model forecasts can be expected to be too high.

One reason they give in both AR5 and AR6 for using an ensemble mean is that large ensembles supposedly allow them to separate “natural variability,” which they conflate with “noise,” from model uncertainty.[4] Thus, they use models to compute natural variability, with all the biases therein. Another reason is that if two models come up with similar results, using the same scenario, the result should be “more robust.” Gavin Schmidt gives us his take in a blog post:

“In the international coordinated efforts to assess climate model skill (such as the Coupled Model Intercomparison Project), multiple groups from around the world submit their model results from specified experiments to a joint archive. The basic idea is that if different models from different groups agree on a result, then that result is likely to be robust based on the (shared) fundamental understanding of the climate system despite the structural uncertainty in modeling the climate. But there are two very obvious ways in which this ideal is not met in practice.

1. If the models are actually the same [this happened in CMIP5], then it’s totally unsurprising that a result might be common between them. One of the two models would be redundant and add nothing to our knowledge of structural uncertainties.

2. The models might well be totally independent in formulation, history, and usage, but the two models share a common, but fallacious, assumption about the real world. Then a common result might reflect that shared error, and not reflect anything about the real world at all.”

Gavin Schmidt, 2018

In AR6, they acknowledge it is difficult to separate natural variability from model uncertainty. They tried separating them by duration, that is, by assuming that short-term changes are natural variability and longer-term changes are model uncertainty. But they found that some natural variability is multi-decadal.[5] Internal natural variability via ocean oscillations, such as the AMO[6] or the PDO,[7] has a long-term (>60 years) effect on global and regional climate.[8] These very long natural oscillations make it difficult to back out the effect of human greenhouse gas emissions.
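To make the duration problem concrete, here is a minimal sketch using purely synthetic data. It is not the AR6 procedure. The 21-year running mean is my stand-in for any "separate by duration" rule, and the 60-year period and 0.2 °C amplitude are assumed values chosen only for illustration.

```python
import numpy as np

# Synthetic illustration only; none of these numbers come from AR6 or any CMIP model.
years = np.arange(1900, 2021)
period = 60.0     # assumed period of a multi-decadal oscillation, in years
amplitude = 0.2   # assumed amplitude, in deg C
oscillation = amplitude * np.sin(2.0 * np.pi * years / period)

# Stand-in for "separation by duration": a centered 21-year running mean
# is treated as the long-term component, the remainder as short-term.
window = 21
kernel = np.ones(window) / window
long_term = np.convolve(oscillation, kernel, mode="valid")
half = window // 2
short_term = oscillation[half:-half] - long_term

long_std = float(long_term.std())
short_std = float(short_term.std())
print(f"long-term component std:  {long_std:.3f}")
print(f"short-term component std: {short_std:.3f}")
```

In this toy case the series is entirely "natural" by construction, yet most of it lands in the long-term bin, which a duration rule would label as something other than natural variability.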

Conflating natural variability with short-term noise is a mistake, as is assuming natural variability is short term. It is not clear that CMIP6 model uncertainty is properly understood. Further, using unvalidated models to “measure” natural variability, even when an attempt is made to separate out model uncertainty, assumes that the models are capturing natural variability, which is unlikely. Long-term variability in both the Sun and the oceans is explicitly ignored by the models.[9]

The CMIP models have a tough time simulating the AMO and PDO. They produce features that approximate these natural oscillations in time and magnitude, but they are out of phase with observed temperature records and each other. A careful look at the projected portions of Figures 1 (post 2014) and 2 (post 2005) will confirm this timing problem. Thus, when the model output is averaged into a multi-model mean, natural ocean oscillations are probably “averaged” out.
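The averaging effect can be sketched with a toy ensemble. Everything here is hypothetical: 30 synthetic "models," each with the same 60-year oscillation of 0.2 °C amplitude but a random phase. It does not use any actual CMIP output; it only shows the arithmetic of averaging out-of-phase cycles.

```python
import numpy as np

# All values below are synthetic and chosen for illustration only.
rng = np.random.default_rng(0)
years = np.arange(1900, 2021)
period = 60.0     # assumed oscillation period, in years
amplitude = 0.2   # assumed oscillation amplitude, in deg C
n_models = 30     # hypothetical number of ensemble members

# Each hypothetical model has the same oscillation but its own random phase.
phases = rng.uniform(0.0, 2.0 * np.pi, size=n_models)
runs = amplitude * np.sin(
    2.0 * np.pi * years[None, :] / period + phases[:, None]
)

# The multi-model mean at each year.
ensemble_mean = runs.mean(axis=0)


def peak_to_peak(series):
    """Return the full swing (max minus min) of a series."""
    return float(series.max() - series.min())


member_range = float(np.mean([peak_to_peak(run) for run in runs]))
mean_range = peak_to_peak(ensemble_mean)
print(f"average swing of individual members: {member_range:.3f}")
print(f"swing of the ensemble mean:          {mean_range:.3f}")
```

Each member keeps its full swing, while the mean of the out-of-phase members swings much less. How closely real model output behaves like this toy case is a separate question that the sketch does not answer.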

Figure 2. A comparison of the AR5 CMIP5 models (in various colors) and the CMIP5 model ensemble (in red) with weather balloon data in green. The data is from the tropical middle troposphere. Ross McKitrick and John Christy compare the models to the observations statistically, and find the difference is statistically significant. The data is from their 2018 paper (McKitrick & Christy, 2018); the plot is from John Christy.

The model results shown in Figures 1 and 2 resemble a plate of spaghetti. Natural climate variability is cyclical,[10] so this odd practice of averaging multiple models erroneously makes it appear nature plays a small role in climate. Once you average out nature, you manufacture a large climate sensitivity to CO2 or any other factor you wish, and erroneously assign nearly all observed warming to human activities.

The IPCC included many models in their AR5 ensemble that they admit are inferior. Some of the models failed a residual test, indicating a poor fit with observations.[11] The inclusion of models with a poor fit to observations corrupts the ensemble mean. In fact, as admitted by Gavin Schmidt in his blog post, two of the CMIP5 models were the same model with different names, which inadvertently doubled the weight of that model, violating “model democracy.” He also admits that just because different models agree on a result, the result is not necessarily more “robust.” I think we can all agree he got that right.
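The effect of a duplicated model on an unweighted mean is simple arithmetic. The sketch below uses made-up model names and made-up trend values, not CMIP5 numbers.

```python
from statistics import fmean

# Hypothetical warming trends (deg C per decade) for three distinct models.
trends = {"model_a": 0.20, "model_b": 0.30, "model_c": 0.40}

# Unweighted "model democracy" mean over the three distinct models.
mean_unique = fmean(trends.values())

# The same model_c submitted a second time under a different name.
with_duplicate = dict(trends, model_c_renamed=trends["model_c"])
mean_duplicate = fmean(with_duplicate.values())

# Share of the ensemble mean controlled by model_c's result in each case.
share_unique = 1 / len(trends)
share_duplicate = 2 / len(with_duplicate)

print(f"mean without duplicate: {mean_unique:.3f}")
print(f"mean with duplicate:    {mean_duplicate:.3f}")
print(f"model_c share: {share_unique:.2f} -> {share_duplicate:.2f}")
```

With these made-up numbers, counting the warmest model twice pulls the mean toward it, and its share of the vote rises from one third to one half.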

It seems that they are attempting to do “consensus science” and, for political reasons, are including results from as many models as possible. This is an admission they have no idea how climate works. If they did, they would only have one model. As Michael Crichton famously said:

“I regard consensus science as an extremely pernicious development that ought to be stopped cold in its tracks. Historically, the claim of consensus has been the first refuge of scoundrels; it is a way to avoid debate by claiming that the matter is already settled.”

Michael Crichton, January 17, 2003, at the California Institute of Technology

In Professor William Happer’s words:

“A single, incisive experiment is sufficient to falsify a theory, even if the theory accurately explains many other experiments. Climate models have been falsified because they have predicted much more warming than has been observed. … Other failures include the absence of the predicted hot spot in the upper troposphere of tropical latitudes.”

(Happer, 2021d, p. 6)

The “hot spot” that Happer refers to is the source of the temperatures plotted in Figures 1 and 2. McKitrick and Christy provide the details of the statistical climate model falsification Happer refers to in their 2018 paper. In summary, if the IPCC cannot choose one best model to use to forecast future climate, it is an admission that they do not know what drives climate. Averaging multiple inferior models does not allow them to estimate natural variability or the human influence on climate more accurately; it only produces a better-looking forecast. It is a “cosmetic,” as we say in the computer modeling world. They will only be able to properly estimate natural variability with observations, at least in my opinion. They knew this in the IPCC first assessment report (FAR), but forgot it in later reports. In FAR they concluded:

“The size of this [global] warming is broadly consistent with predictions of climate models, but it is also of the same magnitude as natural climate variability. … The unequivocal detection of the enhanced greenhouse effect from observations is not likely for a decade or more.”

(IPCC, 1992, p. 6)

Most readers will remember that the famous “Pause” in warming started less than ten years later.

The bulk of this post is an excerpt from my latest book, The Great Climate Debate, Karoly v Happer.

The bibliography can be downloaded here.

  1. AR6, page 1-96 
  2. Transient Climate Response 
  3. AR6 pages 1-96, 1-97, 4-22 to 4-23, and 4-4. 
  4. (Mitchell, Lo, Seviour, Haimberger, & Polvani, 2020) explain a methodology for separating natural variability from model differences. See also Box 4.1 in AR6, pages 4-21 to 4-24 for a complete discussion of the problem. 
  5. (AR6 4-19 to 4-24) 
  6. Atlantic Multi-decadal Oscillation 
  7. Pacific Decadal Oscillation 
  8. (Wyatt & Curry, 2014) 
  9. (Connolly et al., 2021) 
  10. (Wyatt & Curry, 2014), (Scafetta, 2021), and (Scafetta, 2013) 
  11. (IPCC, 2013, p. 882) 



Andy May is a writer, blogger, and author living in The Woodlands, Texas, who enjoys golf and traveling in his spare time. He is the author of two books on climate change issues and one on Kansas history. Andy is the author or co-author of seven peer-reviewed papers on various geological, engineering and petrophysical topics. He retired from a 42-year career in petrophysics in 2016. You can find many of his posts on the popular climate change blog, where he is an editor. His personal blog is