The Fiddler's dilemma

Given a large number of meteorological stations, some will be warming and some will not. Thus by simply excluding the cooling stations it is possible to manufacture a warming trend. Likewise, given a large number of possible ways to process the data, some of those methods will create warming and some will create cooling. It is therefore also possible to manufacture a warming trend from random data by selecting the most advantageous methods of processing it.
But there is a catch, which I will call the “Fiddler’s dilemma”: in order to fiddle the data this way, you need “natural variation” (ironic, isn’t it, given that they deny it exists). The more you filter the stations, the data, or the ways of processing the data, the more you lose this necessary natural variation, and with it your ability to apparently “honestly” fiddle the data.
When I started going on about the pause, I knew that sooner or later this “Fiddler’s dilemma” would strike, because given a large random set of meteorological stations, you could cherry-pick stations to create a warming trend by constantly removing those that didn’t suit the kind of graph you wished to create.
However, as time went by, the number of stations still left in your cherry-picked set would grow smaller and smaller, and, just as importantly, those left would become more and more like each other. In evolutionary terms, the population diversity would dwindle to almost nothing, so that, like animal populations that lose genetic diversity, the fiddler’s population of potential fiddles loses its variability and becomes susceptible to chance adverse conditions.
So, whilst a huge population of available stations and techniques for processing the data allows many different ways to fiddle a warming trend, if you use this approach the very act of weeding out whatever doesn’t fit your preconceived ideas means that sooner or later it becomes impossible to keep the fiction going, because the cherry-picking data fiddler gets rid of the very variability they need for their scam.
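To see the first half of that in action, here is a minimal Python sketch. The random-walk station model, the helper names and every number in it are invented purely for illustration, not taken from any real dataset: each “station” has no underlying trend at all, yet quietly dropping the ones that happen to cool produces an apparently solid warming trend.

```python
# Minimal sketch: manufacturing a "warming trend" from pure noise by
# discarding stations whose trend doesn't fit. All names and numbers
# are illustrative only.
import random

random.seed(1)

N_STATIONS = 1000
N_YEARS = 30

def random_station():
    # Each "station" is just a random walk: no underlying trend at all.
    temps, t = [], 0.0
    for _ in range(N_YEARS):
        t += random.gauss(0.0, 0.1)
        temps.append(t)
    return temps

def trend(series):
    # Crude trend estimate: last value minus first, per year.
    return (series[-1] - series[0]) / (len(series) - 1)

stations = [random_station() for _ in range(N_STATIONS)]

# Honest average over all stations.
honest = sum(trend(s) for s in stations) / N_STATIONS

# "Fiddled" average: quietly drop every station that happens to cool.
kept = [s for s in stations if trend(s) > 0]
fiddled = sum(trend(s) for s in kept) / len(kept)

print(f"all {N_STATIONS} stations: mean trend {honest:+.4f} per year")
print(f"kept {len(kept)} 'warming' stations: mean trend {fiddled:+.4f} per year")
```

The full-network average should hover near zero, while the cherry-picked one will tend to show a steady apparent warming.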

An Example


If my logic is difficult to follow … think of it this way.
Let us suppose you are just some ordinary Mr Smut and for some daft reason you want to prove that everyone in a town is walking south.
Let us assume that on this particular day there are 16,384 people, and at your first measurement you exclude from your sample everyone walking north. Quite obviously, because the sample of people you are watching no longer contains anyone walking north … you can triumphantly proclaim that the whole of your sample is walking south.
Assume this leaves 8,192. At each successive measurement, if we assume the direction of travel is either north or south and random, Mr Smut will discard as “obviously bogus” anyone not going south. If that is half of each sample, then each time Mr Smut is left with 4,096, 2,048, 1,024, … until after seven measurements he has only 128 people.
However … because all these people have been sharing precisely the same behaviour, and because people are not random, it is very likely that many of them are walking in the same direction for the same reason. Their behaviour is not random and, worse, by repeatedly selecting for a uniform kind of behaviour you have chosen people who very likely share the same reason for going the same way. So it is very likely that many of them are in groups, or even one large group. Thus, although Mr Smut still has around 128 individuals left in his sample and may think he can continue the con for another seven measurements, the chances are that large numbers in this highly selected group are related in some way, so that whole groups will drop out at the same time. They will therefore drop out much faster than expected, and the time the scam can continue will be much shorter than expected.
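If you want to put rough numbers on that, here is a toy Python simulation. Every figure in it is invented for illustration: with 128 genuinely independent walkers the halving can be strung out for several more rounds, but if selection has already left them in a few like-minded groups, whole groups turn north together and the sample collapses much sooner.

```python
# Toy version of the Mr Smut example: how long does the cherry-picked
# sample last when people move independently versus in correlated groups?
import random

random.seed(42)

def rounds_survived(population, group_size):
    """Count selection rounds until nobody is left walking south.

    population  -- starting number of people
    group_size  -- people per group; 1 means fully independent walkers
    """
    n_groups = population // group_size
    rounds = 0
    while n_groups > 0:
        # Each whole group picks north or south at random each round;
        # Mr Smut quietly discards every group caught walking north.
        n_groups = sum(1 for _ in range(n_groups) if random.random() < 0.5)
        rounds += 1
    return rounds

# 128 independent individuals: the halving can be strung out for a while.
print("128 independent walkers:   ", rounds_survived(128, group_size=1), "rounds")
# The same 128 people, but selection has left them in four like-minded
# groups of 32: whole groups vanish at once and the con collapses early.
print("128 walkers in groups of 32:", rounds_survived(128, group_size=32), "rounds")
```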
Going back to temperature data, what you will tend to find is that although the samples are from diverse areas, they may all be affected by events like El Niño in the same way. Indeed, they may even be reflecting something entirely unrelated to climate. It may be a similar farming practice or building; it could even be something entirely unexpected. For example, some insects just love temperature sensors and make them their home. Their body heat, whilst small, is enough to raise the temperature slightly. That’s great so long as the numbers of that species are increasing – and your “natural selection” process is picking out exactly the sensors that make ideal homes for them. But once you’ve removed all the sensors except those that are an ideal home for some insect, and that insect’s numbers then crash, you’ve got the same effect across a large number of instruments. And because your cherry-picking technique naturally picked sensors that were ideal homes for insects, when this fact works against you, you are left with only sensors whose readings suddenly drop because the insects have all died out.
A similar argument can be used when picking from a large number of possible ways to “process the data”. By pure chance many will be able to manufacture warming … but those that do will tend to distort the data in the same way and be related in some way. So eventually, whilst you still have a large number of possible ways to process the data … you’ll find they all tend to come to the same result.
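Here is a toy version of that method-selection game (the “methods” are just different choices of start and end year, picked purely for illustration): take a single trend-free random series, try every window at least twenty years long, and report the one with the steepest warming.

```python
# Sketch: pick the most "advantageous" way to process one random series.
# The "methods" here are simply different analysis windows; the point is
# only that choosing among many methods after seeing the result
# manufactures a trend from noise.
import random

random.seed(7)

N_YEARS = 60
series, t = [], 0.0
for _ in range(N_YEARS):
    t += random.gauss(0.0, 0.1)   # trend-free random walk
    series.append(t)

def trend(sub):
    return (sub[-1] - sub[0]) / (len(sub) - 1)

# Candidate "methods": every start/end year at least 20 years apart.
methods = [(a, b) for a in range(N_YEARS) for b in range(a + 20, N_YEARS)]

best = max(methods, key=lambda m: trend(series[m[0]:m[1] + 1]))

print(f"trend over the whole series:       {trend(series):+.4f} per year")
print(f"best cherry-picked window {best}: {trend(series[best[0]:best[1] + 1]):+.4f} per year")
```

Whatever the whole series happens to do, the best window will almost always show a respectable-looking warming rate.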
And sooner or later, even if the sample still appears random, and even if there still appear to be many ways to reprocess the data and manufacture warming out of a random sample … the fiddler’s dilemma hits, because the necessary variation within the sample and within the data-processing methods, which the fiddler relies on to cherry-pick the result, has been removed by the very act of cherry-picking.
The crash in diversity, and therefore in the ability to fiddle the data, always comes much faster than expected.

The Modeller’s dilemma

There’s a similar issue for any modeller – this time not implying any wrongdoing by the individuals, but instead an issue with variation and how it is wrongly incorporated into ensemble forecasts.
The way weather and climate models are run is to create an ensemble of starting conditions, each varying a little from the others. These are then crunched through a big computer and supposedly each explores a different part of the variability space.
However there is a problem. To explain it, think about a large number of people dropped off on one side of a city, all having to make their way south. If their routes were truly random and independent, one would expect their spacing to remain random.
That might be true on an open plain, but a city has constraints called “buildings” forcing us to use things called “roads”. Thus, although two individuals may start separated by quite some distance, they may find that they both use the same route to go south.
However, once they start travelling the same route, if they use the same way of calculating the best route (akin to using the same physics), they will always continue on the same route as each other. Thus once they come together, they stick together**.
Thus, although we start with individuals randomly spaced, some begin to come together and use the same pathways. If we then repeat the procedure mile after mile, first individuals will come together, then these pairs will join with other pairs. And if the city has a river with only a few bridges, which very much restrict the routes anyone can take, then after these obstacles everyone will be in one of a few groups.
Thus the result of using an ensemble forecast where randomness is injected only at the beginning is to explore just a very small part of the “variability space”. In our model of a city, the modellers are only exploring the “main routes”, because everyone following the same prescribed “physics” of how to navigate a city, even if starting off in very different locations, will tend to congregate on the “thoroughfares”.
If, however, we want to explore the “backstreets”, then we need to introduce variability into the route taken as we go along. For a human that’s a “spur of the moment” decision to stop doing what we “should be doing” and do something else. That enables us to get off the thoroughfare and investigate the backstreets.
And that may not seem too important when all the “backstreets” lead off the thoroughfares. But suppose we find a city where, for example, all the routes across a river are small “footbridges”: if we only travel the main highway, we may miss not just the other side of the river, but arguably a much bigger part of the city (variability space) that can only be reached by diverting off the thoroughfares of the climate model.
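Here is a toy Python model of that city (nothing in it comes from any real climate code; the grid, the routing rule and all the numbers are invented): every walker follows the same deterministic rule pulling them towards the nearest “thoroughfare”. With randomness only in the starting positions they all end up on a handful of streets; a small random turn at each block keeps them spread over more of the city.

```python
# Toy model of the "thoroughfare" effect in ensembles: same deterministic
# "physics" for every member, randomness either at the start only or
# injected along the way.
import random

random.seed(3)

N_WALKERS = 100
N_BLOCKS = 200
WIDTH = 50  # east-west positions 0..WIDTH-1

def routing_rule(x):
    # Same rule for everyone: drift one block towards the nearest
    # multiple of 10 (the "thoroughfares") while heading south.
    nearest = 10 * round(x / 10)
    return x + (1 if x < nearest else -1 if x > nearest else 0)

def run(per_step_noise):
    # Random starting positions, plus optional en-route randomness.
    xs = [random.randrange(WIDTH) for _ in range(N_WALKERS)]
    for _ in range(N_BLOCKS):
        xs = [routing_rule(x) for x in xs]
        if per_step_noise:
            # "Spur of the moment" sideways step of at most one block.
            xs = [min(WIDTH - 1, max(0, x + random.choice((-1, 0, 1))))
                  for x in xs]
    return len(set(xs))  # how many distinct streets the ensemble ends on

print("initial-condition randomness only:", run(False), "distinct streets")
print("randomness injected along the way:", run(True), "distinct streets")
```

Without en-route noise the hundred walkers collapse onto the half-dozen thoroughfares; with it they keep sampling the streets either side as well.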

The Consensus dilemma

Indeed … if you run science by “consensus” … you may very well run the same risk. People will tend to congregate around the “consensus”, which in our model means they only research the “thoroughfares” and never venture off into the backstreets. Because, by definition, the consensus is on the thoroughfare, and going down a backstreet is moving away from the consensus.
That will not be important if the backstreets lead nowhere … but if they do lead somewhere, you’ll be missing very important new areas of science.


**The problem is particularly bad in computer models because a lack of diversity may not be obvious where a factor is a combination of a large number of parameters. Each parameter behaves like the position of an individual: if we average their positions, after a while many are constrained and no longer random, so the amount of randomness, and the amount of the variability space being explored, is very much reduced. However, if there are still enough individuals with some randomness, the overall result may show no obvious loss of randomness, even though the model is only exploring a very small fraction of the potential (weather) outcomes.
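A small made-up illustration of that footnote (the 90/10 split and every value are invented): average ninety “frozen” parameters with ten that still vary, and the combined factor still fluctuates from run to run and looks perfectly random at a glance, even though only a tenth of the parameters are actually being explored.

```python
# Toy illustration: an averaged factor can still look "random" even when
# most of the underlying parameters have stopped moving.
import random
import statistics

random.seed(5)

N_PARAMS = 100
N_FROZEN = 90  # parameters that convergence has locked in place

frozen = [random.gauss(0.0, 1.0) for _ in range(N_FROZEN)]  # fixed values

def combined_factor():
    # The reported factor is an average over all parameters, but only
    # the last few are still genuinely varying.
    live = [random.gauss(0.0, 1.0) for _ in range(N_PARAMS - N_FROZEN)]
    return sum(frozen + live) / N_PARAMS

samples = [combined_factor() for _ in range(1000)]
print("combined factor still fluctuates run to run: stdev =",
      round(statistics.stdev(samples), 3))
print("yet", N_FROZEN, "of the", N_PARAMS, "parameters never move at all")
```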
