Posts posted by shoreham

  1. The state of the databases is possibly why there has been a certain amount of reticence to publish raw station data - it is just not available in anything like a publishable form, since the code that has accrued over the years is not up to reconstructing the data, and the appalling state of documentation means that to do so would require major forensic reconstruction.

    Gavin Schmidt states on RealClimate that he mostly does his own coding (in FORTRAN). That is the way of things within academia: without formal training, academics have come to use computers as tools to produce their own data. Sometimes that goes beyond Excel, or FileMaker Pro on a Mac (or, god forbid, HyperCard), and the old mainframe languages like FORTRAN or COBOL in science and economics (in medicine, M (MUMPS) code is an old favourite, found throughout the NHS) are the only way to access the earlier databases, with the data often hard coded so that it is not portable.

    That is the problem facing "Harry", the programmer who has been trying to port the legacy CRUtem 2.1 data over to version 3.0 so that it will run on Linux and Sun Alpha systems, with some code segments in FORTRAN 77 and others in FORTRAN 90. Here's an example of his frustration from HARRY_READ_ME.txt

    That is one unhappy tale, and the 700 KB text file describes three years of grief for "Harry". It also shows what a mess the data is really in; however much the CRU have spent on computer systems and model making, they need to spend more on teams of data analysts who can clean up the mess they have allowed to build up, instead of leaving it to one individual.

    They also need to honestly reassess the confidence they have in the long-term record, including the instrumental station record, before making proclamations on how much we may expect to warm in the future. If the data that feeds the climate models is flawed, even slightly, the output is worthless.

    As I pointed out in an earlier post, this is the crux of the whole matter:

    There is nothing intrinsically wrong with legacy architecture or with writing programs in old languages such as FORTRAN, so long as proper attention is paid to system design, code reviews, testing, documentation and so on. Throughout the Cold War much of the nuclear defence of the Western world depended on early warning systems that largely ran on mainframes running COBOL and old CODASYL IDMS databases, all of which are extremely reliable if used properly.

    The real issue here is how the raw data was vetted, evaluated, stored and modified over time. It is clear that this has been done in a haphazard and uncontrolled manner with almost no configuration management or audit trails. This means that unless the original source readings or data files have been retained there must be doubts over its reliability, particularly as it seems from "Harry's" comments that the basic structure of the original database was not underpinned by the proper use of referential integrity or primary and foreign key constraints (see the sketch at the end of this post). As a consequence it is quite easy for data in different tables or files to become corrupted or drift out of sync. If the data is shot then it does not matter how brilliant the rest of the algorithms are, since once GIGO takes effect all results must be regarded as suspect.

    Gordon Manley must be spinning in his grave.
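
    To make the referential integrity point concrete, here is a minimal, purely illustrative sketch using Python's built-in sqlite3 module. The station/reading layout and the values are invented for the example - it is not CRU's schema or code:

        import sqlite3

        # Hypothetical two-table layout: station metadata plus monthly readings.
        conn = sqlite3.connect(":memory:")
        conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

        conn.execute("""CREATE TABLE station (
                            station_id INTEGER PRIMARY KEY,
                            name       TEXT NOT NULL,
                            lat        REAL NOT NULL,
                            lon        REAL NOT NULL)""")
        conn.execute("""CREATE TABLE reading (
                            station_id INTEGER NOT NULL REFERENCES station(station_id),
                            year       INTEGER NOT NULL,
                            month      INTEGER NOT NULL CHECK (month BETWEEN 1 AND 12),
                            temp_c     REAL,
                            PRIMARY KEY (station_id, year, month))""")

        conn.execute("INSERT INTO station VALUES (1, 'Oxford', 51.76, -1.26)")
        conn.execute("INSERT INTO reading VALUES (1, 1995, 7, 17.9)")  # accepted

        try:
            # A reading for a station that does not exist is rejected outright.
            conn.execute("INSERT INTO reading VALUES (999, 1995, 7, 17.9)")
        except sqlite3.IntegrityError as err:
            print("orphan reading rejected:", err)

    With the pragma left at its default (off), the orphaned insert would succeed silently, and that is exactly how related tables drift out of sync over the years.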

  2. Back to the real world:

    http://rawstory.com/...oto-scientists/

    A society based on waste, greed and consumption - deserves a demise.

    All very Old Testament but the harsh truth is that the Universe does not care whether we are virtuous or not.

    It is true that Anthropogenic Global Warming due to man's plundering of global resources and reckless consumption of fossil fuels could raise sea levels and cause catastrophic droughts, floods etc. which wipe out our civilisation. It is equally true that Global Warming due to non-man-made factors would do the job just as well. The history of this planet is littered with extinct species killed off by environmental changes over which they had no control and which they played no part in creating. The human race could be doomed no matter what it does.

  3. Good post above by Iceberg about how some scientists may want to hide aspects of their work from the public because of the likelihood of some of the sceptics taking it out of context to grind axes. There are certainly issues with that, but I think it's a better reason for it than the fight to protect intellectual property to the max, and illustrates that some of the sceptics are merely out to undermine the climate scientists. Which is a shame, because many other sceptics are sceptical for perfectly good reasons and are happy to play an active role in furthering the debate.

    Surely that raises the whole debate about whether intellectual property rights and pure science are really compatible.

    If the research is robust it should stand on its own merits under open peer review, so that any weaknesses or faults can be spotted and weeded out. It is the same argument as the one behind open source software development: the more people who can carry out the verification, the less risk that vested interests of whatever ilk can hijack the debate.

  4. I have been following this story on a number of weather and news boards.

    From what I can determine, the most damaging item in the entire collection of hacked (or leaked?) files is the HARRY_READ_ME.txt document. It is far more revealing about the state of research at the CRU than the emails.

    This is a 194 KB text file describing in detail (including code extracts and lists of data readings) the trials and tribulations of the eponymous "Harry" trying to produce a new master database of readings from various weather stations. Despite the fact that the old master database appears to have been full of corrupt data, it was still used as the basis for validating and incorporating new readings. As someone who works in IT, this document reads like the real deal to me (i.e. if it is a forgery or has been amended, then someone has gone to great lengths to get the feel right). In some cases it looks as though the CRU has lost its original source material and had to reconstruct it. That may explain why they have been reluctant to provide their records for open peer review, since doing so would expose many of the faults and inconsistencies. If this data was used as input to predictive models I would regard any output as highly dubious.

    Normally this sort of thing would just be a small storm in academia. Unfortunately, this is simply too important an issue - with implications for housing, food, energy, economic and social policy affecting billions of people - to allow this type of ropey data vetting to pass unchallenged (a rough sketch of the sort of minimal checks I mean is below). As so often happens in the world of learning, I suspect that questions of jobs, careers, research grants, the desire for acceptance by one's colleagues and the like are playing a significant and not always benign role in how studies are carried out.
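
    As a rough illustration, here is a deliberately simple, hypothetical vetting pass in Python - the file layout, field names and plausibility limits are all invented for the example and are not taken from the leaked material:

        import csv

        PLAUSIBLE_C = (-90.0, 60.0)  # crude global bounds for a monthly mean, in deg C

        def vet_rows(path):
            """Split incoming rows into accepted readings and a reject list with reasons."""
            good, bad = [], []
            with open(path, newline="") as fh:
                for row in csv.DictReader(fh):  # expects station_id,year,month,temp_c columns
                    try:
                        year, month = int(row["year"]), int(row["month"])
                        temp = float(row["temp_c"])
                    except (KeyError, TypeError, ValueError):
                        bad.append((row, "unparseable field"))
                        continue
                    if not 1 <= month <= 12:
                        bad.append((row, "month out of range"))
                    elif not PLAUSIBLE_C[0] <= temp <= PLAUSIBLE_C[1]:
                        bad.append((row, "implausible temperature"))
                    else:
                        good.append((row["station_id"], year, month, temp))
            return good, bad

    Nothing clever is going on there, but it produces a reject list with reasons - an audit trail - instead of silently folding questionable readings into the master set.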
