As more and more data exists in digital form only, UK government and businesses are waking up to the inevitable need for data preservation.
Saving data sets is easier said than done, so the newly founded Digital Preservation Coalition last week launched an online version of its handbook entitled Preservation Management of Digital Materials.
The problem with electronic data, as opposed to paper data, is that you need technology to read it. Such technology changes quickly, however, so unless efforts are made to preserve not only the relevant data but also the technology to access and understand it, records or data sets can be lost forever.
When the 1901 census was recently put online, it proved that old data draws a huge amount of interest. So much, in fact, that the university computer crashed under the number of hits within days of the census going online.
The 1901 census had been preserved in paper format, which made the data easy to understand and convert into electronic format.
During the 1970s, however, censuses were preserved in electronic format only and were unintelligible without a vital deciphering code that turns rows of fragmented figures into readable content.
Back then, neither the code nor the technology needed to read the data was preserved, and the census data was in danger of being lost before it could reach its 100-year publication date.
Information rescue
The National Digital Archive of Datasets (NDAD) recently acted on behalf of the Public Record Office to rescue 1970s census information, digging through archives for the translation code and preserving it together with the data and its technology context.
Kevin Ashley, head of NDAD at the University of London Computer Centre, said: "Some estimates suggest that 95 per cent of British and American government records are produced digitally. They provide a vast resource of data and statistical information.
"In many countries large quantities of vital information have already been lost forever because archive services did not exist.
"Due to the rapid obsolescence of software and hardware, as well as the limited life span of digital media, data sets cannot be left to sit on shelves before being processed."
The government has come to terms with the necessity of digital data preservation and launched the Digital Preservation Coalition in February.
This independent body gives advice on digital preservation and includes members such as the British Library, the Public Record Office, University Research Libraries and the University of London Computer Centre.
Helen Shenton, head of collection care at the British Library, stressed that digital preservation is not just a concern for public bodies, but affects enterprise data in the same way. She argued that companies should manage their digital information before it becomes obsolete.
Indefinite access
"The toxic waste industry must keep records of where waste is located," she explained. "Nuclear waste has a life span of half a million years, and its records need to be accessible indefinitely. Oil companies need to preserve records of exploitation fields. Sites that are inaccessible today may be of use in 20 years."
Marc Fresko, director of consulting services at Cornwell Affiliates, stressed that anyone in any doubt about the need for digital preservation should read the Digital Preservation Coalition handbook.
He explained that it educates readers on accessing and managing preserved data and provides a usable source of definitive references. However, Fresko was not impressed with the practical side of the book.
"If you want a step-by-step instruction manual, you won't find it here," he argued. "But then, you won't find it in ready-to-use form anywhere. Such is the early state of our understanding of digital preservation."
