How to Future-proof your data archive

by Dennis O'Reilly
31 Dec 2010 at 04:46hrs
It's easier than ever to make sure copies of your most important records, documents, photos, videos, and other personal data will be readable/viewable/playable long after the hardware and software used to create the files have bitten the dust.

The four keys to safe data archiving are to choose file formats that won't become obsolete, use storage media that won't deteriorate or become inaccessible, make multiple copies stored apart, and check your archived data regularly to ensure it's still readable.

Don't get stuck with outdated data formats
Most of the files you want to archive are likely in proprietary formats, such as Microsoft Office's .doc, .xls, and .ppt for Word, Excel, and PowerPoint, respectively. Despite the ubiquity of software and services that let you read and edit Office files without the Office app used to create them, these formats will become obsolete one day--perhaps sooner than you may think.

Even if you archive files in their original proprietary formats, it's a good idea to save another version of the files converted to an open-standard format. That's the approach taken by the open-source Archivematica data-archive service, which maintains the original file format but also converts the files to appropriate "preservation" and "access" formats.

For example, Archivematica's media-type preservation plans convert .doc, .rtf, and .wpd word processing files to the XML-based Open Document Format (ODF) for preservation and to Adobe's PDF for viewing. Likewise, the system saves .bmp, .jpg, .jp2, .png, .gif, .psd, .tga, and .tiff raster image files as uncompressed TIFFs for preservation and as JPEGs for viewing.
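The mapping described above can be written down as a small lookup table. What follows is a minimal Python sketch under stated assumptions, not Archivematica's actual code: the `PLANS` table and the `plan_for` helper are hypothetical names, and the table covers only the file extensions named in this article.

```python
from pathlib import Path

# Hypothetical lookup table built only from the examples in this article.
# Each extension maps to (preservation format, access format).
WORD_PROCESSING = (".doc", ".rtf", ".wpd")
RASTER_IMAGES = (".bmp", ".jpg", ".jp2", ".png", ".gif", ".psd", ".tga", ".tiff")

PLANS = {}
for ext in WORD_PROCESSING:
    PLANS[ext] = ("ODF", "PDF")
for ext in RASTER_IMAGES:
    PLANS[ext] = ("uncompressed TIFF", "JPEG")


def plan_for(filename):
    """Return the formats to keep for a file, or None if no plan is known."""
    ext = Path(filename).suffix.lower()
    if ext not in PLANS:
        return None
    preservation, access = PLANS[ext]
    return {"original": ext, "preservation": preservation, "access": access}
```

Calling `plan_for("letter.DOC")` reports that the file should be kept as the original .doc, as ODF for preservation, and as PDF for access. A file type the table doesn't know about returns `None`, which is a cue to decide by hand how to preserve it.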

Figure: The Archivematica media type preservation plans save archived files in their original format as well as in preservation and access formats.

The Archivematica system is still very much under development, but you can download the alpha version of the free archiving software for use on a virtual appliance, Live USB key, or Live DVD.

To recap, the best way to ensure your archived files will be readable while maintaining their original formatting and other attributes is to save them in their native format and in at least one other generic, open format. This lets you open, view, and edit the files in the program used to create them if it's available, and access the data in a more basic form if the proprietary software isn't available.

Find a storage medium with legs
If you're wondering how long the data on your CDs and DVDs will last, you're not alone. Even the experts can't agree on the expected longevity of optical media--and the same is true for magnetic tapes and disks. (The X Lab offers a detailed discussion of optical media longevity, including a brief description of the ISO standards for testing optical media.)

The general consensus is that CD-Rs should last 30 to 50 years, DVD-Rs less than that, and CD-RWs and DVD-RWs even less. Similarly, tapes and hard disks can be expected to be readable for 10 to 30 years, while portable disks, USB thumb drives, and other solid-state storage devices may survive only half that long.

But these are just numbers. Who wants to trust their important data to probabilities? The fact is that any storage medium can fail at any time. That's why you should archive data on more than one medium and check your archives regularly for failures (more on these points below).

Two of my favorite archival-storage options are the oldest and the newest: paper and online, respectively. While printing your archives isn't environmentally friendly, it's tough to beat the expected lifespan of properly stored paper records. Of course, finding specific files in a paper archive can be a challenge, and paper records aren't easy to convert.

If searchability and easy accessibility are important, online data archives are a good choice. Services such as SpiderOak and Microsoft's SkyDrive make it easy to store copies of your important files in the cloud where they can be retrieved from any browser. (I described SpiderOak and two other free encrypted online storage services in a post from June 2009.)

More than one archive, more than one place
Storing your data archive online violates two rules of safe storage: you don't have physical access to the hardware the files are stored on, and you're dependent on the financial health of the service you're using. If the service goes under, there's no guarantee you'll be able to retrieve your data, and you have to trust in the service's ability to maintain and back up its storage servers.

The key is to avoid putting all your archival eggs in one basket. Use a combination of media and file types when archiving important data to increase the chances that the information will be accessible well into the future. And as new archival media are developed and proven to be practical, convert your archive to one of those forms.

One other very important archival-storage rule: always store at least one copy of your archived data somewhere other than your home or office. This is one area where online storage comes in handy.
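For the copies you keep on local media, the "more than one basket" rule can be partly automated. Here is a rough Python sketch that copies a file to several folders and confirms each copy matches the original. The function names `sha256_of` and `copy_to_all` are my own, and the sketch assumes each destination (a USB drive, a second disk, a folder synced to an online service) is reachable as an ordinary folder path.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_to_all(source, destinations):
    """Copy one file into each destination folder and verify every copy.

    Returns a dict mapping each destination folder to True (the copy
    matches the source) or False (the copy failed or differs).
    """
    source = Path(source)
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder = Path(folder)
        try:
            folder.mkdir(parents=True, exist_ok=True)
            target = folder / source.name
            shutil.copy2(source, target)  # copy2 also preserves timestamps
            results[str(folder)] = sha256_of(target) == expected
        except OSError:
            # An unreachable or full destination counts as a failed copy.
            results[str(folder)] = False
    return results
```

Any destination that comes back `False` needs attention before you can count it as one of your copies.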

Schedule regular archive checkups
For the last couple of years I've been spending a good chunk of my spare time using the free Audacity audio software to convert several thousand songs on several hundred audio cassettes to MP3s. Some of the tapes were made as far back as the mid-1970s, but most date from the 1980s and early 1990s.

Most of the store-bought tapes in my collection are now unplayable, but I've had much better luck with the home-made recordings. In fact, one tape I made in 1976 spent a good chunk of its early life stored in my old VW bug--through freezing temperatures and blistering heat--but it sounded brand new when I converted its songs to digital (thank you, TDK!).

When it comes to your important digital files, you can't trust to luck. Get into the habit of opening a handful of files in your various archives on a regular basis. If they aren't accessible, dig out one of your other backups and make another archival copy using known-good versions of the files. After all, anything worth saving is worth saving well.
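Opening files by hand can be supplemented with an automated spot check. The Python sketch below is one possible approach, not a standard tool: it records a SHA-256 checksum for every file when the archive is created, then later re-hashes a random handful and reports any that are missing, unreadable, or changed. The names `write_manifest` and `spot_check` and the `manifest.json` file are assumptions of mine. A matching checksum shows the bytes are intact, but not that you still have software able to open the format, so it complements rather than replaces actually opening a few files.

```python
import hashlib
import json
import random
from pathlib import Path


def file_hash(path):
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(archive_dir, manifest_name="manifest.json"):
    """Record a checksum for every file in the archive folder."""
    archive_dir = Path(archive_dir)
    manifest_path = archive_dir / manifest_name
    manifest = {}
    for path in sorted(archive_dir.rglob("*")):
        if path.is_file() and path != manifest_path:
            manifest[path.relative_to(archive_dir).as_posix()] = file_hash(path)
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest


def spot_check(archive_dir, sample_size=5, manifest_name="manifest.json"):
    """Re-hash a random handful of files and return the names that fail."""
    archive_dir = Path(archive_dir)
    manifest = json.loads((archive_dir / manifest_name).read_text())
    names = random.sample(sorted(manifest), min(sample_size, len(manifest)))
    failed = []
    for name in names:
        try:
            ok = file_hash(archive_dir / name) == manifest[name]
        except OSError:
            ok = False  # a missing or unreadable file counts as a failure
        if not ok:
            failed.append(name)
    return failed
```

Run `write_manifest` once when you create the archive, then run `spot_check` on a schedule. Any name it returns is a file to restore from one of your other copies.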

Source - news.cnet.com