Take Control of Mac OS X Backups: Part One

Editor’s Note: The following article is an excerpt from Take Control of Mac OS X Backups (version 1.1), a $10 electronic book available for download from TidBITS Electronic Publishing. In the first part of this two-part excerpt on backup strategies, author Joe Kissell looks at the different approaches you can take, examines what duplicates and archives are, and explains why an ideal backup plan includes both.

I know a number of people who have made decisions about how to back up their computers based on what hardware or software they already own. Others buy a product that’s received good reviews and then figure out how to use it for effective backups. I believe these approaches are backward. If your data and your time are truly important, it makes sense to think about your needs first, then develop a strategy based on those needs, and finally choose hardware and software that fit your strategy.

After the first edition of this book was published, several readers commented that the strategy I suggest here, while perfectly reasonable, may be inappropriate for “low-end” users because it presumes a significant expenditure of money and effort. Less-advanced users, the argument went, just want a backup system that’s inexpensive, easy to use, and effective. Don’t we all! Unfortunately, there is no such thing. You know the old saying: “Cheap; good; fast—pick any two.” The same goes for backups. I can tell you how to do them effectively or how to do them quickly and cheaply, but the less time and money you’re willing to spend, the less safe your data will be.

With that in mind, I want to begin this strategy section with a quick, high-level overview of several approaches you might choose to take, depending on your tolerance for cost, effort, and risk.

Backup Approaches

Major Objective: Saving Money
Suggested Approach:
• Hardware: Your Mac’s built-in SuperDrive.
• Software: Impression ($25).
• Strategy: Scheduled weekly duplicates and daily archives stored on DVD-RW or DVD+RW.
Risks and Trade-Offs:
• You will not have a bootable duplicate, making it more difficult to recover after a hard drive failure.
• You must be present when backups occur to swap media.
• Restoring files from an archive will be time-consuming.

Major Objective: Ease of Use (Approach A)
Suggested Approach:
• Hardware: A single Maxtor OneTouch FireWire drive.
• Software: Retrospect Express.
• Strategy: Just press the button for instant (duplicate) backups whenever you wish.
Risks and Trade-Offs:
• No archives to protect you against file changes and deletions, unless you set up such a script manually.
• Without redundant, off-site media, you risk data loss due to theft, fire, etc.
• You must remember to press the button.

Major Objective: Ease of Use (Approach B)
Suggested Approach:
• Use an Internet backup service such as BackJack, which provides its own software and requires no hardware.
Risks and Trade-Offs:
• No bootable duplicates.
• Extremely expensive if you archive all your files; significant risk of data loss if you do not.
• Your data is unavailable if you lose Internet connectivity.

Major Objective: Data Safety
Suggested Approach:
• Hardware: Three external FireWire drives.
• Software: Retrospect Desktop.
• Strategy: Scheduled weekly duplicates and daily archives, alternating among drives; one drive always stored off-site.
• Optional: Archive mission-critical and active files frequently to your iDisk or an Internet backup service.
Risks and Trade-Offs:
• Significant hardware and software costs.
• Learning curve to set up and use Retrospect software.
• Inconvenience of moving drives around each week.

While the approaches I outline are just a few examples of the many paths one could take to performing backups, I personally feel the importance of protecting your data trumps all other concerns. Therefore, of the approaches in the table above, I recommend Data Safety, because I believe it is the best approach for the majority of readers. If your data (your e-mail, one-of-a-kind digital photographs, important documents, and so on) is not worth some time and money to you, then you probably don’t need backups. Keep in mind that you get out of a backup system what you put into it.

Do You Need Duplicates?

Let’s begin by assuming you have original (CD-ROM or DVD-ROM) copies of your operating system and all installed software. Now consider this question:

If your hard drive suffered a complete failure, how much time could you afford to spend restoring it to working order?

If you use your computer to run a business, do your homework, or trade stocks, for example, your answer may be “a few minutes at the most.” If no critical projects depend on a functional computer, you may be able to afford several days to restore it after a failure. Most of us are somewhere in between.

Even in the best case, it will take you several hours, and possibly a day or more, to reinstall a typical set of software onto a new or reformatted disk. And if you do not have original copies of all your software, if you have a large number of third-party applications, or if you’ve customized your computer extensively, returning your computer to operation could take much longer.

The more you need to avoid that potential loss of time, the more you need to maintain duplicates (for more info, see “The Duplicate” section ahead).

Do You Need Archives?

Regardless of your need for duplicates, consider your answer to this much different question:

If your computer were stolen, how difficult would it be for you to live without the data on it?

Do you have years of bank records, e-mail, poetry, academic papers, photos, movies, and so on stored on your computer? If so, chances are your answer is “extremely difficult.” On the other hand, if you use your computer only for casual Web surfing, playing games, and listening to music, living without the data on your computer may be nothing more than a minor inconvenience.

Many people, when asked what one item they would try to save if their house were burning down, would answer “my photo album,” because furniture can be replaced, but memories cannot. The same thing is true of the memories stored on your hard disk in the form of messages, graphics, and other documents you’ve created—not to mention all the pictures you’ve taken with your digital camera. Although hardware and software can be replaced, data cannot. And keeping your photo album on the computer only makes it that much more important to back up your data safely.

Whereas a duplicate contains a single copy of your data, an archive contains many different versions of it, making it much more likely that you’ll be able to retrieve the information you need in the event of a problem.

The greater the amount of personal data on your computer, and its importance to you, the greater your need to maintain archives (for more info, see “The Archive” section ahead).

Though there may be some exceptions, the ideal backup strategy for most people consists of both duplicates and archives.

The Duplicate

Whether you call it a clone, a bootable backup, or a carbon copy, a duplicate is a complete, exact copy of your entire hard disk that (if it’s stored on, or restored onto, a hard disk) you can use to start up your computer if necessary. Duplicates are wonderful because they enable you to get back up and running extremely quickly—in some cases, with only minutes of down time.

Consider this typical scenario: you’ve duplicated your Mac’s internal hard disk onto an external FireWire drive. One day you wake up and find that your computer won’t start at all; the screen displays a blinking question mark indicating that it can’t find a valid system. You suspect a catastrophic hard disk crash. No problem: you quickly hook up your backup drive and boot from that. Your computer will behave exactly as if it were running from the internal disk, with the exception that files added or changed since you performed the backup will be missing or out of date. You can then attempt to repair the internal disk—or if it’s completely dead, simply replace it.

You might think it would take a while to make a copy of your entire hard disk, and you’d be right. But most software capable of making a bootable duplicate can also duplicate incrementally, meaning that after the first time, updating your duplicate to reflect the current state of your hard disk requires copying only the files that are new or different.
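
To make incremental duplication concrete, here is a minimal Python sketch. It assumes a file counts as changed when its size or modification time differs from the copy already on the target; the function names are hypothetical. Real duplicating tools also remove files that were deleted from the source, preserve permissions and other metadata, and make the target bootable, none of which this sketch attempts.

```python
import os
import shutil


def needs_copy(src: str, dst: str) -> bool:
    """True if dst is missing or differs from src in size or modification time."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def incremental_duplicate(source: str, target: str) -> list[str]:
    """Bring target up to date with source, copying only new or changed files.

    Returns the paths (relative to source) copied on this run.
    """
    copied = []
    for root, _dirs, files in os.walk(source):
        rel_root = os.path.relpath(root, source)
        dest_root = os.path.normpath(os.path.join(target, rel_root))
        os.makedirs(dest_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(dest_root, name)
            if needs_copy(src, dst):
                shutil.copy2(src, dst)  # copy2 also carries over the modification time
                copied.append(os.path.normpath(os.path.join(rel_root, name)))
    return copied
```

Run it twice in a row and the second run copies nothing, which is why updating a duplicate is so much faster than creating it in the first place.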

Because duplicates are so powerful and useful, I recommend that you make them part of your backup strategy.

However…

Due to the proliferation and seeming simplicity of synchronization utilities, many people use duplicates as their only backup (see the sidebar, “Synchronization Utilities”). This is a bad idea. Here’s why:

• Duplicates provide no insurance against damaged or accidentally deleted files. If your hard disk is missing files, or contains damaged files, when you perform the duplication, those problems will appear in the duplicate as well.

• Duplicates contain only the most recent version of each file. If you suddenly realize you accidentally deleted half of your dissertation or erased your contact database before your most recent duplicate, there’s no way to go back and retrieve an earlier saved version.

• Duplicates quickly go out of date. Duplicating an entire hard disk can take hours, and files are likely to change even while your backup is in progress. So if your only backup is a duplicate, you may increase the risk that your backed-up files will not be current.

For these reasons, although I heartily urge you to duplicate your hard disk on a regular basis, that is only part of a solid backup strategy. You should supplement the duplicates with archives (as I describe in the next section).

(Note: An extra hard drive is certainly the best way to make a duplicate, but you can also duplicate a volume onto a disk image, which can be stored on removable media such as CD-R or DVD-R and then restored onto a hard drive when needed. By the way, it is possible, though not easy, to make a bootable Mac OS X CD or DVD. Because this process goes far beyond normal backups, I do not cover it here.)

The Archive

Sometimes referred to simply as a backup, an archive contains copies of your files as they appeared at multiple points in time. If you want to see the version of a file that existed on your computer two weeks ago, an archive can deliver that—along with today’s version and the version that existed a month ago.

An archive starts with a complete copy of all the files in one or more folders. The next time the backup runs, your backup software could make another complete copy, but because most of the files probably have not changed in the meantime, that would use up a great deal of space—not to mention taking a long time. So backup programs typically perform an incremental archive. This means that on subsequent runs, the software scans the files in the folders you’ve designated and copies only those files that are new (or newly modified) since the last backup. To be truly useful, archives should also be additive, meaning the backup program adds the new or changed files to the archive without overwriting the files already there. That way, you can retrieve many different versions of a given file, and if you delete it on your hard disk, you can still find it in your archive. Thus, what I refer to as an archive is technically an incremental additive archive.
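
The following Python sketch illustrates an incremental additive archive under simple assumptions: each run writes new or modified files into a fresh numbered session folder, and a hypothetical catalog.json file records the size and modification time of the last version archived for each path. Nothing already in the archive is overwritten or deleted, so every saved version of a file remains retrievable. The folder layout and function names are invented for illustration and do not reflect how any particular backup program stores its data.

```python
import json
import os
import shutil


def run_archive(source: str, archive: str) -> list[str]:
    """Add new or changed files from source to archive, never overwriting.

    Each run that finds changes writes them into a fresh session folder
    (session-0001, session-0002, ...). A JSON catalog remembers the size and
    modification time of the last archived version of every path.
    Returns the relative paths added on this run.
    """
    os.makedirs(archive, exist_ok=True)
    catalog_path = os.path.join(archive, "catalog.json")
    catalog = {}
    if os.path.exists(catalog_path):
        with open(catalog_path) as f:
            catalog = json.load(f)

    existing = [d for d in os.listdir(archive) if d.startswith("session-")]
    session_dir = os.path.join(archive, f"session-{len(existing) + 1:04d}")

    added = []
    for root, _dirs, files in os.walk(source):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source)
            st = os.stat(src)
            signature = [st.st_size, st.st_mtime_ns]
            if catalog.get(rel) != signature:  # new, or modified since last archived
                dst = os.path.join(session_dir, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)
                catalog[rel] = signature
                added.append(rel)

    with open(catalog_path, "w") as f:
        json.dump(catalog, f)
    return added


def versions(archive: str, rel: str) -> list[str]:
    """Return the archived copies of rel, oldest session first."""
    sessions = sorted(d for d in os.listdir(archive) if d.startswith("session-"))
    candidates = [os.path.join(archive, s, rel) for s in sessions]
    return [c for c in candidates if os.path.exists(c)]
```

Because deleting a file from the source never touches the session folders, versions() can still find every archived copy after the original is gone.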

(Note: Some backup programs use the term “archive” to describe files that have been copied to removable media of some kind for long-term storage and then deleted from the source volume.)

Archives sometimes make use of a snapshot: a list of all the files in the designated folders at the time a backup runs. Even though a certain file may not be copied (because it hasn’t changed since the last backup), it will appear in the snapshot list. As a result, you can easily see what the entire contents of a folder looked like at various arbitrary points in the past, and restore it to any previous state in a single operation.
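
Snapshots can be sketched with a toy in-memory model in Python. Here the hypothetical SnapshotArchive class stores a file’s contents only when they are new or changed, but records every file present in each run’s snapshot, pointing unchanged files at the run that already holds their data. For brevity it detects changes by comparing contents directly; real software may instead compare modification dates or checksums.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotArchive:
    """Toy in-memory archive: stored versions plus one snapshot per backup run."""
    store: dict = field(default_factory=dict)      # (path, run) -> file contents
    snapshots: list = field(default_factory=list)  # run -> {path: run holding its data}

    def backup(self, files: dict) -> None:
        """Archive one run; files maps path -> current contents on the source disk."""
        run = len(self.snapshots)
        previous = self.snapshots[-1] if self.snapshots else {}
        snapshot = {}
        for path, data in files.items():
            old_run = previous.get(path)
            if old_run is not None and self.store[(path, old_run)] == data:
                snapshot[path] = old_run        # unchanged: reference the existing copy
            else:
                self.store[(path, run)] = data  # new or changed: copy it into the archive
                snapshot[path] = run
        self.snapshots.append(snapshot)         # every file present is listed

    def restore(self, run: int) -> dict:
        """Rebuild the folder exactly as it looked at the given run."""
        return {path: self.store[(path, src)] for path, src in self.snapshots[run].items()}
```

In this model, restore(n) rebuilds the whole folder as it stood at run n in one step, including files that were deleted in later runs, while unchanged files are stored only once.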

After the initial full backup, archives usually take comparatively little time to run, making it easy to back up your data once (or even several times) each day. This ensures that your most recent backup is never more than a day old. Because they also offer tremendous insurance against accidental deletion (or change) and file damage, archives are an essential part of a good backup strategy. But archives alone are not an adequate solution. I say this for two main reasons:

• Because of the way archives are stored, they do not represent a complete, intact version of your entire hard disk. Ordinarily, an archive is not bootable (at least, not until after you’ve restored it to a fresh disk). If your main hard drive is completely dead, you won’t be able to do any work at all until you’ve replaced it.

• It often makes sense for an archive to include only data files—not your operating system or applications. But reinstalling Mac OS X and applications from their original CDs or DVDs is a lengthy and cumbersome process that you could avoid (or speed up dramatically) with a duplicate of your hard disk.

Archives protect you against inadvertent changes over time, but only a duplicate can get you up and running again quickly after a major problem. In other words, the best backup strategy includes both duplicates and archives.

That said, you can set up both duplicates and archives in many different ways, depending on the hardware and software you have, the types and sizes of files you typically work with, and other variables. In the second part of this article, I’ll make some general suggestions on strategies you can take.

[Joe Kissell is the author of several books about Macintosh software, including Take Control of Spam with Apple Mail (TidBITS Electronic Publishing, 2004), and is the curator of Interesting Thing of the Day.]

Sidebar: Synchronization Utilities

Lots of utilities—including several that bill themselves as backup tools—perform a function called synchronization. As the name implies, synchronization means maintaining identical copies of a file, folder, or even an entire disk in two or more locations. Some synchronization utilities can run on a schedule, automatically “backing up” files from a location you specify to another volume. And some can create a bootable duplicate by synchronizing an entire disk to another disk.

There’s nothing wrong with synchronization—in fact, it can be incredibly useful in certain circumstances, such as keeping your PowerBook’s hard disk updated with documents you use frequently on your desktop Mac. As a quick and easy way of making an extra copy of certain files, it can serve as a type of primitive backup.

If you want to use a synchronization utility to make duplicates as part of your backup strategy, that is perfectly valid too. However, please do not mistake synchronization for a true backup—no matter what the utility’s advertising says.

What’s true of duplicates is equally true of individually synchronized files and folders: you get only the most recently modified version. You lack the ability to recover an older version of the file, which is a crucial part of a solid backup program. Also, if you don’t notice that a file is damaged before synchronizing it to another volume, you may end up with two useless copies. If you synchronize deletions, you lose your insurance against accidentally trashing files. And it’s all too easy to accidentally copy data in the wrong direction!
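
The hazards of one-way synchronization are easy to demonstrate with a toy model in which a disk is just a Python dictionary mapping file paths to contents. The mirror_sync function below is hypothetical and deliberately simple: it copies whatever is on the source, damaged or not, and optionally propagates deletions, roughly as a sync utility configured to mirror a disk might.

```python
def mirror_sync(source: dict, mirror: dict, propagate_deletions: bool = True) -> None:
    """One-way sync: make mirror (path -> contents) match source."""
    for path, data in source.items():
        mirror[path] = data  # damaged contents are copied just like good ones
    if propagate_deletions:
        for path in list(mirror):
            if path not in source:
                del mirror[path]  # a file trashed on the source vanishes from the "backup"
```

After one sync that follows an accidental deletion and a corrupted save, the mirror holds only the damaged file, and no earlier version exists anywhere to fall back on.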

All that to say: a single copy of a single version of your data does not a backup make. By all means, synchronize if you wish, but not as a substitute for proper archives and complete, bootable duplicates.

Sidebar: Can a RAID Substitute for Duplicates?

RAID stands for Redundant Array of Independent (or Inexpensive) Disks; it’s a way of combining multiple physical hard drives into a single logical volume using either software or a special hardware controller. One way to configure a RAID, known as mirroring, is to have the same data written simultaneously to two or more drives. If any one drive fails, another can take over instantly and seamlessly with no loss of data and no down time; you can then replace the faulty drive at your leisure.

One developer of RAID software for Mac OS X uses the slogan “Better than Backup!” The logic is that the RAID gives you a bootable duplicate that’s always 100 percent up-to-date, without ever requiring you to run a backup program or worry about complicated restoration procedures in the event of a failure.

I have nothing against RAIDs, and if you need to keep a mission-critical computer running without any hiccups at all, a mirrored RAID might be just what you need. However, I strongly believe that a RAID is no substitute for multiple duplicates as described in this article. A mirrored RAID’s best feature is also its Achilles’ heel: because changes are reflected on all drives simultaneously, an accidentally deleted file will be immediately deleted on your “backup” drives too! Stand-alone duplicates—especially if you maintain two or three of them—reduce this risk greatly.

RAIDs address the problem of spontaneous drive failures, but they provide no insurance against human error, theft, natural disaster, or any of the other catastrophes that make backups so important. So, use a RAID if you wish, but only as a supplement to duplicates and archives.

Sidebar: Incremental or Differential?

Some backup programs distinguish between incremental and differential archiving schemes. Although not all software uses the terms in exactly the same way, the difference is typically that in an incremental backup, only the files changed or added since the last time the backup ran are added to the archive. With a differential backup, all the files changed or added since the initial full backup are added to the archive. Thus, differential backups take longer to run than incremental backups.

This distinction is important when backing up to tapes or other removable media, because it affects the speed with which a backup can be restored. When restoring from an incremental backup, the software must copy the entire initial backup and then step through each of the incremental backups to retrieve all the updated files. This can require a great deal of media swapping. A differential backup, on the other hand, can be restored more quickly because the software must copy only the original backup and the most recent one. When backing up to a hard drive, however, this distinction is less significant, because the random-access nature of a hard drive enables it to restore either sort of backup with roughly equal speed.
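
The difference between the two schemes can be modeled in a few lines of Python. In this sketch, which makes simplifying assumptions not taken from any particular product, day 0 is the initial full backup and each later entry lists the files changed or added that day; the hypothetical functions report what each scheme copies on a given day and which backup sets a restore must read.

```python
def files_to_copy(changes_by_day: list[set[str]], day: int, scheme: str) -> set[str]:
    """Files a backup run copies on `day`.

    changes_by_day[0] holds every file (the initial full backup); each later
    entry holds the files changed or added on that day.
    """
    if day == 0:
        return set(changes_by_day[0])
    if scheme == "incremental":
        return set(changes_by_day[day])  # changed since the previous run
    if scheme == "differential":
        return set().union(*changes_by_day[1:day + 1])  # changed since the full backup
    raise ValueError(f"unknown scheme: {scheme}")


def sets_needed_to_restore(day: int, scheme: str) -> list[int]:
    """Backup runs that must be read to restore the state as of `day`."""
    if day == 0:
        return [0]
    if scheme == "incremental":
        return list(range(day + 1))  # the full backup, then every increment in order
    if scheme == "differential":
        return [0, day]  # the full backup plus the latest differential
    raise ValueError(f"unknown scheme: {scheme}")
```

For example, if file a changes on days 1 and 3 and file b changes on day 2, the day-3 incremental copies only a, while the day-3 differential copies both a and b; restoring from incrementals then reads sets 0 through 3, but restoring from differentials reads only sets 0 and 3.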
