Road to Mac OS X Leopard: Time Machine
The Technology Behind Time Machine
How to actually do this is addressed by the lowest layer of technology in Time Machine. There is still a conundrum to address: we don't need copies of things that don't change; we only need to frequently save things that do. However, saving differential backups (that only capture new changes) means that we have to stitch together the results of lots of backups over a long period of time in order to restore an entire disk.
Managing changes over a long period of time often means we will end up with a mess of files spanning lots of disks, and there's no easy way to clean that mess up without starting over from scratch periodically and doing another new massive backup. Who wants to manage boring infrastructure details like that?
At the same time, there is also the problem of knowing what to back up. Typical backup systems laboriously scan the disk for changes, requiring the system to either wake up at night to do backup housecleaning, or interrupt the user while they work. As more users move to laptops, getting backups scheduled and performed frequently enough to be useful becomes an additional challenge.
Time Machine does a number of things to target these problems. First, it plugs directly into Mac OS X's FSEvents, the process that tracks file system events as they happen. This allows Time Machine to keep track of what needs to be backed up without having to do that work itself; the system already maintains those records, which are also shared among other applications (including Spotlight, which uses it to build instantaneous search results). When Time Machine begins a backup, it doesn't have to scrub the disk; it simply asks FSEvents for a list of what's changed and quickly hits just those files.
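To make the difference between scanning a disk and consulting an event log concrete, here is a minimal sketch in Python. It is a toy model only: the `ChangeJournal` class, its methods, and the file paths are hypothetical stand-ins invented for illustration, and they are not the real FSEvents API, which differs in its details.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeJournal:
    """Toy stand-in for a system-wide file event log (not the real FSEvents API)."""
    events: list = field(default_factory=list)  # (event_id, path) pairs, in order

    def record(self, path: str) -> int:
        """Log that `path` changed and return the new event's ID."""
        event_id = len(self.events) + 1
        self.events.append((event_id, path))
        return event_id

    def changed_since(self, last_seen_id: int) -> set:
        """Return the paths touched after `last_seen_id`, without scanning the disk."""
        return {path for event_id, path in self.events if event_id > last_seen_id}


journal = ChangeJournal()
journal.record("/Users/me/report.txt")
checkpoint = journal.record("/Users/me/photo.jpg")  # a backup finishes here
journal.record("/Users/me/report.txt")
journal.record("/Users/me/notes.txt")

# The next backup only needs the files touched after the checkpoint.
to_back_up = journal.changed_since(checkpoint)
print(sorted(to_back_up))
```

The backup remembers only the ID of the last event it processed, so the cost of finding work depends on how much changed rather than on how large the disk is.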
Time Machine does allow you to exclude specific items from being backed up, but it's not usually necessary to micromanage this list. That's because the system is already intelligently designed to avoid backing up temp files and other things that don't need to be backed up. Apple also gives third party developers outlines on how to avoid dumping unnecessary things into Time Machine's backups.
Time Machine makes setting up a backup target easy; users simply select a disk. The first time an external drive is plugged in, Time Machine offers to use it for backups. There's no complex configuration or management of backup media pools or any real device setup. It is possible to set up multiple disks for use with Time Machine, making it easy to create an offsite archive. However, it only backs up to one drive at a time.
Apple recommends against using Time Machine on a disk used for other purposes. You can do this, but files copied to the Time Machine disk are not backed up (because the system automatically excludes the Time Machine backup drive from backing itself up). Putting other files on the Time Machine drive also obviously eats into the space available for backups.
If you fill up your Time Machine drive, you can plug in a new drive and start over, leaving your old drive as an archive. You can also exclude folders of large content to avoid backing those items up, if you have lots of client data, photos, movies, or other items that are archived elsewhere and don't need to be included into Time Machine.
Once a backup disk has been selected, Time Machine sets up a full backup. It then schedules a backup every hour. There are really no scheduling options to configure. Every day, it drops the previous day's hourly backups. Every week, it drops the previous week's daily backups. That maintains a complete, extensive set of backups that balances the demand for backup frequency against disk space.
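The thinning schedule can be sketched as a small function. This is an illustration under stated assumptions rather than Apple's actual logic: the function name `snapshots_to_keep`, the exact cutoffs (24 hours and 7 days), and the choice to keep the earliest snapshot in each day or week are all hypothetical.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(snapshots, now):
    """Return the snapshots that survive thinning, oldest first.

    Assumed policy (illustrative only): keep every snapshot from the last
    24 hours, the earliest snapshot of each calendar day for the last 7 days,
    and the earliest snapshot of each ISO week before that.
    """
    keep = set()
    seen_days = set()
    seen_weeks = set()
    for snap in sorted(snapshots):
        age = now - snap
        if age <= timedelta(hours=24):
            keep.add(snap)                    # recent: keep every hourly backup
        elif age <= timedelta(days=7):
            if snap.date() not in seen_days:  # older: one per day
                seen_days.add(snap.date())
                keep.add(snap)
        else:
            week = tuple(snap.isocalendar()[:2])  # (ISO year, ISO week)
            if week not in seen_weeks:            # oldest: one per week
                seen_weeks.add(week)
                keep.add(snap)
    return sorted(keep)


now = datetime(2007, 10, 26, 12, 0)
hourly = [now - timedelta(hours=h) for h in range(24 * 30)]  # 30 days of hourly backups
kept = snapshots_to_keep(hourly, now)
print(len(hourly), "snapshots thinned to", len(kept))
```

Whatever the exact cutoffs, the effect is the same: recent history stays fine-grained, while older history is kept at a coarser resolution that costs far less disk space.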
Other backup systems commonly force the user to manage these details; Time Machine supplies the professional expertise so you don't have to think about it, and can't inadvertently set up schedules that make no sense. At the same time, it's easy to manually turn Time Machine off so that its regular backups don't interfere with game playing or other activities that demand an undistracted processor. Once you turn Time Machine back on, it simply jumps back to its schedule and resumes backing things up.
The Time Machine settings in System Preferences show the time scheduled for the next backup. When that time arrives, Time Machine displays a progress thermometer during the backup, which typically only takes a few seconds, unless you've generated a huge amount of new content in the last hour. Again, that's because Time Machine doesn't scan through your entire drive looking for changes, but rather only consults FSEvents for a listing of what has changed recently.
This backup frequency makes Time Machine immediately useful the first day you enable it. Delete a file or folder unintentionally, and you can nearly always immediately undo it, although it's still possible to create and destroy something within an hour and be left up the creek without a paddle. Time Machine's frequent backups are far more useful and practical than the "undelete" systems once popular among system utilities like Norton Utilities, which replaced the Trash with a system that tried to retain everything that was thrown away for later possible retrieval. Time Machine focuses on protecting the stuff you need and use, not on questioning the wisdom of everything you delete.
Backup systems typically make a full clone backup, and then copy only the differential or incremental changes. Differential backups capture everything that has changed since the last full backup, while incremental backups only copy what's changed since the last partial backup. Full backups obviously consume too much disk space to do every hour, but differential or incremental backups don't capture the whole picture in a single shot. Time Machine appears to do both: capture full backups every hour without taking up all the disk space this would require. How does it do this?
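The distinction between the three backup types is easy to see in a few lines of Python. The file names and timestamps below are made up for illustration, and `changed_files` is a hypothetical helper, not part of any real backup tool.

```python
def changed_files(mtimes, since):
    """Return the files whose modification time is later than `since`."""
    return sorted(name for name, mtime in mtimes.items() if mtime > since)


# Hypothetical "last modified" times, expressed as hours on a shared clock.
mtimes = {"a.txt": 3, "b.txt": 11, "c.txt": 13, "d.txt": 15}

last_full = 10      # the full backup ran at hour 10
last_partial = 14   # the most recent partial backup ran at hour 14

full = sorted(mtimes)                              # copies everything
differential = changed_files(mtimes, last_full)    # changed since the last full backup
incremental = changed_files(mtimes, last_partial)  # changed since the last partial backup

print(full)          # ['a.txt', 'b.txt', 'c.txt', 'd.txt']
print(differential)  # ['b.txt', 'c.txt', 'd.txt']
print(incremental)   # ['d.txt']
```

The incremental backup is the smallest, but on its own it says nothing about `b.txt` or `c.txt`; restoring the whole disk means combining it with every backup that came before.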
An intelligent backup system using differential backups would also have to parse all the various backups it has made in order to present the user with a composite view of the files that can be restored at any given time. The user might want the version of a file from two hours ago, or from two weeks ago. Accommodating this kind of flexibility typically requires managing a complex database of backup file transactions. If that metadata database is lost, restoring files from the backups becomes far more complex, and requires an arduous and lengthy rebuilding of the database.
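A rough sketch of what building that composite view involves: replay the full backup and then each partial backup in order, stopping at the moment the user wants to restore. The data layout here (a list of time and change pairs, with `None` marking a deletion) and the function name `restore_view` are assumptions made for the example, not how any particular backup product stores its catalog.

```python
def restore_view(backups, at_time):
    """Rebuild the file listing as of `at_time` by replaying partial backups.

    `backups` is a list of (time, changes) pairs, oldest first; the first entry
    is the full backup. In `changes`, a value of None marks a deleted file.
    """
    view = {}
    for time, changes in backups:
        if time > at_time:
            break                     # ignore backups made after the target time
        for path, content in changes.items():
            if content is None:
                view.pop(path, None)  # the file was deleted in this backup
            else:
                view[path] = content  # the file was added or modified
    return view


backups = [
    (0, {"report.txt": "draft 1", "todo.txt": "buy milk"}),  # full backup
    (1, {"report.txt": "draft 2"}),                          # partial backup
    (2, {"todo.txt": None, "notes.txt": "ideas"}),           # partial backup
]

print(restore_view(backups, at_time=1))
# {'report.txt': 'draft 2', 'todo.txt': 'buy milk'}
print(restore_view(backups, at_time=2))
# {'report.txt': 'draft 2', 'notes.txt': 'ideas'}
```

Every restore depends on the whole chain and on the bookkeeping that describes it, which is why losing that bookkeeping is so painful.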
To solve both problems, Time Machine does something new and different that actually required Apple to make changes to the underlying Mac file system, HFS+. The new change is referred to as multi-links, which are similar to the "hard links" familiar to Unix users and potentially available when using NTFS on Windows. Hard links differ from "soft links" (also known as symbolic links), which simply act as placeholders pointing to another file. The Mac OS has long used aliases as a way to create a soft link stand-in for another file or directory. Windows calls soft links "shortcuts."
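The difference between a hard link and a soft link can be demonstrated with Python's standard library. This sketch covers ordinary file hard links on a Unix-like system such as Mac OS X, not Time Machine's multi-links; the file names are arbitrary, and on Windows creating a symbolic link may require extra privileges.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.txt")
    hard = os.path.join(tmp, "hard.txt")
    soft = os.path.join(tmp, "soft.txt")

    with open(original, "w") as f:
        f.write("hello")

    os.link(original, hard)     # a second name for the same file data
    os.symlink(original, soft)  # a small placeholder that stores a path

    same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
    link_count = os.stat(original).st_nlink  # names that refer to this data

    os.remove(original)         # remove the original name

    with open(hard) as f:
        hard_content = f.read()                # the data is still reachable
    soft_is_broken = not os.path.exists(soft)  # the soft link now points at nothing

print(same_inode, link_count, hard_content, soft_is_broken)
```

A hard link is an equal peer of the original name, so the data survives until the last name is removed, while a soft link is only as good as the path it points to.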