Friday, October 12, 2007, 10:00 am
Road to Mac OS X Leopard: Time Machine
Hard and Soft Links
Soft links are easy to understand and simple for users to employ; make an alias of a file, and it does everything that its target would do, while saved in another location. Multi-link files are a more complex idea, because they don't really fit the overall desktop metaphor. A Mac multi-link is a second "hard link" record that points to data or a directory. It doesn't just point to another file like an alias does; it is the same instance of that file. Create a hard link to a file, make changes to it, and the "previous file" is changed as well, because they are the same file. Delete it, and the file doesn't go away; it remains until the last hard link is removed. This is confusing in a quantum physics sort of way because it doesn't line up with the convenient physical metaphors we commonly use to visualize files and folders.
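The "same instance" idea is easier to see from a shell or script than from the desktop. The following is a minimal sketch in Python, assuming a file system that supports hard links; the file names and the `demo_shared_instance` helper are made up for illustration. It writes through one name and reads the change back through the other.

```python
import os
import tempfile


def demo_shared_instance():
    """Create a second hard link and show both names refer to one file."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "report.txt")        # hypothetical name
        second_name = os.path.join(workdir, "report-link.txt")  # hypothetical name

        with open(original, "w") as f:
            f.write("first draft")

        # Add a second hard link. No data is copied.
        os.link(original, second_name)

        # Change the file through the new name...
        with open(second_name, "a") as f:
            f.write(" + edits")

        # ...and read the change back through the original name.
        with open(original) as f:
            content = f.read()

        # samefile() compares device and inode numbers, not file contents.
        same_file = os.path.samefile(original, second_name)
        return content, same_file
```

Neither name is "the original" once the link exists; both are equal directory entries for the same data.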
On file systems that support hard links, every regular file starts out with a single hard link: its entry in the enclosing folder. When you delete a file, you aren't really scrubbing the file off the disk, but rather only removing that hard link from its enclosing folder, banishing the data to the unlinked wilderness of the drive. "Undelete" utilities attempt to search for unlinked files and restore them, but they can only work if the file system hasn't overwritten those unlinked files, which it will happily do without any concern, because the disk space consumed by unlinked files is fair game for recycling.
File systems that support multiple hard links to the same file keep a count of the hard links pointing to each file. Each hard link to a data file acts like a parallel, shared instance of that file; if you delete a file with multiple hard links, it is not banished into the world of deletion. Instead, it remains in place until the very last hard link pointing to it is removed. In other words, if you create a hard link to a file and then delete the original, the new hard link and its file data remain unscathed. In contrast, if you make a soft link or alias of a file and then delete the original, the alias only points to the former location of the deleted file, which no longer exists. The lone remaining alias can't function, and the data of the file it pointed to is no longer available.
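The link count and the difference in deletion behavior can be observed directly. This is a rough sketch, not a description of how the Finder works: it uses a POSIX symbolic link as the soft link (Finder aliases behave somewhat differently and are not modeled here), and the file names and the `demo_delete_original` helper are invented for the example.

```python
import os
import tempfile


def demo_delete_original():
    """Delete the original name and see what survives."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "notes.txt")
        hard = os.path.join(workdir, "notes-hard.txt")
        soft = os.path.join(workdir, "notes-soft.txt")

        with open(original, "w") as f:
            f.write("keep me")

        os.link(original, hard)     # second hard link to the same data
        os.symlink(original, soft)  # soft link that only stores a path

        # Symbolic links do not add to the hard link count.
        count_before = os.stat(hard).st_nlink

        os.remove(original)  # removes one directory entry, not the data

        count_after = os.stat(hard).st_nlink
        with open(hard) as f:
            surviving = f.read()

        # exists() follows the symlink; lexists() only checks the link itself.
        soft_resolves = os.path.exists(soft)
        soft_entry_present = os.path.lexists(soft)
        return count_before, count_after, surviving, soft_resolves, soft_entry_present
```

Called as written, the function should report a count that drops from 2 to 1, data that is still readable through the hard link, and a soft link that is still present on disk but no longer resolves to anything.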
Time Machine's Multi-Links
Hard links therefore act somewhat like a ghost, and somewhat like a clone. Like an alias, new hard links can reference an existing file without taking up additional space on disk. However, deleting one is like cutting off the head of a mythical hydra; it doesn't rid the world of its body and new hard links could pop up in its place, making it difficult to delete the beast entirely unless you can hunt down every hard link head and chop them all off. This complex idea doesn't fit well into the user space, but is very useful for certain purposes. One of them relates to Time Machine.
Apple actually designed the multi-links in HFS+ primarily to support Time Machine. Unlike most other Unix and Linux systems, Mac OS X's multi-links support hard linking to both files and directories. POSIX permits implementations to support hard links to directories but does not require it, and few do, because multiple hard links to directories are dangerously powerful. If a child directory linked to its own parent, it would create a directory cycle that could cause unbridled looping and file system corruption. File system utilities are also typically unprepared to handle multi-linked files. In Time Machine, multi-links are used in a specific, controlled context to avoid these types of problems.
Time Machine stores its first full backup as regular files. You can mount the drive it uses and peruse it manually. It's simply all of your files stuffed in a time-stamped folder, stuffed in a folder named after your computer. Every hour, Time Machine makes what appears to be another full backup, but that new bunch of files doesn't take up twice the space. Instead, Time Machine dips into the abstract world of multi-links to create a parallel universe where most everything is a ghostly clone. Only the changed files are new; everything else is a secondary hard link to the data already backed up. The genius of multi-links is that an older backup's files can be blown away and their data still exists, because newer hard links to that data remain in place.
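The general technique can be approximated in a few lines. The sketch below is not Apple's implementation: it hard-links individual unchanged files only (Time Machine can also hard-link whole unchanged directories), it detects changes by comparing file contents, and `take_snapshot` with its parameters is a hypothetical helper. It also assumes the backup folder lives outside the source folder and on a volume that supports hard links.

```python
import filecmp
import os
import shutil


def take_snapshot(source, backup_root, name, previous=None):
    """Write a full-looking snapshot of `source` into backup_root/name.

    Files identical to their counterpart in the `previous` snapshot are
    hard-linked instead of copied, so they use no additional file data.
    """
    dest_root = os.path.join(backup_root, name)
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        dest_dir = os.path.normpath(os.path.join(dest_root, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for filename in filenames:
            src = os.path.join(dirpath, filename)
            dst = os.path.join(dest_dir, filename)
            old = None
            if previous is not None:
                old = os.path.normpath(
                    os.path.join(backup_root, previous, rel, filename))
            # shallow=False forces a byte-by-byte comparison.
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: add another hard link
            else:
                shutil.copy2(src, dst)  # new or changed: store a real copy
    return dest_root
```

Deleting the older snapshot folder afterward leaves the newer one complete, since each unchanged file still has at least one hard link in the newer folder.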
By using hard links, Time Machine can keep frequent backups without eating up much disk space at all. The other benefit is that the user can browse the files directly and see a full file system for each date and time encapsulated in a backup. Each folder appears to be a full backup of regular files, and in a sense it is. All of those complete backup files simply share the same space on disk, as if living in parallel universes while sharing the same body. Hard links are needlessly confusing and potentially dangerous to non-technical users, so Apple doesn't expose the inner workings of Time Machine and doesn't offer any way to create hard links in the Finder.
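One way to see how much space the parallel universes really share is to add up file sizes twice: once naively, and once counting each underlying file only once. This is a small sketch under stated assumptions; `apparent_and_actual_size` is a hypothetical helper, and it sums logical file sizes rather than allocated disk blocks, so it only approximates real disk usage.

```python
import os


def apparent_and_actual_size(root):
    """Return (apparent, actual) byte totals for the files under `root`.

    `apparent` counts every directory entry; `actual` counts each
    underlying file once, identified by its device and inode numbers.
    """
    apparent = 0
    actual = 0
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            info = os.lstat(os.path.join(dirpath, filename))
            apparent += info.st_size
            key = (info.st_dev, info.st_ino)
            if key not in seen:
                seen.add(key)
                actual += info.st_size
    return apparent, actual
```

Run against a folder of hard-linked backups, the first number reflects what each backup appears to contain in total, while the second is closer to what the backups occupy together.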
Snapshots and Windows' Shadow Copy
Time Machine has been frequently compared to Microsoft's Shadow Copy (or Volume Snapshot Service), because both systems involve file backup. In reality, they are not very similar at all. Microsoft uses the background Shadow Copy service to duplicate files on the same disk. Those shadow copies record a "snapshot" of the file at a given moment in time, and can be accessed by the user through Previous Versions (which shows up in the file properties viewer), or tapped into by an external network backup system. Backing up these "shadow copies" simply prevents the external backup system from running into problems trying to back up live files that may be locked by the user working on them.
The data backup features related to Shadow Copy are only useful if a Windows machine is running in an environment with a server backing it up. Shadow Copy is not in itself a backup system, although it can present a listing of duplicated files that were captured by the shadow copy service. Without a dedicated backup system, Previous Versions only shows local shadows of a file. It does not copy files to an external disk for safekeeping, and its shadow copies can't be browsed by the user in the file system by date or by query. Shadow Copy is certainly not an easy-to-use consumer backup solution (nor is it intended to be), which is what Time Machine expressly is.
In Windows Vista, Microsoft also tied Shadow Copy into System Restore, which allows users to roll back their entire PC software install to a previous point in time. This is not a backup system either; it's a system-wide undo. System Restore is oriented around undoing the damage caused by installing a software title, a Windows software update, an unsigned hardware driver, or some other event that needs to be rolled back. It doesn't go back and find something lost from the past; it reverts the clock to a previous checkpoint and throws away the future from that point forward. System Restore is not even loosely related to Time Machine in what it does, how it does it, or why it exists.
On page 4: The Pretty Layer of Time Machine and Back Up to the Future.