Free AppleCare with each new iMac 5K or Mac Pro purchase: Apple Price Guides updated Nov 22nd (exclusive coupons)
Breaking Tuesday, September 22, 2009, 02:00 pm PT (05:00 pm ET)

Inside Snow Leopard's UTI: Apple fixes the Creator Code

Data typing beyond files

Type metadata isn't just useful for documents. It's also used to identify data that isn't saved in a file, such as clipboard data. When you copy and paste, the Mac OS has to keep track of the type of data you copy. An operation might start with a copy selection from a Word file with special formatting, but you might want to paste that data somewhere that only recognizes RTF (like TextEdit) or plain text (such as a search field or the name field in a Save As dialog).

The application involved has to supply as many different representations as possible to allow for different paste destinations. For example, Word may provide the pasteboard with Word-formatted text, RTF and simple plain text. The document selection copied to the clipboard might also contain graphics and text, or even embedded video. For copy and paste to work intuitively for users, the system has to accommodate pasting rich data into places where only basic data is supported; it has to know how to degrade gracefully. This requires a sophisticated data typing mechanism for identifying the different representations of copied data.

Starting in Mac OS X Panther, Apple began labeling pasteboard data (the internal clipboard used to store information between copy and paste operations) using a flexible, sophisticated tagging system that identified data in both general and specific terms. The system maintained records of how the tags fit into a hierarchy of increasingly more specific data types.

Mac Pasteboard and UTI

Introducing Uniform Type Identifiers

In Tiger, Apple began introducing these UTI tags as a way to identify file types as well. As with the pasteboard, a file might be tagged with the UTI of a proprietary document type, but that UTI can identify itself to the system as also conforming more generally to be rich text, as well as plain text, or most simply as a file. This UTI hierarchy of type identification allows the system to work with the data in both general and specific ways.

UTI's added layer of sophistication in data typing is similar in some respects to Leopard's introduction of NT-style ACLs (Access Control Lists) for use in file permissions, on top of the older Unix-style permissions. Rather than three buckets of read and write file access rights (user, group, other), ACLs allow a file to be tagged with any number of individual and very specific per-user permissions.

Similarly, with UTI, rather than two buckets of file type identification data (Type and Creator, or MIME's "type/subtype"), there can be a wide range of increasingly specific type information connected to a file or copy selection in order to richly express how it can be used.

The UTI model

What does this extra magical layer of sophistication provide? First off, it harmonizes the pasteboard data types with file types. It's also used in drag and drop, which is essentially a one-step copy and paste operation. It's also used in Application Services, which is a fancy form of copy and paste where the data is transformed in between being copied and pasted, usually ending up being modified in place.

With UTI established as a uniform method of identifying a very specific data type of a file or a selection of data, the operating system can maintain a single model of how a given bit of data can be used by other apps. This also allows applications to use data tagged with a UTI they don't recognize, but which the system knows is compatible with a type that application does know how to use.

Mac OS X can also translate existing Type/Creator Codes, MIME types, and file name extensions into its unified UTI model, bridging legacy into its new world.

On page 3 of 3: UTI definitions, The features of UTI.