Inside Snow Leopard's UTI: Apple fixes the Creator Code
What is a Creator Code?
Back in the early 80s, Apple developed a variety of unique conventions to make the Macintosh intuitively easy to use, almost to a magical extent. One example pertains to invisible file metadata that helped the system identify documents using Type and Creator Codes.
Each file was tagged with both a Type and a Creator, allowing the system to distinguish between applications that could open a file and the default application that should open the file. The Finder could also use this invisibly associated file metadata to present custom file icons for different files of the same type but created by different applications, such as two JPEGs, one saved by Graphic Converter and the other by Photoshop.
The Finder maintained simple associations between applications and files, so that any document created in Word would open automatically with Word, and each of several different text documents would open with the application that was used to create it. In the days when users had a few simple, monolithic applications, this Type/Creator convention made sense.
On DOS and Windows, the operating system could only determine how to open a file based on its file type extension, which was explicitly part of the file name: myfile.doc. DOS and Windows were helpless to offer the user any assistance in knowing what application originally created a given file, unless the file were saved to a proprietary format that only one app could possibly open. In that sense, Mac Creator Codes encouraged the use of common, interoperable file types while still associating certain files with a specific app.
Leaving the Mac island
The Creator Code convention worked well until documents began moving to and from other platforms, such as being saved to a DOS disk or a Unix-based file server, or via email or the web. DOS and Unix ignored the Mac metadata, so they didn't know how to open the files received from Mac users, as those files typically lacked the simple file extension that those other systems required to interpret the file's type.
Additionally, since those platforms also scrubbed away the Mac metadata, files copied to and from a non-Mac-savvy platform couldn't be correctly identified or opened on the Mac anymore either. With Apple's magic removed, the files just sat there with blank generic icons, and the Finder acted just as helpless with them as Windows. The problem for Mac users was eventually solved by packaging the Creator Code information to withstand transport on a non-HFS formatted disk. That didn't help Mac users who needed to send their files to users on other platforms, however.
The common solution to that issue was to give Mac files redundant information: both Type and Creator Codes and a file extension. On NeXT and later Windows, files were given file extensions only, as there was no underlying file system support for Mac-style metadata on Unix or DOS. NeXT's impact on Apple was particularly evident in the shift away from Creator Codes and toward file extensions.
Old time Mac users often looked at file extensions as an ugly and unnecessary feature required only to allow Macs to interoperate with the filthy masses. So, when Mac OS X arrived with support for the old Mac metadata but with new guidelines from Apple mandating that developers also use NeXT-style file extensions to express type, there was consternation that the Mac was losing the things that made it special.
Complaining about the end of the road for Creator Codes fit in with unrest about other changes afoot at Apple, such as the loss of the Spatial Finder (the idea that you shouldn't be able to view a folder in the Finder in multiple views at once; that files should be locked into the view you last saved and stay there until you modify it).
Why Creator Codes Died
However, Apple didn't schedule a deprecation of Creator Codes just for fun, or just to be lazy, or just to fit in with lowest common denominator operating systems. Type and Creator Codes simply weren't good enough to support new OS features Apple had on the drawing board. Rather than trying to get by with doing less, Apple was set on doing more, and Creator Codes weren't up to the task.
There are lots of places where the operating system needs to handle advanced data type management which transcends the basic file Type and Creator Code system developed for identifying documents on the original Macintosh nearly 30 years ago. The system must also do this in a way which remains compatible with the file type extensions in use on other platforms.
Type and Creator Codes each used four-character labels, limiting the potential for expansion. That was an improvement upon the DOS idea of 8.3 names, which truncated documents, executables, and JPEGs to cryptic, three letter .doc, .exe, and .jpg file extensions. However, creating a new Type or Creator Code required registering with Apple, because there was a finite number of expressive codes available between 0000 and ZZZZ.
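To make the size of that namespace concrete, here is a minimal Python sketch of how a four-character code fits into a single 32-bit value. The function names are made up for illustration, and the Mac Roman encoding and big-endian byte order are assumptions of this sketch rather than anything stated in this article.

```python
import struct


def fourcc_to_int(code: str) -> int:
    """Pack a four-character code into one big-endian 32-bit integer."""
    raw = code.encode("mac_roman")  # assumption: classic Mac OS text encoding
    if len(raw) != 4:
        raise ValueError(f"expected exactly 4 bytes, got {len(raw)}: {code!r}")
    return struct.unpack(">I", raw)[0]


def int_to_fourcc(value: int) -> str:
    """Unpack a 32-bit integer back into its four-character code."""
    return struct.pack(">I", value).decode("mac_roman")


# "TEXT" is the classic Type Code for plain text files.
print(hex(fourcc_to_int("TEXT")))         # 0x54455854
print(int_to_fourcc(0x54455854))          # TEXT

# Every possible code is one of 2**32 bit patterns, and far fewer
# of those are readable, meaningful labels.
print(2 ** 32)                            # 4294967296
```

A three-letter extension such as "doc" is rejected by this sketch, since a code must occupy exactly four bytes.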
Today, the app used to create a file is not necessarily the one that most users want to open it with; the Creator Code system didn't offer any simple way for users to modify the creator after the fact, either individually, collectively for a selection of files, or across all files of a given file type. The complexity of what users expect to do (and what kinds of data they are working with) in 2009 is vastly different than it was back in the early 80s when Apple conceptualized the rather simple Type and Creator Codes.
To support new kinds of features, Apple invented a flexible new system for expressing file type and creator information, called Uniform Type Identifiers. The idea of rich file typing wasn't entirely unique; the BeOS began using MIME types as its mechanism for expanding upon the concept of File Type and Creator Codes during the 90s. However, MIME had its own issues.
MIME types are used by web servers and in email to express the type of file being served or attached. For example, a web server might identify a graphic as "image/jpeg" in addition to the file's .jpg or .jpeg file type extension. MIME also allows developers to make up their own types, either by registering a specific name with the IANA (the root DNS naming authority) or making up ad hoc names, an improvement upon Apple's registered Type and Creator Codes.
The BeOS copied a page from the web browser, which allowed users to define which helper app should be used to open a file of a given MIME type. If it encountered a specialized MIME type that it didn't recognize (such as "text/xml"), it could try opening it with the program designated to open plain text files. This "type/subtype" system offered a two bucket approach to defining file type, allowing specialized files to be opened by more general apps if necessary.
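The two bucket fallback described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the BeOS actually implemented it: the helper table, the app names, and the convention of registering a general handler under the bare type name are all assumptions.

```python
from typing import Dict, Optional

# Hypothetical helper table: exact "type/subtype" entries, plus a bare
# "type" entry acting as the general bucket for that family of files.
HELPERS: Dict[str, str] = {
    "image/jpeg": "ImageViewer",
    "text/html": "WebBrowser",
    "text": "PlainTextEditor",
}


def pick_helper(mime_type: str, helpers: Dict[str, str] = HELPERS) -> Optional[str]:
    """Return the helper for a MIME type, falling back to the general type."""
    mime_type = mime_type.strip().lower()
    if mime_type in helpers:
        return helpers[mime_type]           # specific match wins
    general, _, _ = mime_type.partition("/")
    return helpers.get(general)             # fall back to the broad bucket


print(pick_helper("text/html"))   # WebBrowser
print(pick_helper("text/xml"))    # PlainTextEditor
print(pick_helper("video/mpeg"))  # None
```

Note that there are only ever two levels to try, which is exactly the limitation of the "type/subtype" scheme.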
However, MIME was originally created just as a simple way to push international characters and binary attachments through the Internet's SMTP email system (which only understands plain ASCII text). MIME only offers simplistic, rudimentary support for identifying the kind of data being transmitted. Apple wanted a much richer and more robust system for detailing more than just the basic document type of some binary data.
Type metadata isn't just useful for documents. It's also used to identify data that isn't saved in a file, such as clipboard data. When you copy and paste, the Mac OS has to keep track of the type of data you copy. An operation might start with a copy selection from a Word file with special formatting, but you might want to paste that data somewhere that only recognizes RTF (like TextEdit) or plain text (such as a search field or the name field in a Save As dialog).
The application involved has to supply as many different representations as possible to allow for different paste destinations. For example, Word may provide the pasteboard with Word-formatted text, RTF and simple plain text. The document selection copied to the clipboard might also contain graphics and text, or even embedded video. For copy and paste to work intuitively for users, the system has to accommodate pasting rich data into places where only basic data is supported; it has to know how to degrade gracefully. This requires a sophisticated data typing mechanism for identifying the different representations of copied data.
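As a rough sketch of that graceful degradation, the Python below models a clipboard holding several representations of one selection, ordered from richest to plainest, and a paste destination that accepts only some of them. The labels and sample bytes are placeholders invented for this example; the real pasteboard is considerably more involved.

```python
from typing import List, Optional, Set, Tuple

Representation = Tuple[str, bytes]  # (label for the kind of data, the data)


def best_representation(
    offered: List[Representation], accepted: Set[str]
) -> Optional[Representation]:
    """Return the richest offered representation the destination accepts.

    `offered` must be ordered from richest to plainest.
    """
    for kind, data in offered:
        if kind in accepted:
            return kind, data
    return None  # nothing usable, so the paste simply does not happen


# What a word processor might put on the clipboard for one copy.
clipboard = [
    ("word", b"<Word-formatted bytes>"),
    ("rtf", b"{\\rtf1 Hello}"),
    ("plain-text", b"Hello"),
]

print(best_representation(clipboard, {"rtf", "plain-text"})[0])  # rtf
print(best_representation(clipboard, {"plain-text"})[0])         # plain-text
print(best_representation(clipboard, {"image"}))                 # None
```

A rich text editor gets the RTF version, while a search field falls through to plain text.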
Starting in Mac OS X Panther, Apple began labeling pasteboard data (the internal clipboard used to store information between copy and paste operations) using a flexible, sophisticated tagging system that identified data in both general and specific terms. The system maintained records of how the tags fit into a hierarchy of increasingly more specific data types.
Introducing Uniform Type Identifiers
In Tiger, Apple began introducing these UTI tags as a way to identify file types as well. As with the pasteboard, a file might be tagged with the UTI of a proprietary document type, but that UTI can identify itself to the system as also conforming more generally to rich text, as well as plain text, or most simply as a file. This UTI hierarchy of type identification allows the system to work with the data in both general and specific ways.
UTI's added layer of sophistication in data typing is similar in some respects to Leopard's introduction of NT-style ACLs (Access Control Lists) for use in file permissions, on top of the older Unix-style permissions. Rather than three buckets of read and write file access rights (user, group, other), ACLs allow a file to be tagged with any number of individual and very specific per-user permissions.
Similarly, with UTI, rather than two buckets of file type identification data (Type and Creator, or MIME's "type/subtype"), there can be a wide range of increasingly specific type information connected to a file or copy selection in order to richly express how it can be used.
The UTI model
What does this extra magical layer of sophistication provide? First off, it harmonizes the pasteboard data types with file types. It's also used in drag and drop, which is essentially a one-step copy and paste operation, and in Application Services, which is a fancy form of copy and paste where the data is transformed in between being copied and pasted, usually ending up being modified in place.
With UTI established as a uniform method of identifying a very specific data type of a file or a selection of data, the operating system can maintain a single model of how a given bit of data can be used by other apps. This also allows applications to use data tagged with a UTI they don't recognize, but which the system knows is compatible with a type that application does know how to use.
Mac OS X can also translate existing Type/Creator Codes, MIME types, and file name extensions into its unified UTI model, bridging legacy into its new world.
For example, the "public.html" UTI defines itself as conforming to "public.text," allowing any apps that know how to use text to also access HTML formatted text, even if they never anticipated editing HTML. They could also work with a new vendor-specific UTI such as "com.editmax.htmlplus" that defined itself as conforming to "public.html."
In turn, "public.text" conforms to both "public.data" and "public.content," so any process that knows how to work with files or content in general terms can also work specifically with HTML files, or any new specialized forms of HTML files invented in the future.
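The conformance chain in this example can be modeled as a small graph. The Python below is a sketch of the concept only: it hardcodes the relationships named above and walks them recursively, and it does not call Apple's own conformance API (UTTypeConformsTo). The table and function names are invented for illustration.

```python
from typing import Dict, List

# Each UTI lists the more general UTIs it declares conformance to.
# Only the relationships named in the text are included here.
CONFORMS_TO: Dict[str, List[str]] = {
    "com.editmax.htmlplus": ["public.html"],
    "public.html": ["public.text"],
    "public.text": ["public.data", "public.content"],
    "public.data": [],
    "public.content": [],
}


def conforms_to(uti: str, target: str) -> bool:
    """True if `uti` is `target` or conforms to it through any chain."""
    if uti == target:
        return True
    return any(conforms_to(parent, target) for parent in CONFORMS_TO.get(uti, []))


# An app that only knows "public.text" can still handle the vendor type.
print(conforms_to("com.editmax.htmlplus", "public.text"))     # True
print(conforms_to("com.editmax.htmlplus", "public.content"))  # True
# Conformance only runs from specific to general, never the reverse.
print(conforms_to("public.text", "public.html"))              # False
```

Because a type can list more than one parent, this is a graph rather than a simple tree, which is what lets "public.text" be both data and content at once.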
UTI definitions also include a human readable (and localizable) description of what that file type is; identify the icon that should be used to represent that file; and outline alternative ways to express that file type. For example, the "public.html" UTI associates itself with the legacy "HTML" Type Code and a variety of file extensions that HTML files may use, such as .html, .htm, .shtml, and .shtm. It also associates itself with the MIME type of "text/html," making UTI the uber-type of data typing.
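To show how one definition can bridge all of those legacy labels, here is a Python sketch that mirrors the "public.html" example as a dictionary and builds a reverse index from legacy tags to UTIs. The key names follow the ones used in UTI declarations in an application's Info.plist, but the description string, the helper functions, and the case-folding rules are assumptions of this sketch, not Apple's implementation.

```python
from typing import Dict, List, Optional, Tuple

# A dictionary shaped like a UTI declaration. Values come from the
# example in the text; the description string is illustrative.
HTML_DECLARATION = {
    "UTTypeIdentifier": "public.html",
    "UTTypeDescription": "HTML text",
    "UTTypeConformsTo": ["public.text"],
    "UTTypeTagSpecification": {
        "com.apple.ostype": ["HTML"],
        "public.filename-extension": ["html", "htm", "shtml", "shtm"],
        "public.mime-type": ["text/html"],
    },
}

TagIndex = Dict[Tuple[str, str], str]


def _normalize(tag_class: str, tag: str) -> str:
    # Assumption: extensions and MIME types compare case-insensitively,
    # while four-character Type Codes are case-sensitive.
    return tag if tag_class == "com.apple.ostype" else tag.lower()


def build_tag_index(declarations: List[dict]) -> TagIndex:
    """Map every (tag class, tag) pair to the UTI that declares it."""
    index: TagIndex = {}
    for decl in declarations:
        for tag_class, tags in decl["UTTypeTagSpecification"].items():
            for tag in tags:
                index[(tag_class, _normalize(tag_class, tag))] = decl["UTTypeIdentifier"]
    return index


def uti_for_tag(tag_class: str, tag: str, index: TagIndex) -> Optional[str]:
    """Translate a legacy label into a UTI, or None if nothing declares it."""
    return index.get((tag_class, _normalize(tag_class, tag)))


INDEX = build_tag_index([HTML_DECLARATION])
print(uti_for_tag("public.filename-extension", "HTM", INDEX))  # public.html
print(uti_for_tag("public.mime-type", "text/html", INDEX))     # public.html
print(uti_for_tag("com.apple.ostype", "HTML", INDEX))          # public.html
```

Whichever label a file arrives with, the lookup lands on the same UTI, and everything the system knows about that UTI applies from there.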
Developers can make up their own new UTIs, defining them in relation to more general, existing UTIs, without needing to formally register anything. Vendor-specific UTIs use 'reverse DNS' naming just like preference files, preventing different companies from inadvertently using the same label. In the world of file extensions, there's nothing unique about "file.doc," as both WordPerfect and Word used the .doc extension. With UTI, Microsoft uses "com.microsoft.word.doc" to express a unique data type for its files.
As a side note, "reverse DNS" is used because DNS itself is backward. Web URLs like "www.apple.com/mac/features.html" go from most specific to least specific (in server name hierarchy), then hit a slash and begin getting more specific (in web server file hierarchy). This is like writing the number 1,234.567 as 4321.567: nutty. So "reverse DNS" is an attempt to fix a big mistake made in web URLs everywhere else that a similar naming hierarchy might be useful. It's too late to fix the web.
Features of UTI
So, rather than just defining a file in terms of the app that created it and its general type, developers can now assign their files a UTI that identifies a very specific creator and type, and then inform the system of how that specific new UTI label relates with other more general file types that other apps (and system features) already know how to interact with. Developers can also still indicate that their unique UTI-tagged files should open with their "creator" app, a feature the end user can override via a Finder preference.
In addition to using UTI to manage what app to use when opening a file in the Finder (via Launch Services) and in determining how to perform copy and paste, drag and drop, and Application Services (via Pasteboard Manager), Mac OS X also uses UTIs within Spotlight importers, Automator Actions, and Quick Look previews, as well as in Navigation Services to narrow down the relevant files presented in open and save dialogs.
For Launch Services, Apple tells developers to define UTIs for the documents their applications create, and include these within their application. A vendor-specific UTI works like an old fashioned Type and Creator, although developers don't have to register their definitions with Apple in advance. So rather than Adobe registering the "8BIM" creator and "8BPS" type for its Photoshop files, the UTI "com.adobe.photoshop-image" expressively and uniquely identifies these files as entities that open with Photoshop.
Other applications can also open Photoshop documents. Conceptually, other apps could even create versions of Photoshop documents that open by default using their own app simply by adding a new UTI that defines itself as a specialized version of the existing "com.adobe.photoshop-image" UTI. This allows the system to do everything it already knows how to do with Photoshop files (copy and paste, index for search, use in Automator Actions, Quick Look) to new documents defined by the specialized "Photoshop+" UTI. Type/Creator Codes can't do that.
The Pasteboard's use of UTI was introduced above. It shouldn't be surprising that iPhone 3.0 also uses UTI in its new copy and paste support. If you were wondering why Apple "took so long" to bring that feature to the iPhone, it's because the platform opted for a long term, robust solution rather than a quick and dirty kluge impacted by security issues.
For Spotlight searching, Apple instructs developers to provide UTI definitions for the types of files their importer plugin can index. The system then uses the most specific and specialized importer available to index files matching that UTI.
For Automator Actions, developers specify the UTIs their Action knows how to input data from and what UTIs the Action will spit out. Automator Actions are essentially a modular string of Services that are compiled together to copy data from one file or selection, transform it, and spit out a result.
For Quick Look, plugins specify UTIs for the files they can view. Again, this allows the system to select the most specialized plugin to use in displaying a given file type. In other words, a developer could create a specific plugin for viewing a fancy text UTI (such as XML) with more features (such as color-coded markup tags) than Apple's own basic text Quick Look plugin offers. Quick Look is also an application of UTIs that allows the system itself to open a file for quick viewing without affecting the user's desired default launching app.
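Picking "the most specialized plugin" for Spotlight or Quick Look amounts to walking from the file's UTI toward more general types and stopping at the first one a plugin has claimed. The Python below sketches that search breadth-first. The plugin names are hypothetical, the conformance table is a hand-written excerpt, and the system's actual selection logic may well differ.

```python
from collections import deque
from typing import Dict, List, Optional

# Hand-written excerpt of a conformance hierarchy, specific to general.
CONFORMS_TO: Dict[str, List[str]] = {
    "public.xml": ["public.text"],
    "public.text": ["public.data", "public.content"],
}


def most_specific_plugin(uti: str, plugins: Dict[str, str]) -> Optional[str]:
    """Return the plugin registered nearest to `uti` in the hierarchy."""
    queue = deque([uti])
    seen = {uti}
    while queue:
        current = queue.popleft()
        if current in plugins:
            return plugins[current]      # closest match found first
        for parent in CONFORMS_TO.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None                          # no plugin handles this data


# Hypothetical plugins keyed by the UTI each one declares.
basic_only = {"public.text": "BasicTextPreview"}
with_xml = {"public.text": "BasicTextPreview", "public.xml": "ColorXMLPreview"}

print(most_specific_plugin("public.xml", basic_only))  # BasicTextPreview
print(most_specific_plugin("public.xml", with_xml))    # ColorXMLPreview
```

Installing the specialized plugin changes the result for XML files without disturbing how plain text is previewed.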
The fix for Creator Code junkies
UTIs provide a standardized way of defining data anywhere in the system, one that transcends the Type/Creator Code's limited concept of a single app that creates given file types. Developers who complain that Snow Leopard doesn't support Creator Codes need to brush up on UTI, which has been in place for several years now. It offers all the benefits of Creator Codes while enabling lots of other modern new features.
Users who miss being able to automatically open a file using the app that originally created it can pester their app's developer to get on the ball with UTI. Any application that has been updated since 2005's Tiger, but which does not yet support UTI, has opted not to support an important feature of the Mac platform.
Everyone else, including many of us who didn't ever understand why the system launched files using a specific app rather than the one we had defined for that given file type, can continue using the Finder's Open With menu, drag and drop app launching, or set a permanent per-item default "creator" app for opening a selection of documents by using the Get Info panel.
Daniel Eran Dilger is the author of "Snow Leopard Server (Developer Reference)," a new book from Wiley available now for pre-order at a special price from Amazon.