While occasional service outages have hit nearly everyone in the business, knocking Google's Gmail offline for hours, plunging RIM's BlackBerrys into the dark, or leaving Apple's MobileMe web apps unreachable to waves of users, Microsoft's high-profile outage has affected users in the worst possible way: the company has unrecoverably lost nearly all of its users' data, and a week later has no alternative backup plan for recovering any of it.
The outage and data loss affect all Sidekick customers of the Danger group Microsoft purchased in early 2008. Danger maintained a significant online services business for T-Mobile's Sidekick users. All of T-Mobile's Sidekick phone users rely on Danger's online service to supply applications such as contacts, calendars, IM and SMS, media player, and other features of the device, and to store the data associated with those applications.
When Microsoft's Danger servers began to fall offline last Friday, October 2, users across the country couldn't use the services at all; even after functionality began to return on Tuesday, October 6, users still didn't have their data back. This Saturday, after a week of efforts to solve the crisis, T-Mobile finally announced to its Sidekick subscribers:
"Regrettably, based on Microsoft/Danger's latest recovery assessment of their systems, we must now inform you that personal information stored on your device – such as contacts, calendar entries, to-do lists or photos – that is no longer on your Sidekick almost certainly has been lost as a result of a server failure at Microsoft/Danger."
A new report from Engadget says that T-Mobile has suspended sales of its Sidekick models and is warning: "Sidekick customers, during this service disruption, please DO NOT remove your battery, reset your Sidekick, or allow it to lose power."
Sidekick and the iPhone
Danger's Sidekick platform bears some resemblance to the iPhone; Danger brought the GSM Sidekick to market by partnering exclusively with T-Mobile. The partnership involves custom network services that make features of the device unusable on other networks, and of course the phones are physically incompatible with the CDMA service operated by Verizon/Sprint. In some ways, Microsoft's purchase of Danger is exactly the fix recommended by some pundits for Apple's iPhone: a third party who could swoop in and break the iPhone's exclusive partnership with AT&T by bringing Verizon into the mix. In Danger's case, the "Pink Project" operated by Microsoft not only failed to achieve this intended goal, but failed in large part because the goal was simply a bad idea.
After all, if exclusivity were inherently a bad thing, it wouldn't be used so successfully to bring competing new models to the crowded smartphone market; Danger's Sidekick, Apple's iPhone, RIM's BlackBerry Storm and Palm's Pre all gained their visibility in the market because of concerted marketing by their exclusive mobile partners. All have experienced some launch issues which would have been far worse and more complex to resolve had their hardware makers tried to simultaneously launch them on multiple carriers, as Microsoft planned to do with its Pink "Windows Phone," using components borrowed from the Danger acquisition.
However, the Danger Sidekick also has some significant differences from Apple's iPhone business model. First, the iPhone is designed to plug into a computer running iTunes for initial setup, and while not entirely mandatory, it is designed to regularly sync with a desktop system. This process involves backing up all of the device's application data to the user's local computer, and allows the user to restore the device later. Apps running on the iPhone also run as local software and do not require an external service to be available. Most applications are designed to work offline, as a significant chunk of the iPhone platform is made up of iPod touch users. Apps are only updated and/or removed by the user.
The closest thing to Danger's online services is Apple's MobileMe, which is sold separately as an optional package of services that can sync, update, and push messages, contacts, calendars, bookmarks and other data to the phone, to associated desktop computers and for presentation via the web. After a problematic rollout last year that was plagued by slow performance and frequent outages, Apple's MobileMe has matured into a reliable service. Even so, an interruption in MobileMe services wouldn't result in users being unable to use apps on their iPhones, nor would it risk the loss of data on the device or backed up by the user's copy of iTunes.
The dark side of clouds
More immediate types of cloud services take away users' control in managing their own data. In addition to the Danger services for Sidekick users, Microsoft also independently runs a MyPhone service for its Windows Mobile platform. It provides certain mobile and web publishing features (but not push messaging) comparable to Apple's MobileMe.
However, Microsoft's MyPhone performs its backups of users' phone data directly to the company's servers, and not to the user's local system. That means a Danger-like failure on the server end of MyPhone could easily result in unrecoverable data loss for Windows Mobile users, too.
Users have reason to be wary about keeping all their data on a vendor's cloud service without also maintaining their own local backup. If Apple's MobileMe service loses your data, the company won't do much to help you restore it because it also provides a variety of ways for users to back up and restore their own data locally: directly from apps such as Address Book and iCal, by using a local backup system like Time Machine, and by using iTunes to back up mobile devices at sync. Apple's MobileMe cloud services are run as an accompanying value-added service in addition to the maintenance tools users are given to secure their own data.
Other vendors have very different ideas about accountability for data in the cloud. In 2006, a relatively small number of Google's Gmail users experienced a security-related loss of their email and contacts. At the time, Google could only offer to reach out to the people who were affected "to apologize and to work with them to restore the email from any personal backup they might have." Google's strategy moving forward is highly dependent upon "non-local" cloud computing, with the company's Gmail being joined by its online Docs, Picasa, and Gtalk clients (which store all their data on Google's servers) as well as future plans to deliver Chrome OS as a web-client substitute to the conventional local operating system. That will largely replace the entire idea of local apps under the user's own control with online apps that the vendor can change, update, or drop at any time.
Palm's new WebOS in the Palm Pre similarly banks on the cloud to provide web-based apps that are updated and replaced by the network operator, not by the local user as is the case with the iPhone. Amazon's Kindle also demonstrated the potential for the network to take control of users' data after the company revoked certain books from users' devices, a policy it has since apologized for and paid to settle. Delegating all control to the cloud sounds great until there's a problem that the cloud vendor has no interest or capacity to resolve for the user. It then quickly becomes a frustrating nightmare.
Is Danger in Microsoft?
Some users commenting on the week-long outage and its resulting data loss crisis at Danger were quick to absolve Microsoft of any responsibility in the incident, suggesting that Microsoft only bought the company last year and that it did not originally design the service. While Danger ran its services for years prior to the acquisition and has previously experienced outages, it hasn't lost all of its users' data across the board before. The apology to Sidekick users from long-term Danger partner T-Mobile, which voiced its frustration and dashed hopes, was clearly worded to highlight Microsoft's involvement in the incident.
Microsoft's takeover of Danger almost two years ago should have given the software giant the time to fortify and secure Danger's online operations. Instead, it appears the company actually removed support to cut costs. According to a source familiar with Danger before and after the Microsoft acquisition, T-Mobile's close partnership with the original Danger was leveraged and then betrayed by Microsoft when Steve Ballmer's company decided there would be more money involved in dropping its exclusive deal with T-Mobile to partner with Verizon on the side.
Microsoft also shirked the Sidekick support obligations to T-Mobile that it took on with the acquisition. The source stated that "apparently Microsoft has been lying to them [T-Mobile] this whole time about the amount of resources that they've been putting behind Sidekick development and support [at Danger] (in reality, it was cut down to a handful of people in Palo Alto managing some contractors in Romania, Ukraine, etc.). The reason for the deceit wasn't purely to cover up the development of Pink but also because Microsoft could get more money from T-Mobile for their support contract if T-Mobile thought that there were still hundreds of engineers working on the Sidekick platform. As we saw from their recent embarrassment with Sidekick data outages, that has clearly not been the case for some time."
That indicates that Danger's high profile cloud services failure didn't occur in spite of Microsoft's ownership, but rather because of it. This has led observers to question the company's commitment to its other cloud services, not just Windows Mobile MyPhone, but also the Azure Services Platform of cloud computing efforts that the company has had on the drawing board for years. Azure is designed to allow third parties to build applications that are dependent upon Microsoft's data centers.
In covering the Danger debacle at Microsoft, Ina Fried of CNET wrote, "while outages in the cloud computing world are common (one need only look at recent issues with Twitter or Gmail), data losses are another story. And this one stands as one of the more stunning ones in recent memory."
Fried added, "The Danger outage comes just a month before Microsoft is expected to launch its operating system in the cloud— Windows Azure. That announcement is expected at November's Professional Developer Conference. One of the characteristics of Azure is that programs written for it can be run only via Microsoft's data centers and not on a company's own servers. It should be pointed out that the Azure setup is entirely different from what Danger uses: the Sidekick uses an architecture Microsoft inherited rather than built (Microsoft bought Danger last year). Still, the failure would seem to be enough to give any CIO pause."
Daniel Eran Dilger is the author of "Snow Leopard Server (Developer Reference)," a new book from Wiley available now for pre-order at a special price from Amazon.
103 Comments
What ever happened to redundancy?
There has to be more to this story. What cloud service doesn't do at least nightly backups?
I doubt this is going to slow down cloud computing. Data loss can happen anywhere, hardware failure can happen anywhere. Regardless of what platform you use, you need a backup plan for your data and services and have procedures in place for the worst case scenario.
What ever happened to redundancy?
At the rate things are going, it would seem that Microsoft will get there soon ......
When Apple lost about 1% of email data while moving from .mac to MobileMe back then, I considered that unacceptable and a major sign of incompetence in that area (cloud-based services).
Of course, MS dwarfs them here. This is not the usual Redmond photocopy, this is a serious enlargement. A 100% proof of incompetence, negligence and lack of organization. Any part-time admin with such a back-up strategy would be held personally liable.
Still, it underlines one major point: cloud based services without 100% and up-to-date local back-ups (client-side) won't be ready for prime time anytime soon. If even the richest companies having full control over each and every software and device involved can't manage it, then nobody can.
The latest I've read is that this came from a failed SAN migration of some variety. Waiting for confirmation of that; if true, it's almost inexcusable (even if it is inherited technology).
Still, when you outsource key functionality like running your database architecture, you can't be absolutely certain that proper precautions are in place the way you can when you control the process. If you can't control a copy of your data, you literally have no control of your business. It's that simple.
Can't wait for the big Azure announcements by Ballmer!
John_B (in-sourced database guru)