The world is slowly learning what happened when RIM's BlackBerry network and service were brought to their knees for three days last week. Although RIM has its European HQ offices and a NOC in Slough, Berkshire, the company's server farm, switches and other technical equipment are sited in Egham, Surrey, a few miles to the south, and that's where the trouble started. Martyn Warwick reports.
Stories are circulating in the technical media that a router at the heart of the core network failed (whisper it not in Gath, but word is that it was Cisco kit that went on the blink), which chimes with RIM's belated and grudging admission that the massive outage was caused by what it called a "core switch failure".
One assumes that RIM has multiple layers of redundancy, so it would seem that something went truly, terribly wrong there. For the moment RIM remains silent on this, but various tech blogs say that a huge network upgrade programme, initiated in North America in response to a BlackBerry service outage there in late 2009, may well be the culprit.
This initiative involved what are being referred to as "fundamental changes" to the network and, interestingly and perhaps damningly, work in the UK on those self-same changes was completed only a couple of months ago, almost two years after they were implemented in the US.
So, when the switch went down on Monday morning, RIM's management decided to revert to the software it had used before the "fundamental changes" were introduced. That could only be done by taking down the IP backbone and then resetting and rebuilding it from the ground up. The result? Almost nothing in the network knew what it was, where it was or how to communicate with other parts of the same network.
That was bad and time-consuming but not, of itself, enough to cause a three-day outage. However, whilst the rebuild and reset were under way, and unbeknownst to RIM's engineers, the huge Oracle database at the Egham site was badly corrupted, then cleaned, then corrupted again.
Manufacturers of rival networking solutions and products have been quick to capitalise on RIM's misfortunes and burnish their own credentials. For example, Ladislav Goc, president of messaging company IceWarp, commented: "The [BlackBerry] outage reached monstrous proportions because of the RIM email synchronisation structure. When you activate a BlackBerry device, it sends your mail server information, username and password to the RIM global data centre.
Then the BlackBerry mail server uses the email connection information to retrieve messages from your corporate email, put them into the Blackberry servers and then push them back to your BlackBerry device. Such dependency is very dangerous. If the RIM servers are affected, millions of people are cut from their business-critical applications.”
This may well be a prime example of self-interest, but that doesn't mean it is untrue or wide of the mark.
It seems that the BlackBerry's popularity with email-addicted subscribers was the root cause of the system failure. A few years ago there were 5 million BlackBerry subscribers; this year there are 70.5 million. RIM's solution to the strain was to keep adding servers and routers rather than biting the bullet and completely rewriting the network's core software. That exercise was regarded as too time-consuming and dangerous, so RIM took the easy way out, but in doing so it simply stored up trouble for itself that became very apparent further down the line.
BlackBerry services are now back up and the system is chomping through and delivering the many millions of messages that have been backed up for days on end. Some are still being delivered today, a week after the outage began.
As we know, the cack-handed way in which RIM's management and PR department dealt with the failure has severely damaged the company's reputation, and a quickly completed study by the shopping comparison site Kelkoo showed that 19 per cent of BlackBerry users are now actively considering dumping their devices and moving over to iPhones or Android-powered handsets.
Furthermore, 42 per cent of previously loyal BlackBerry aficionados say they will probably give up the device when their current contract expires, while 8 per cent have already voted with their feet and wallets and bought other devices.
Elsewhere, big corporations are trialling smartphones other than BlackBerrys, something that would have been unthinkable a couple of weeks ago. For example, in the UK the Royal Bank of Scotland is issuing iPhones and iPads to senior staff "to appraise", while in the US both Bank of America and Citicorp are doing likewise. The reason is the combination of the BlackBerry outage and newly beefed-up iPhone and iPad email security.
And, lest we forget, RIM's corporate HQ is in Waterloo, Ontario. An appropriately named place from which to mismanage a debacle.