A reply from O2 regarding my complaint about iPhone 3G preorder fail
I received a reply from O2 to my recent complaint:
Hello Richard,
Thanks for your email about your upgrade order.
We didn’t receive your upgrade order. Your order wasn’t accepted, this is the reason you didn’t receive an email confirming the status of your order.
I’m sorry to hear you’re disappointed with the level of service you’ve received. We value our customers and we ve (sic) tried our best to provide you with the best possible service.
We’re working as fast as we can to deal with the high volumes of upgrade requests we received, but we cannot confirm for you at the moment whether your upgrade was successful. We recognise that this has not been a brilliant experience and apologise for the obvious frustration, but we are doing everything we can to resolve the situation as soon as possible.
Demand for iPhone 3G is staggering. We invested heavily in our website capacity which was tested carefully in advance, but we were experiencing 13,000 orders per second being placed, far beyond our expectations and our worst case scenario.
This may be of little comfort to you, but we were as prepared as we could possibly be but the sheer volume of demand is completely unprecedented.
We made a limited allocation of iPhone 3G stock available for pre-order online, primarily for those customers that pre-registered their interest. Demand has been very high and we have now sold out of this allocation.
To upgrade to the new iPhone 3G, please visit your nearest O2 store. Please be reassured that for new and eligible to upgrade O2 customers including iPhone existing customers, there will be iPhones available in store from 8:02am on the 11 July, although we again expect demand to be very high, so urge you to get down there early. All iPhone stock is being sold on a first come, first served basis.
You can also upgrade to the iPhone from Carphone Warehouse stores from 8:02am on 11 July 2008. Please note that you won’t be able to upgrade from an Apple store.
Now, to be fair, this is suitably apologetic, but I take umbrage at two points.
Firstly, what this letter boils down to is “we messed up our website and that wasted a day of your life, so here’s an idea: go queue up outside one of our stores instead. Oh, they won’t have many though, so you’d better get there at ungodly o’clock. We’re really sorry. Please buy one.” Is it just me or is that quite insulting?
Secondly, that 13,000 transactions per second figure. Now, my posturing in my complaint letter wasn’t unfounded; I really have done scalability testing and analysis for some of the biggest travel ecommerce solutions in the UK. I will happily admit that 13,000 per second is a hell of a lot of traffic. Wow! 13,000 per second! I cannot imagine enough servers to cope with that; well, that gets O2 off the hook then. Quite understandable.
But wait a goddamned stinking minute. This doesn’t add up. In his letter to various customers, O2’s CEO Matthew Key said that
To put it in context we had over 200,000 people expressing interest and only a very small proportion of that number of devices available. Faced with this dilemma, we made it clear in the communications that to be fair to all customers the orders would be managed on a first come first served basis, as stock was limited. The response was so great that the online store completely sold out of iPhones within just a few hours.
Now, I’m nowt special, but I’m pretty sure 200,000/13,000 ≈ 15.4. In other words, if the O2 website was processing 13,000 orders per second at its peak, we would expect all 200,000 customers who asked O2 to contact them about pre-orders to have ordered in a single 15 second period. Let’s be generous though; it was 13k peak and not 13k sustained, and it was “over 200k”. That still clearly implies that every single one of those pre-registered customers would have had to go onto the site within something like a two minute window. Furthermore, as only a “very small proportion” of those could order before stock ran out, the stock should have been exhausted in, say, less than a couple of seconds. O2 have confirmed verbally to me, and in emails to a few bloggers, that stock lasted until 11am or so. So that doesn’t make any sense.
It would also require O2 to have simultaneously delivered the “hey, come buy me!” teasing SMS to all 200,000 people, and for all of those people to have been sat right at a computer and immediately gone “woo, here I come!”. In fact, I can confirm anecdotally from a small sample of friends and bloggers that those SMSs were received anywhere between half six and about half eight on Monday morning. This 13,000 per second figure has been widely cited, by places such as Reuters and Daring Fireball, but seems to me to be scarcely credible.
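The sums are easy enough to check. Here is a minimal Python sketch that uses only the two figures O2 supplied (200,000 registrations, 13,000 orders per second) plus the rough two hour SMS window mentioned above. The function names are made up for illustration, and it assumes each customer orders exactly once.

```python
# Back-of-envelope check of the "13,000 orders per second" figure.
# Inputs come from O2's own statements; one order per customer is an assumption.
PRE_REGISTERED = 200_000     # "over 200,000 people expressing interest"
CLAIMED_PEAK_RATE = 13_000   # claimed orders per second
SMS_WINDOW_SECONDS = 2 * 60 * 60  # SMSs arrived roughly 06:30 to 08:30


def seconds_to_drain(customers: int, orders_per_second: float) -> float:
    """Seconds needed for every customer to order once at a constant rate."""
    return customers / orders_per_second


def implied_rate(customers: int, window_seconds: float) -> float:
    """Average orders per second if everyone orders once within the window."""
    return customers / window_seconds


at_claimed_rate = seconds_to_drain(PRE_REGISTERED, CLAIMED_PEAK_RATE)
over_sms_window = implied_rate(PRE_REGISTERED, SMS_WINDOW_SECONDS)

print(f"Everyone served at the claimed rate in {at_claimed_rate:.1f} seconds")
print(f"Average rate if spread over the SMS window: {over_sms_window:.1f} per second")
```

If the arrivals really were spread over the two hours in which the SMSs landed, the average comes out at a few dozen orders per second, which is a long way short of 13,000.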
Y’know what I think? I think it’s a cascading failure of the system, a failure mode I violently feared when I was doing scalability analysis. Basically, it goes like this. In the first ten minutes a couple of hundred people try to order. In the second ten minutes a few hundred more new people arrive, plus half the first group whose first attempted order failed. Thirty minutes in and you have a thousand people banging on the system and now it’s really in trouble. Draw that curve out and you end up at 13,000 hits per second, consisting almost entirely of people on their tenth, twentieth, thirtieth attempt at ordering.
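To make that feedback loop concrete, here is a toy simulation. It is a sketch under loud assumptions, not a model of O2’s actual system: new customers arrive at a constant rate, the site can complete a fixed number of orders per tick, and a configurable fraction of the people whose order failed try again on the next tick. All names and numbers are hypothetical.

```python
def simulate_retry_storm(arrivals_per_tick, capacity_per_tick, ticks,
                         retry_fraction=1.0):
    """Return the offered load (new orders plus retries) for each tick.

    Assumptions: constant arrivals, fixed capacity, and failed orders are
    retried exactly once on the following tick by `retry_fraction` of users.
    """
    waiting = 0.0  # people whose last attempt failed and who will retry
    offered_history = []
    for _ in range(ticks):
        offered = arrivals_per_tick + waiting
        served = min(offered, capacity_per_tick)
        failed = offered - served
        waiting = failed * retry_fraction
        offered_history.append(offered)
    return offered_history


# A site with enough capacity sees flat load...
healthy = simulate_retry_storm(100, 120, ticks=5)
# ...while a slightly under-sized one sees load climb every tick.
overloaded = simulate_retry_storm(100, 80, ticks=5)

print(healthy)
print(overloaded)
retry_share = (overloaded[-1] - 100) / overloaded[-1]
print(f"Share of final-tick load that is retries: {retry_share:.0%}")
```

The number of new customers never changes between the two runs; only the capacity does. In the under-sized case the load keeps growing purely from retries, which is the shape of curve described above.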
See, that’s a much less sexy headline. Suddenly we’re not “O2 took 13,000 orders per second and the servers melted to slag, but hey, no mortal could cope with that and Mighty Thor was on his day off”. Now we are “O2 took a hundred orders in the first five minutes and the system crapped out for most of them, then those people tried again and more people came and oh my it’s dying”. Now, perhaps that doesn’t sound any different, but the point is that this second scenario, which I contend is much more likely to be what happened, was both predictable and preventable. O2 knew how many SMSs and emails they were sending. They could easily have figured out the maximum possible rate at which customers could arrive at the site, and should have specced hardware to cope with this load, plus a bit of headroom. They didn’t do this, because if they had, there is no way on God’s green Earth they would have reached a load of 13,000 transactions per second.
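As a rough illustration of that kind of sizing exercise, here is a minimal sketch. Only the 200,000 recipients and the two hour send window come from the figures above. The requests per order, the per-server throughput and the headroom are made-up placeholders; real values would have to come from load testing.

```python
import math


def required_servers(recipients, send_window_seconds, requests_per_order,
                     per_server_rps, headroom):
    """Estimate server count from the worst-case arrival rate.

    Assumes every recipient turns up once, evenly spread over the send
    window, and that `headroom` is a fractional safety margin (0.5 = +50%).
    """
    arrivals_per_second = recipients / send_window_seconds
    peak_requests_per_second = arrivals_per_second * requests_per_order
    with_headroom = peak_requests_per_second * (1 + headroom)
    return math.ceil(with_headroom / per_server_rps)


servers = required_servers(
    recipients=200_000,           # from O2's CEO letter
    send_window_seconds=2 * 60 * 60,  # SMSs arrived over roughly two hours
    requests_per_order=20,        # hypothetical: page views per completed order
    per_server_rps=50,            # hypothetical: measured capacity per server
    headroom=0.5,                 # hypothetical: 50% safety margin
)
print(f"Servers needed under these assumptions: {servers}")
```

The particular answer matters less than the fact that every input is either known in advance (how many messages are sent, and when) or measurable (what one server can handle).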
I would dearly love for O2 to open its kimono and show some log analysis from the Apache servers. I’m willing to give them a fair crack of the whip; I’ve been around the block with this stuff and I know how subtle and tricky it can be. I’d subject it to rigorous analysis, not some made-up superficial blogging crap. Bet they won’t do that though. Time and time again in my last job I dealt with clients with no idea about scalability analysis, who wanted to stick a finger in the air and leave a one hour meeting with all the figures somehow conjured up. They were always disappointed when I took weeks of running simulations and load tests before I would commit to any numbers, but none of the sites I looked after ever went down like this one did. I’m willing to bet O2 tried to wing it, jimmied some random number of servers into the cluster, and are now trying to wriggle out of the PR with some old-fashioned “look how big the numbers are!” pseudoscience and “hey, Apple didn’t give us enough!” blame-oh-rama.
Well, I call shenanigans on this. I’ll go further in fact. I bet sometime over the last few weeks, somewhere in O2, some bigwigs met some techies and some Bob Techie said “Here is the little microsite we made to take iPhone preorders.” Jim Sarky Technical Architect said “You fool! If that site crashes, what will happen to www.o2.co.uk?” That made Fred Bigwig nervous and he said, “Shit dude, yeah. Put the microsite on its own server cluster.” But of course there wasn’t a server cluster around for this, and Fred Bigwig wouldn’t sign one off, because who buys a cluster for just one day of taking orders? So, Bob Techie did what techies do, and he improvised with some spare servers, and maybe reallocated them from QA, and reused some old ones, and generally lashed a cluster together. And minutes after those SMSs went out the baling twine and spit that was holding it together fell apart and now here we are, either writing (me) or reading (you) a whiny blog post about it made out of overlong sentences.
Oh, and finally, none of these shenanigans around scalability explain why the system was taking orders after stock ran out or why the frontline support staff had no visibility into the order system. I still think both of these things stink. From where I’m sitting it still looks like a world class balls up.






You can see my email, right? Email me Friday if you’re still sans JebusPhone and I’ll see if I can’t find you one. I have magic powers, see*.
*’dw seen jebusphone?’
Couldn’t agree more with you. It’s also amazing how new customers could get the iPhone but I have not seen a single upgrade user get a pre order!!
As a fellow gamer and self-confessed geek, I have to say I think this entry is brilliant, even though I work for O2! It’s basically a reasonably structured argument which, in my job, is something I truly appreciate when compared with the usual ‘OMG OH TOO ARE TEH SUCKZORS! FAIL!’ reaction we’ve had to this.
I know that a lot of people are feeling let down by this – I’ve seen the reaction first-hand – but I can honestly tell you that we prepared for this to the furthest extent we could anticipate. Quite simply, it wasn’t enough. We realise this and that’s why many have received an apologetic email. I’m also obliged to say that this is my own opinion and not a response on behalf of O2. I’ll finish with that old saying, ‘Hindsight is always 20/20.’
Sorry Richard, but your analysis is far too well-reasoned and thought out. You may never get that dream job at O2 after all.
The amusing thing is, had you been at the O2 store at ungodly o’clock on 11th July, chances are you *still* wouldn’t have got an iPhone.
I was 12th in line at the Regent St Apple store, (you don’t want to know how early I arrived — I wasn’t even there to buy one!) and it was utter chaos when they opened the doors.
The activation process basically did not function at all for half an hour or so, and didn’t function in practice for a long time after that. We left the store around 9:30, sans iPhone.
The girl I was with went back in the evening to pick up her phone, but it didn’t actually get connected to the O2 network for over a week after that.
It’s nice to know that O2 haven’t improved in competence since they changed their name from BT Cellnet.
Remind me sometime to tell you the saga of how they said they would disconnect my phone when I got mugged (but didn’t actually do it), and how long it took me to get (most of) the charges off the final bill.