Burn me once — how to handle a crashing website

In the fall of 1999 Forbes.com disappeared for four days. It went dark, kaput, a victim of too much PR and traffic pointed at it by AOL and the annual list of the 400 Richest Americans, a server platform built on the cheap with the wrong technology, and a tech team completely frozen in the headlights.

I lived those four sleepless days in agony — we were pre-IPO, talking to bankers, getting a ton of scrutiny with a new CEO aboard — and boom, goodbye web site. Was it a memory leak (I love how everything eventually gets blamed on a memory leak!)? Was it an issue with concurrent connections? The ad server? I made the tough call on the fourth day to literally start over and relaunch, building back content and functions as time went by.


Those were dark times for me, for our ISP, and for our tech team. I learned a lot about technology disaster recovery and crisis management, and today that experience came in handy.

Today another site failed (not Lenovo.com), going down with a “Service Unavailable” error when we could least afford it. This time it took 30 minutes for me and one very smart guy on my team to decide to fire the ISP, move to a new host (thanks to Mark Cahill at Vario for the suggestions), get the coders to start debugging, and have the DNS repointed. No more diagnostics. No more waiting for a service ticket to get opened, to get escalated, for servers to get rebooted, for this fix and that fix.

Screw it. Punt to a new box, learn the lesson never to host at the old ISP again, and never enter a hosting relationship where your techs and sysadmins don’t have 100% God status.

