Most of us in the IT field have been through a major outage of some kind. Many of us have had sleepless nights trying to get things back up and running. Recently the team at Toggl went through their trial by fire; they did some things right and some things wrong. Let’s look at their actions a bit closer and see what can be learned.
Toggl went down; it happens. There had been some sort of outage at their data center that was localized to a few racks. Then, once power was restored, they started experiencing server issues. The problems seemed to be mounting. Here’s the point, though: Toggl felt the pressure, but they only stopped a few times to report back to their customers.
They started out well; when the problem was obviously not theirs but their DC’s, they were posting on their support forum 2-3 times an hour. Then, once power was restored and the servers started failing, they went radio silent. Between 08/08/2011 11:30 AM and 08/09/2011 01:34 AM there was nary a peep out of them. It’s easy to communicate when it isn’t your fault and you can pass the buck to someone else. However, when things under their control started going south, they clammed up tight. This is the absolute wrong thing to do.
In social media circles, there is a lot of talk about authenticity; it should extend beyond just marketing, though. IT teams have to be authentic as well, and that means standing up and taking your licks when you’ve messed up. Here are 3 steps IT professionals can take in a downtime emergency to make sure they remain authentic.
- Assign a spokesperson.
This doesn’t have to be a senior person (those people should be working diligently to fix the problem), but it should be someone on the inside. Call in one of your juniors, put them in the middle of the war room where they can constantly monitor the situation, and have them post regular updates.
- Answer questions as if your company depends on it.
Communication is a two-way street. Whether via Twitter, support forums, or an IRC channel, when a question is asked, do your best to answer it quickly and completely. Now is not the time to do damage control. The more your customers know, the better they can respond and even help you if possible.
- Once it is over, come clean.
Once everything is back up to speed, then and only then, pull your team together and write out the story. Be honest, but don’t over-share. When PHP Fog was hacked, they shared the complete transcript of their IRC chat with the hackers. This was unnecessary and did damage to their reputation. Filter through the details, share what is important, and then go into damage control mode.
Most customers will appreciate your honesty, candor, and authenticity. If you are working in a technical field, being honest about what happened – “We got hacked because of a SQL injection attack”, “Our server lost a hard drive and the RAID failed, so we had to restore from backups”, etc. – will be appreciated much more than “We had unscheduled downtime.” Technical people know when you are hiding the truth, and they will either find out what happened and share it with others, or just decide you aren’t worth the risk and move to a competitor.
Be authentic in emergencies, communicate often, and when it’s all over, tell the story and do what you can to help your customers recover. The customers that appreciate good service will stay, the ones that leave aren’t worth trying to keep.