
365 Main outage causes aftershocks in Web world


We've now learned more about the outage at 365 Main's San Francisco datacenter that knocked some of the Web's most popular sites offline. The latest theory: An employee, reportedly drunk, hit the emergency-power-off switch in 365 Main's Colo 4 room. (Update: I no longer know whether to trust the source who sent in the tip about a drunk employee.) Other sites located in other rooms were unaffected. This isn't the first time 365 Main has suffered an EPO-induced outage; a major one still remembered by customers occurred back in April 2005, and another took place last year. After the jump, a gallery of the carnage caused, and a roundup of reactions.

Some of the affected websites — most of which are back online — played it straight with customers, like Craigslist. Others offered the usual pack of lies websites trot out. AdBrite, for example, tried to claim that the outage was due to "scheduled maintenance," and RedEnvelope, the e-commerce gifts site which just today crowed about moving all of its Web operations to 365 Main, said the outage was a systems upgrade. Busted!

4:38 PM on Tue Jul 24 2007
By Owen Thomas
17,885 views
27 comments

Comments

  • It's painfully obvious that valleywag needs a tech insider perspective. The sketchy down-time messages were probably not lies - just the lazy default messages. Busted!

  • I've been in 365 main a few times and I can say that it's not a pretty DC by any means. Their security is also a little lax so I'm not surprised that a drunk employee had access to the switch room.

  • you sure it wasn't the transformer explosion on mission that knocked out power to around 30,000?

  • No wonder I can't find Women offering to host a Ski party in the middle of summer lol.

  • [sfgate.com]
    here's the real issue, read towards the bottom of the article

  • Looks like Adbrite reads valleywag...

    [www.adbrite.com]

  • yeah, doesn't surprise me.

    I recently had a douchebag here at work throw a breaker at random - which cut power to our servers.

    incredible, INCREDIBLE irresponsibility.

  • dude, you forgot hi5.com - its back up now...

  • I've been to 365 Main, 360 Spear, 200 Paul and all sorts of other datacenters. The story about a drunken employee is a cover story... what's the chance of that being the reason for the power outage at the exact same time San Francisco was having major power outages?

    The real story is that 365 Main is run by incompetent managers who can't put together something fault tolerant if their customers were outside their doors with pitchforks... oh wait.. right!

  • i commented on this earlier and it wasn't published. How about a nod like, hey thanks for the tip pal. Weenies.

  • wait i take that back, please

  • Owen Thomas at 06:18 PM on 07/24/07

    @leahculver: Busted right back! If tech insiders believe that it's okay to give your customers false information because of a "lazy default message," then give me the tech-outsider perspective any day. Technology makes it okay to lie, folks -- you heard it here first.

  • @Owen Thomas: If you really think that any of these people with long ass nights ahead of them have "make sure the status page is 100% accurate" anywhere near the top of their to-do list, then I'm glad nobody's relying on you to bring anything back up.

  • i'm with Leah here.. Besides, if they said they were having server problems/outage they'd get a heap of unsolicited emails offering lame advice, selling solutions, or wagging a finger at them for having such flimsy and precarious servers. Alternative message: "For reasons too complicated to explain here, the site will be experiencing some downtime until the problem, that is too complicated to explain here, is fixed".

  • Our offices are at 3rd and Townsend. The data center lost power the same time our office did. The implication is that the outage was caused by a loss of utility power.

    I am not putting much faith in the drunken employee for multiple reasons:

    - the timing is too suspicious

    - if someone did press the Big Red Button, it is unlikely anyone at 365M could restore power. They would have to bring in their contractors. And as we know, the power went on and off four or five times before settling down. This behavior is more consistent with a utility power outage than a BRB push

    - Multiple colo rooms, but not all, were affected. If it was an intoxicated employee this person would probably have to walk from room to room to push the BRBs.

  • sample032 at 08:20 PM on 07/24/07

    @DOOFUSGUMBY: the real question was why didn't the generators kick in. Most good hosts like that have backup batteries and generators so there's never a loss of power.

  • If your site goes down because it can't connect to database servers, or the main servers, and the default error page says 'scheduled maintenance' when it obviously isn't, then that is lying. The error page should say that it couldn't connect, or give the correct HTTP error code.

  • Can't they just, like, cut it back on?

  • That's bullshit- I'm in Colo 3 and power was lost in there too. This wasn't an EPO event- power in the whole building bounced multiple times.

  • hmm, when did Lindsay Lohan start working in IT?

  • @BH

    Pray tell, what is the correct HTTP error code for "dumbshit turned the power off"? I have the RFC in front of me but I can't seem to figure it out.

    Also, what should the server's penalty be for lying about scheduled maintenance, and is there an HTTP error code for "Server is in the penalty box"?

  • Leah was right -- 'twas a leftover maintenance page from our last outage (which was scheduled).

    That said, I agree with Owen that it was inaccurate. Next time we'll be more careful with the wording on our maintenance page.

    But Pacman is staying. ;)

    Rock on,
    Pud
    AdBrite

  • Owen Thomas at 11:48 PM on 07/24/07

    @pud: Well said, and well done. I'm glad that, unlike Culver, you're willing to take responsibility for what your website tells users.

  • I worked there a few weeks ago. I don't know what DAVETYRANHAM means by lax security. They made my whole crew check their IDs at the door. The biotech companies that play with AIDS and HEP C don't make us do that. But about the cage thing: nothing would stop a determined person from climbing under the subfloor. Or in some places you could probably just reach through and grab stuff.
    Hmm, maybe we will get a call to come out and relo some stuff.

  • @Owen Thomas: AAAAAAAAAAAAAhahahahahahahah.

    You telling some other website to be more careful what they tell their users is....


    bwaaaaaahhahahahahahahaha. Can't stop laughing.

    Just admit it, you really don't know what you're talking about.

  • @Owen - I'm not saying it's okay to mislead users. I'm just saying that I don't think those sites were intentionally lying this time.

  • Owen Thomas at 06:42 PM on 07/25/07

    @leahculver: Noted, and thanks for the clarification on your stance. It helps to think like a user and not a sysadmin from time to time.
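
The thread's dispute over what an outage page should report has a partial answer in HTTP itself: status 503 (Service Unavailable) covers both scheduled maintenance and unplanned downtime, and the optional Retry-After header tells clients when to try again. Below is a minimal Python sketch of that idea. The helper name outage_response and its message wording are hypothetical, chosen for illustration; nothing here reflects what the affected sites actually ran.

```python
from http import HTTPStatus
from typing import Dict, Optional, Tuple


def outage_response(
    planned: bool, retry_after_seconds: Optional[int] = None
) -> Tuple[int, Dict[str, str], str]:
    """Build (status, headers, body) for a site that cannot serve requests.

    Hypothetical helper: the status is 503 either way, but the body says
    truthfully whether the downtime was scheduled.
    """
    status = HTTPStatus.SERVICE_UNAVAILABLE  # 503
    headers = {"Content-Type": "text/plain; charset=utf-8"}

    # Retry-After is optional; send it only when there is an estimate.
    if retry_after_seconds is not None:
        headers["Retry-After"] = str(retry_after_seconds)

    if planned:
        body = "The site is down for scheduled maintenance."
    else:
        body = "The site is temporarily unavailable because of an unplanned outage."

    return int(status), headers, body


if __name__ == "__main__":
    # Example: an unplanned outage with a one-hour retry hint.
    print(outage_response(planned=False, retry_after_seconds=3600))
```

The design choice is simply to pass the planned/unplanned distinction in as an explicit flag, so a leftover maintenance page cannot be served by default during an unplanned outage.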
