
    San Francisco datacenter renamed "364.98 Main"

    365 Main, the troubled datacenter operator, has finished its investigation into the failure at its San Francisco facility that knocked some of the Internet's most well-known websites, from Craigslist to LiveJournal to Technorati, offline back in July. Ridiculously, the company first tried to blame PG&E for the failure, knowing full well that its clients pay it for reliable power even in a blackout. (Equally ridiculously, I ran a suspect tip that a drunk employee had wreaked havoc in the datacenter.) Now, the company has completely exonerated itself, pinning the blame on a component in its generators. Here's why you still shouldn't believe a word the company says. My analysis, and the company's press release, after the jump.


    Of course, 365 Main's generators failed. The company blames a setting that kept a piece of electronics used to start the generators automatically from properly clearing its memory. But aren't these generators tested monthly? 365 Main notes that the component in question is used in only two of its datacenters. There's no word on whether the faulty testing procedures are common to all of its facilities or unique to San Francisco.

    And the kicker? 365 Main brags that it has "delivered 99.9942 percent uptime to customers," which sounds impressive until you do the math and realize that the supposedly 24/7/365 facility has been out of service, on average, for just over half an hour every year. Last month's outage, in other words, was all in a day's work for 365 Main. On top of that, consider this: it's a downtime rate nearly six times as high as the "five nines" (99.999 percent) standard 365 Main promised when it launched. 365? More like 364.98.
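    The arithmetic is easy to check. This is a minimal sketch that assumes a 365-day year and treats each uptime percentage as a simple average over the reporting period, which is how the press release appears to use the figures; the function names are illustrative only.

```python
# Convert uptime percentages into average downtime, assuming a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Average minutes of downtime per year implied by an uptime percentage."""
    return (1 - uptime_percent / 100) * MINUTES_PER_YEAR


def effective_days(uptime_percent: float) -> float:
    """Days of service per 365-day year implied by an uptime percentage."""
    return 365 * uptime_percent / 100


sf = downtime_minutes_per_year(99.9942)          # San Francisco facility
portfolio = downtime_minutes_per_year(99.9967)   # all five datacenters
five_nines = downtime_minutes_per_year(99.999)   # the launch-time promise

print(f"San Francisco: {sf:.1f} min/year")             # about 30.5
print(f"Portfolio: {portfolio:.1f} min/year")          # about 17.3
print(f"Five nines: {five_nines:.2f} min/year")        # about 5.26
print(f"Ratio to five nines: {sf / five_nines:.1f}x")  # about 5.8
print(f"Days per year: {effective_days(99.9942):.2f}") # about 364.98
```

    Because these are averages over five years, they measure total downtime, not how often outages happen.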

    Here's the press release. I recommend you trust it as much as you do the "365" in 365 Main's name.

    365 MAIN REPORTS ON ROOT CAUSE OF GENERATOR FAILURE

    Company Implements Fix for All Affected Generators and Makes Information
    about the Fix Available to Data Center Industry

    SAN FRANCISCO, Calif., Aug. 1, 2007 - Data center developer and operator 365
    Main Inc. is issuing information today that details the root cause behind
    why back-up power generators in the company's San Francisco facility failed
    to start during a PG&E power outage last week, resulting in approximately 40
    percent of customers in the facility losing power to their equipment for up
    to 45 minutes.

    The Problem

    At 1:47 p.m. on Tuesday, July 24, 365 Main's San Francisco data center was
    impacted by a power surge caused when transformer breakers at a local PG&E
    power station unexpectedly opened. PG&E has still not determined what caused
    the breakers to open.

    Typically when a power outage occurs, the outage triggers 365 Main's
    rigorously maintained and tested back-up diesel generators to start up and
    take over providing power supply to customers. 365 Main's San Francisco
    facility has ten 2.1 megawatt back-up generators to be used in the event of
    a loss of utility power. Eight primary generators can successfully power the
    building, with two generators available on stand-by in case there are any
    failures with the primary eight.

    However, following the power outage last week, three of 365 Main's 10
    back-up power generators, manufactured by Hitec, failed to complete their
    start sequence. A complete investigation of the incident began immediately.
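    (A quick check on the release's own numbers shows why three failures were one too many: the plant was built to survive two. This minimal sketch assumes, as the release implies, that the building needs the full capacity of eight 2.1 MW units. The all-or-nothing model is a simplification, since in practice the shortfall dropped only part of the load.)

```python
# N+2 redundancy check using the press release's figures:
# ten 2.1 MW generators, eight of which are needed to power the building.
UNIT_MW = 2.1
TOTAL_UNITS = 10
UNITS_NEEDED = 8


def can_carry_load(failed_units: int) -> bool:
    """True if the generators that did start cover the required capacity."""
    running = TOTAL_UNITS - failed_units
    return running >= UNITS_NEEDED


for failed in range(4):
    running = TOTAL_UNITS - failed
    print(f"{failed} failed: {running * UNIT_MW:.1f} MW available, "
          f"{UNITS_NEEDED * UNIT_MW:.1f} MW needed, "
          f"full load carried: {can_carry_load(failed)}")
```

    With two failures the remaining eight units still supply 16.8 MW; with the three failures reported on July 24, only 14.7 MW was available.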

    Within hours of the incident, an international team of specialists was
    deployed to 365 Main's San Francisco data center facility to join on-site
    technicians and begin systematically testing the generators in search of a
    root cause. After days of thorough testing around the clock, the team
    discovered a weakness in an essential component of the back-up generator
    system known as a DDEC (Detroit Diesel Electronic Controller).

    The team discovered a setting in the DDEC that was not allowing the
    component to correctly reset its memory. Erroneous data left in the DDEC's
    memory subsequently caused misfiring or engine start failures when the
    generators were called on to start during the power outage on July 24.


    The Fix

    The investigation team discovered DDEC issues on each of the failed Hitec
    units and was able to successfully simulate the failure. A fix was introduced
    by altering the timing of a command to the DDEC component, allowing more
    time between the engine shut-down command and the DDEC reset command. Once
    this fix was introduced, the Hitec generators successfully passed more than
    50 consecutive start-up sequence tests without incident.

    The testing methodology was performed by Hitec specialists along with 365
    Main's chief technician and staff. Specialists from Cupertino Electric were
    present during all testing, and EYP Mission Critical Facilities will provide
    independent verification of the findings the week of 8/6/07.

    365 Main has implemented the DDEC fix in its San Francisco and El Segundo
    facilities. Of the five data centers in 365 Main's portfolio, the San
    Francisco and El Segundo facilities are the only ones with Hitec generators
    containing DDECs. All other facilities feature other brands of generators
    or have different models of Hitecs.

    365 Main is sharing the discoveries of its investigation with other Hitec
    customers. In addition, Hitec has expanded its preventative maintenance
    procedures as a direct result of discoveries made during the 365 Main
    investigation.

    In the wake of the outage, 365 Main published an apology to customers and
    daily updates directly from the investigation team meeting minutes, allowing
    customers and the public at large to track progress. A complete archive of
    these updates and more details about today's update are available at:
    http://www.365main.com/status_update.html

    Chris Dolan, president and CEO of 365 Main, said, "365 Main has a track
    record of providing customers with data centers that are considered to be
    among the world's finest. We extend our sincere apologies to customers who
    were impacted by this incident. Addressing customer concerns is our top
    priority. In the days since the incident occurred, we have identified and
    corrected the root source of the problem and are taking steps to prevent
    this type of problem from happening again. We are also making our
    comprehensive findings available to other data centers to try to prevent the
    same problem from recurring elsewhere."

    Glenn Ellis, president and CEO of Hitec USA, also commented: "Our top
    priority is taking steps to prevent this type of unforeseen incident from
    occurring again. We sincerely apologize to 365 Main and its customers that
    our generators failed to deliver the continuous power as designed."


    365 Main's Track Record

    Since its inception over five years ago, 365 Main has delivered 99.9967
    percent power uptime to customers across its five-data-center portfolio.
    This includes the outage experienced in San Francisco last week. 365 Main's
    San Francisco facility has delivered 99.9942 percent uptime to customers
    during the last five years, inclusive of last week's outage.

    As part of their service level agreements with 365 Main, 365 Main customers
    receive rent abatements (refunds) in the event that electrical power is
    dropped in the section(s) of the data center where their servers are
    located. 365 Main is honoring all service level agreements with affected
    customers.
