So today we took Brent out for a going-away lunch. Right as we're getting ready to go, the power flickers, and a few seconds later the lights go out. A short time later there's a bit of a flicker as the generator tries to kick in, but it struggles for a few seconds, delivering way less than necessary, and then goes out. We figure "what luck," but our server rooms should have kicked over to a different power feed, which has an independent generator. About this time the VP walks into the hallway to say we should make sure they're up and running OK. So we split up, figuring a few minutes later we'll be on the way to lunch, and hopefully the campus will be back up when we return.
Unfortunately we arrive to find the rooms on battery power, apparently meaning the second (independent) generator system also failed to start up. A few minutes later, we know that it's an area-wide thing, that both generators shut down almost immediately due to overload, and that the fire alarm alert thing is getting old quickly. Why can that thing work great when there's no power, but sometimes have issues holding door magnets when stuff is up? Anyways... So the lunch plans are on hold. We have a crowd of ITS people around the server room just sorta hanging out and wondering when we can go to lunch or, alternatively, when we need to start shutting stuff down. Some of us are shutting down unnecessary stuff to keep the heat in the room down (the AC is on the power transfer equipment, so it can run off any of the 3 indefinite power sources, but not battery). Bethelwulf has long since shut itself down (aka its UPSes ran out of power).
A few minutes later one of the server rooms is below the safe runtime on the UPSes (below 1 hour is unacceptable; it takes 15-20 minutes to get things shut down properly, and that's if we work very quickly and overlap when possible), so it's time to start shutting stuff down. Of course, right about the time we're actually planning to shut down production machines that have redundancies (we still don't want service interruptions at this point), we hear that the power is going to be back within a few minutes. So that goes to "let's wait 5 minutes" status.
Sure enough, a couple minutes later the AC kicks online and the UPSes go off battery - one feed is up and running. A few minutes later the room lights come on, as the normal building feed comes up. Quick status checks, chats with the VP and EVP, who are both wandering around wondering how much of our stuff is affected, and we're off to lunch. Of course most people were off to class or work, so they were much less excited about the outage being over than we were. After the last big storms came through, I had almost forgotten how annoying actually losing power can be.
Lunch was good, with a large crowd, although Arby's was packed. Of course, leaving at a bit after noon rather than 11:30 will do that. After getting back we discovered that the only lost equipment (at least infrastructure, and so far) was apparently one switch card, which is pretty good. We decide that another battery chassis for the one server room needs to be investigated (especially as power for another wiring closet will be run from it soon), and that getting the room lights on the transfer equipment, even though it wouldn't have helped in this case, could be very handy. In the afternoon I get a call from the electrician wanting to talk about the issues, testing to make sure it doesn't happen again, how much delays in other stuff (like getting that feed to the closet in place) may have impacted us, and other stuff. We meet tomorrow after the cause of the second generator failure is determined. (The first one's failure is unfortunately normal due to the surge load on startup; they have to bring the campus up building-by-building for it to handle it. This is why we have multiple feeds for our stuff now.) Should be good to get that figured out.
So that was my day. Anything exciting happen in yours?
Copyright ©2000-2008 Jeremy Mooney (jeremy-at-qux-dot-net)
I read about colored bubbles named Zubbles. I think you got me beat. ;)