a not so fun day
Feb. 4th, 2007 09:12 pm
My day was co-opted by work. I'm on-call, and we had a disk fail in one of the HP JBOD arrays. The system in question has the drives mirrored via MirrorDisk/UX, but doesn't have OnlineJFS because the database on the system uses raw logical volumes. So I got to spend several hours running commands at the command line just to get to the point where I could actually swap out the failed drive. But it all worked as advertised and the system is back to normal.
Tomorrow I get to share with my coworkers what I did, why I did it, and how I went about it, so that when this happens again I won't be the only one on the hook for it. And it's going to happen again. We've had the system in question since the summer of '02. I suspect we are rapidly approaching the mean time to failure (MTTF) on these disks. I have to remember to dig up that number from HP tomorrow. If I can get any of the stats and/or methods HP used to derive that number, then I'll be able to tell whether our disks' hours in service are within one standard deviation of the number they give. If they are, and particularly if the standard deviation is only in the hundreds of hours, that'll give more fuel to the argument to move the data to the SAN sooner rather than later.
Except I screwed up. We had downtime on this system in November and again in January, and due to a combination of unpreparedness and lingering problems to do with a misconfiguration (since diagnosed and corrected), I still need one more reboot to get the system attached to the SAN from the software side of life. Which means more downtime. Which means more political BS. At least I don't have to deal with it directly, as I have a more than competent manager, thank God.
So, more fun on tap for tomorrow. At least I didn't break my foot again.
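The standard-deviation check described above could be sketched roughly like this in Python. The install date is a guess at "summer of '02", and the MTTF and standard deviation values are pure placeholders, not HP figures; the real numbers would have to come from HP. It also assumes the disks have been powered on continuously since installation.

```python
from datetime import datetime


def hours_in_service(installed: datetime, now: datetime) -> float:
    """Hours elapsed between installation and now."""
    return (now - installed).total_seconds() / 3600.0


def within_one_sd(hours: float, mttf_hours: float, sd_hours: float) -> bool:
    """True if the hours in service fall within one SD of the quoted MTTF."""
    return abs(hours - mttf_hours) <= sd_hours


# Assumption: "summer of '02" taken as July 1st, 2002.
installed = datetime(2002, 7, 1)
now = datetime(2007, 2, 4, 21, 12)
hours = hours_in_service(installed, now)

# Placeholders only; substitute whatever figures HP actually provides.
PLACEHOLDER_MTTF_HOURS = 50_000.0
PLACEHOLDER_SD_HOURS = 500.0

print(f"Hours in service: {hours:.1f}")
print("Within one SD of MTTF:",
      within_one_sd(hours, PLACEHOLDER_MTTF_HOURS, PLACEHOLDER_SD_HOURS))
```

With a standard deviation only in the hundreds of hours, the window around the MTTF is just a few weeks wide, which is why a narrow range would make the case for moving to the SAN more urgent.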