Archive for March, 2009
The Mixed Blessing of System Failure
OK, so here’s the scoop. I was planning on switching Unicron over to CentOS from Ubuntu, but during the prep process I must have bumped a SATA cable on my RAID array. The result was me rebooting, rebuilding the drive, and shaking it off as a fluke.
A day later, that drive and another failed, which is bad news on a 4-disk RAID5 array. RAID5 only survives one dead disk, so two usually means everything is lost.
By some minor miracle, I was able to restore the array, but in the process I installed CentOS into one of the two 1-gig swap partitions (reformatted as ext3). From there I was able to make sure the data was backed up.
I had already created the 20 gig partition I planned to install CentOS to (it was unavailable during the RAID failure, hence I didn’t use it), so I figured, “what the hell, the server is down, I might as well rebuild it now.”
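For anyone wondering why two dead disks is the magic number, here’s a rough sketch of the idea behind RAID5 parity. It’s a toy model, not how md actually lays things out (the real thing rotates parity across all the disks and works in chunks). The names `xor_blocks`, `data` and `parity` are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Toy model of a 4-disk RAID5 stripe: three data blocks plus one parity block.
data = [b"disk-A..", b"disk-B..", b"disk-C.."]
parity = xor_blocks(data)

# One disk lost (say the second): XOR of the survivors brings it back.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])

# Two disks lost (second and third): the survivors only tell us the XOR
# of the two missing blocks, which is not enough to recover either one.
leftover = xor_blocks([data[0], parity])
print(leftover == xor_blocks([data[1], data[2]]))
```

With one disk gone there is exactly one unknown per stripe, so it can be solved. With two gone there are two unknowns and one equation, which is why a double failure normally takes the whole array with it.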
It’s a pain in the ass, but in the end it forced my hand to rebuild the server. The hardest part is remembering all the functionality I’ve set up over the last 2 years. Here’s what I’ve done so far since Friday:
- reinstall
- fix funky ext3 partition (/share drive)
- install screen, irssi, sudo, which, lsof, lsusb
- install bind, dhcp
- configure bind, dhcp
- activate bind, dhcp
- shutdown linksys dhcp
- modify iptables to let traffic through
- install apache, mod_ssl
- reconfigured apache, based on work config
- imported mysql DBs (including wiki)
- reinstalled wiki
- setup ssl certs
- set up unrealircd
- ssh keys working
- configure sshd
- set up ldap
- implement ldap into system accounts
- get ldapadmin working
- setup snmpd
I’ve done quite a bit more, but this is just what I remembered to document so far. What’s left?
- get subversion working
- set up phpMyAdmin
- set up /share share
- set up backups for LE websites
- set up nessus
- set up nagios
- set up nagiosgraph
- recreate ldap accts
- set up netflow/ntop
I’m sure this list will grow over the next few days.
ripping one
With the freakout of my raid array and the rebuilding of my server, it’s about that time of year where I re-rip all of my audio CDs. Since I don’t think mp3c was ever updated since the last rip (and subsequent mangling of a few titles), I’ll be investigating some of these new rippers:
burn – Command line Data-CD, Audio-CD, ISO-CD, Copy-CD writing tool
crip – terminal-based ripper/encoder/tagger tool
cwcdr – Chez Wam CD Ripper
digitaldj – An SQL based mp3 player front-end
dir2ogg – audio file converter into ogg-vorbis format
distmp3 – A Perl client and daemon for distributed audio encoding
grabcd-encode – rip and encode audio CDs – encoder
grip – GNOME-based CD-player/ripper/encoder
i810switch – Enables/disables video output to CRT/LCD on i810 video hardware
jack – Rip and encode CDs with one command
mp3burn – burn audio CDs directly from MP3, Ogg Vorbis, or FLAC files
mp3c – MP3Creator – Creator for MP3/OGG-files
mp3cd – Burns normalized audio CDs from lists of MP3s/WAVs/Oggs/FLACs
mybashburn – Burn data and create songs with interactive dialog box
ripit – Textbased audio cd ripper
ripperx – a GTK-based audio CD ripper/encoder
kaudiocreator – CD ripper and audio encoder frontend for KDE
mozilla-plugin-vlc – multimedia plugin for web browsers based on VLC
more details to come as I dig into it.
frak.
well, I guess sdd got jealous of sdc and decided to blow out as well. Either that or my SATA controller died.
Either way, BAD.
home sick
So, I’m home sick again. Fourth time this year I’ve been sick. Cold, flu, cold, Bronchitis. Awesome. Will I get to rest today? No, of course not.
My server (Unicron) has been up and running for 2 years now- I got the parts right after Ian was born. I set up a nice software raid array at the time that’s served me well. I’d never set up a raid array like this before, so I wasn’t really sure how to monitor it. The raid array has been running fine for 2 years, so I just sorta let it slide.
Over this past weekend, I did some work resizing the LVM partitions and had to poke around with the RAID stuff. I found not one, but two ways to monitor it: one was to set up a monitor with the mdadm tools and have it email me if there was a problem, and the other led me to create a simple Nagios monitor. I set both up Sunday night.
Flash forward to this morning: Jackie wakes me up and asks me if I’m going to work (I’d come home sick the day before and was still out of it). My intention was to wake up long enough to IM my manager and supervisor and let them know I was gonna be out sick. I do so, then minimize the IM stuff. Staring me in the face was the following:
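I don’t have the original monitor script handy, so here’s a minimal sketch of what a Nagios-style check like that could look like. It is not the script I actually ran. It assumes the usual /proc/mdstat layout, where each array’s status line ends in something like [UU_U] with an underscore for every missing member, and it uses the standard Nagios plugin exit codes (0 for OK, 2 for CRITICAL). The names `SAMPLE`, `degraded_arrays` and `check` are made up for illustration.

```python
import re

# Illustrative text in the /proc/mdstat format, with one failed member on md2.
SAMPLE = """\
Personalities : [raid1] [raid5]
md2 : active raid5 sda3[0] sdd2[3] sdc2[4](F) sdb3[1]
      1461633600 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]

md0 : active raid1 sda1[0] sdb1[1]
      192640 blocks [2/2] [UU]
"""

def degraded_arrays(mdstat_text):
    """Return {array_name: status} for every array with a missing member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line ends with e.g. [UU_U]; "_" marks a missing device.
        status = re.search(r"\[([U_]+)\]\s*$", line)
        if current and status and "_" in status.group(1):
            degraded[current] = status.group(1)
    return degraded

def check(mdstat_text):
    """Return (exit_code, message) in the Nagios plugin convention."""
    bad = degraded_arrays(mdstat_text)
    if bad:
        detail = ", ".join(f"{name} [{state}]" for name, state in sorted(bad.items()))
        return 2, f"CRITICAL: degraded arrays: {detail}"
    return 0, "OK: all arrays healthy"

code, message = check(SAMPLE)
print(code, message)
```

On a real box you’d read the text from /proc/mdstat instead of `SAMPLE`, print the message, and exit with the returned code so Nagios picks up the state.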
Wait, wait, wait- my script must be crappy, there’s no way the raid array choked right after setting up the monitoring. I sorta go into denial and check my email:
This is an automatically generated mail message from mdadm
running on unicron

A Fail event had been detected on md device /dev/md2.

It could be related to component device /dev/sdc2.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid5 sda3[0] sdd2[3] sdc2[4](F) sdb3[1]
      1461633600 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]

md3 : active raid1 sdc1[0] sdd1[1]
      979840 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
      979840 blocks [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      192640 blocks [2/2] [UU]

unused devices:
Frak.
So here I am, praying it was a hiccup while I reboot and rebuild my RAID array. It looks like sdc2 went nutty about 45 minutes after I went to bed. I restarted the server and sdc2 reappeared, and I’m rebuilding now to see what happens:
md2 : active raid5 sdc2[4] sda3[0] sdd2[3] sdb3[1]
      1461633600 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
      [=================>...] recovery = 85.0% (414436248/487211200) finish=26.8min speed=45182K/sec
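If you’re curious where those numbers come from, here’s a quick back-of-the-envelope check. It assumes mdstat counts in 1 KiB blocks and that the speed is in KiB per second, which is my understanding of the format. The function names are made up for illustration.

```python
def rebuild_eta_minutes(done_blocks, total_blocks, speed_kib_per_sec):
    """Minutes left in the rebuild, assuming the speed stays constant."""
    remaining = total_blocks - done_blocks  # 1 KiB blocks still to sync
    return remaining / speed_kib_per_sec / 60

def raid5_usable_blocks(member_blocks, members):
    """RAID5 gives up one member's worth of space to parity."""
    return member_blocks * (members - 1)

done, total, speed = 414436248, 487211200, 45182

percent = 100 * done / total
eta = rebuild_eta_minutes(done, total, speed)
usable = raid5_usable_blocks(total, 4)

print(f"{percent:.2f}% done, about {eta:.1f} min left, {usable} usable blocks")
```

The per-disk figure in the recovery line (487211200) times three data disks gives the 1461633600 blocks md2 reports, and the leftover blocks divided by the speed lands on the same 26.8 minutes mdstat shows.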
One thing is for sure, I need to get an auxiliary drive in case this one goes kaput for real. I said I’d buy one in June… 2007. Suppose I’d better get on that, huh?
My apologies if this was nonsensical, I’m really tired.
new plugin test.
So I’m testing a nifty new wordpress plugin… check this out:
I wonder if it’ll work?
update: no, no it will not.