RAID Angst

guh blah.

So over the weekend, I got another 250 GB drive and actually got Fluffy up to a TB. This involved building the RAID in degraded mode, transferring files to it, then adding a drive into the RAID. That one drive had data that I didn’t have space to back up, namely 250 GB of anime.

So I go to add the final drive to the array and do a raidhotadd of the device. All is well, except it’s not adding it in position 4 as it should, but in position 5. Very strange. So I just let it go overnight. In the morning I check /proc/mdstat, and it shows that it has added /dev/hdk1 to the raid, but that it’s only using 4 of the 5 drives. Sigh. So I do a raidsetfaulty /dev/md0 /dev/hdk1 and set the drive as failed. All is well, it shows as failed. I then try to raidhotremove the drive so that I can re-add it. No dice. Says it’s busy. So I stop the raid using raidstop /dev/md0. Raid stops. Okay. raidhotremove… nope, raid’s not up, you can’t remove a drive. Okay, so I do a raidstart /dev/md0. Barf. Raid won’t start.

This is all in half an hour before I need to go to work. So tonight I get to figure out how to get the raid running in degraded mode again so that I can at least pull that anime off before nuking the RAID and re-setting it up.

Update: after a bit of reading, it looks like there was a screwup when I created the RAID in degraded mode. In the /etc/raidtab file I had declared:

device /dev/hdc1 raid-disk 0
device /dev/hde1 raid-disk 1
device /dev/hdg1 raid-disk 2
device /dev/hdi1 raid-disk 3
device /dev/hdk1 raid-disk 4 failed-disk

The last entry should have been:

device /dev/hdk1 failed-disk 4

So when I created the raid, it saw 2 properties on that device and apparently took the second of the 2, failed-disk, but with no disk number. So when I later added hdk, it had no number and was probably treated as a spare. Because I didn’t properly define the raid when I made it, I think I’m going to have to correct the raidtab, and then mkraid -f to create a new superblock on the raid. From what I am reading, this should be okay and not nuke the data. I guess we’re going to find out. Pray for me.

For future reference, when creating a software raid in degraded mode, one must substitute failed-disk # for raid-disk #, not make failed-disk an additional property of the device. This will save you a headache when trying to raidhotadd the device.
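If you want to catch this before running mkraid, here is a rough Python sketch of a check that flags any device in a raidtab that has been given more than one role. The function name find_role_conflicts and the two sample fragments are made up for illustration, and the parsing is a guess at how the file is laid out (whitespace-separated keywords, # for comments), not how raidtools itself reads it.

```python
# Role keywords a device entry can carry in a raidtab.
ROLE_KEYWORDS = {"raid-disk", "spare-disk", "parity-disk", "failed-disk"}


def find_role_conflicts(raidtab_text):
    """Return {device: [roles]} for every device declared with more than one role.

    Assumption: the file is a stream of whitespace-separated tokens and
    anything after '#' on a line is a comment.
    """
    tokens = []
    for line in raidtab_text.splitlines():
        tokens.extend(line.split("#", 1)[0].split())

    roles_by_device = {}
    current = None
    # Walk the tokens pairwise so we can see what preceded each one.
    for prev, tok in zip([None] + tokens, tokens):
        if prev == "device":
            current = tok
            roles_by_device.setdefault(current, [])
        elif tok in ROLE_KEYWORDS and current is not None:
            roles_by_device[current].append(tok)

    return {dev: roles for dev, roles in roles_by_device.items() if len(roles) > 1}


# Hypothetical fragments modelled on the entries in this post.
broken = """
device /dev/hdi1
raid-disk 3
device /dev/hdk1
raid-disk 4
failed-disk
"""

fixed = """
device /dev/hdi1
raid-disk 3
device /dev/hdk1
failed-disk 4
"""

print(find_role_conflicts(broken))  # {'/dev/hdk1': ['raid-disk', 'failed-disk']}
print(find_role_conflicts(fixed))   # {}
```

It only looks for doubled-up roles, so it would not notice a failed-disk with a missing number that stands on its own; treat it as a sanity check, not a validator.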