Nice write-up.. I've never used hardware raid.. always just software raid1, but I was never actually aware of all the inherent advantages software (md) raid has over the (dm) bios raid that you spoke of. Very good information as always, sir.

----- Original Message ----
From: David C. Rankin <drankinatty@xxxxxxxxxxxxxxxxxx>
To: General Discussion about Arch Linux <arch-general@xxxxxxxxxxxxx>
Sent: Sunday, June 14, 2009 4:06:39 AM
Subject: Re: dmraid disk failure - howto rebuild new disk - gparted hates Me :-(

David C. Rankin wrote:
> David C. Rankin wrote:
>> David C. Rankin wrote:
>>> Listmates,
>>>
>>> My Seagate drives are dropping like flies with less than 1400 hours of run
>>> time. (that's less than 58 days of service!)
>> <snip>
>>> That's pretty much where I am now. My next thought is to just use dd to copy
>>> the partitions over. I have opensuse on the sda/sdc array (mapper
>>> nvidia_fdaacfde), so the drives I am working with are not mounted anywhere and
>>> should be easy to work with.
>>>
>>> What says the brain trust? Can you think of any way I was screwing up gparted
>>> so it wouldn't even format the copy partitions? What about the dd method? Any
>>> hints or gotchas? Any help would be appreciated. Thanks.
>>>
>> Ok, I decided on:
>>
>> dd bs=100M conv=notrunc if=/dev/sdb of=/dev/sdd
>>
>> I'll let you know how it comes out ;-)
>>
> H E L P !!!
>
> Here is the situation: the dd rebuild of the new drive worked perfectly.
> However, when I went to re-add the drive to the dmraid bios setup, the device
> mapper reference to the drives changed from nvidia_fffadgic to nvidia_ecaejfdi
> (the nvidia bios controller wouldn't allow a formatted drive to be 'added', so
> we had to remove the existing drive and create a new device mapper array).
>
> What happens during boot is, immediately after the "hooks" for "dmraid" are
> called, the boot process dies looking for the old nvidia_fffadgic dmraid array.
> When this occurs you are left in the God awful repair environment where the
> only thing you can do is "echo *", cd and exit.

S O L V E D !!

First off, if you are going to make something hard on yourself, go ahead and really screw it up so you can take pride in stumbling through a really long diagnosis just to find out the answer was simple all along -- you know the type, right?

When I originally installed Arch on this particular box, I installed it in a raid1 dmraid setup. However, this was the second set of dmraid arrays in the box, the first set being two 500G drives, also raid1 and also under the dmraid convention. SuSE is spinning on the 500G set (sda/sdc) and I had installed Arch on the 750G pair spinning on (sdb/sdd), which is where the Seagate drive failed (sdd).

Unlike pure software ("md") raid, dmraid doesn't have the basic functionality to let you simply add a new drive as a replacement and then rebuild on the fly. Some bios raid implementations allow for the rebuild at the bios configuration stage, but -- strike 3 -- the nvidia mapper on my K9N2 doesn't.
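(For comparison, a rough sketch of what the same drive swap would look like under md raid -- the md device and partition names here are purely illustrative, not my actual layout:

  # fail and remove the dead member -- md lets you do this with the array live:
  mdadm /dev/md0 --fail /dev/sdd1 --remove /dev/sdd1
  # clone the partition table from the good disk, then add the new partition;
  # md resyncs in the background while the array stays up:
  sfdisk -d /dev/sdb | sfdisk /dev/sdd
  mdadm /dev/md0 --add /dev/sdd1
  # watch the rebuild progress:
  cat /proc/mdstat

None of that is available from within linux for a bios fakeraid set like mine.)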
Prior to the failure, the device mapper name for the 1st 500G array was /dev/mapper/nvidia_fdaacfde and the mapper for the 750G array with Arch was /dev/mapper/nvidia_fffadgic (this, unfortunately, would soon change to /dev/mapper/nvidia_ecaejfdi). Why? For starters, the nvidia raid controller wouldn't allow an already formatted disk to be placed in my second array under the same device mapper name. In its mind, it would only allow a blank disk to be added, and then it would rely on the Windows XP raid utility to do the rebuild on the fly. Currently, there isn't an equivalent utility for linux with dmraid (however, the next release of dmraid should include the -R --rebuild option).

With gparted's refusal to copy any partition from sdb to sdd, I just resorted to the dd approach and it worked. The manual sync of the data from the good disk to the new disk was done with dd and worked fine; it just took longer than I wanted because, for some reason, gparted wouldn't let me copy the partitions from /dev/sdb to /dev/sdd, which I still have yet to grasp.

The true sticky wicket in this whole conundrum was dealing with the newly renamed device mapper label when the whole system was still using the old one. The system boots off Array 1 (suse) and then passes boot control to grub on Array 2 for Arch. I had updated the /boot/grub/device.map, menu.lst and fstab files to accommodate the new mapper label, but every time during boot, when it hit the hooks for dmraid, it would puke and complain about wanting the old device mapper label to be able to find /boot, /root, etc.

The extent of my stupidity would soon be revealed. It seems that when I updated things to reflect the new label, I only updated the device.map, menu.lst and fstab entries for the first array and overlooked the fact that when control is passed from Array 1 to Array 2, the same device.map and menu.lst changes are needed there, and that (Hello) that information doesn't somehow get passed along in the chain load to the second array.

So after going through the mkinitcpio page at the wiki, which eliminated an image issue, it all came down to finding the guilty dogs. Grepping on fffadgic in /boot soon showed the problem files (a rough sketch of that hunt is at the end of this message). After updating device.map and menu.lst for the second array, life came back to normal again. So that's the missing 1/2 of the configuration I simply didn't think about at the time.

C'est la vie ... live and learn. Hopefully, this will spare some other poor soul the same surprise if the device mapper label on his (or her ... but I haven't seen any on the list yet) second array changes and he is scratching his head thinking "now I know I already updated the boot files..." ;-)

--
David C. Rankin, J.D., P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
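P.S. -- for anyone hitting the same wall, roughly what the hunt boiled down to. The mapper names are from my box, the paths assume the affected array's filesystems are mounted in the usual places, and the sed one-liner is just one way to make the swap -- editing each file by hand works just as well:

  # find every boot/config file still pointing at the old mapper label:
  grep -rl nvidia_fffadgic /boot /etc/fstab
  # then point them at the new array -- on BOTH arrays, not just the first one:
  sed -i 's/nvidia_fffadgic/nvidia_ecaejfdi/g' /boot/grub/device.map /boot/grub/menu.lst /etc/fstab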