Nice write-up.. I've never used hardware raid.. always just software raid1, but I was never actually aware of all the inherent advantages software (md) raid has over the (dm) bios raid that you spoke of. Very good information as always, sir.

----- Original Message ----
From: David C. Rankin <drankinatty@xxxxxxxxxxxxxxxxxx>
To: General Discussion about Arch Linux <arch-general@xxxxxxxxxxxxx>
Sent: Sunday, June 14, 2009 4:06:39 AM
Subject: Re: dmraid disk failure - howto rebuild new disk - gparted hates Me :-(

David C. Rankin wrote:
> David C. Rankin wrote:
>> David C. Rankin wrote:
>>> Listmates,
>>>
>>> My Seagate drives are dropping like flies with less than 1400 hours of run
>>> time. (that's less than 58 days of service!)
>> <snip>
>>> That's pretty much where I am now. My next thought is to just use dd to copy
>>> the partitions over. I have opensuse on the sda/sdc array (mapper
>>> nvidia_fdaacfde), so the drives I am working with are not mounted anywhere and
>>> should be easy to work with.
>>>
>>> What says the brain trust? Can you think of any way I was screwing up gparted
>>> so it wouldn't even format the copy partitions? What about the dd method? Any
>>> hints or gotchas? Any help would be appreciated. Thanks.
>>>
>> Ok, I decided on:
>>
>> dd bs=100M conv=notrunc if=/dev/sdb of=/dev/sdd
>>
>> I'll let you know how it comes out ;-)
>>
> H E L P !!!
>
> Here is the situation: the dd rebuild of the new drive worked perfectly.
> However, when I went to re-add the drive to the dmraid bios setup, the device
> mapper reference to the drives changed from nvidia_fffadgic to nvidia_ecaejfdi
> (the nvidia bios controller wouldn't allow a formatted drive to be 'added', so
> we had to remove the existing drive and create a new device mapper array).
>
> What happens during boot is, immediately after the "hooks" for "dmraid" are
> called, the boot process dies looking for the old nvidia_fffadgic dmraid array.
> When this occurs you are left in the God awful repair environment where the
> only thing you can do is "echo *", cd and exit.

S O L V E D !!

First off, if you are going to make something hard on yourself, go ahead and really screw it up so you can take pride in stumbling through a really long diagnosis just to find out the answer was simple all along -- you know the type, right?

When I originally installed Arch on this particular box, I installed it in a raid1 dmraid setup. However, this was the second set of dmraid arrays in the box, the first set being two 500G drives, also raid1 and also under the dmraid convention. SuSE is spinning on the 500G set (sda/sdc) and I had installed Arch on the 750G pair spinning on (sdb/sdd), which is where the Seagate drive failed (sdd).

Unlike pure software ("md") raid, dmraid doesn't have the basic functionality to let you simply add a new drive as a replacement and then rebuild on the fly. Some bios raid implementations allow for the rebuild at the bios configuration stage, but -- strike 3 -- the nvidia mapper on my K9N2 doesn't.
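(For comparison, a rough sketch of what the same drive swap would look like under md raid -- the md device and partition names here are purely illustrative, not my actual layout:

  # fail and remove the dead member -- md lets you do this with the array live:
  mdadm /dev/md0 --fail /dev/sdd1 --remove /dev/sdd1
  # clone the partition table from the good disk, then add the new partition;
  # md resyncs in the background while the array stays up:
  sfdisk -d /dev/sdb | sfdisk /dev/sdd
  mdadm /dev/md0 --add /dev/sdd1
  # watch the rebuild progress:
  cat /proc/mdstat

None of that is available from within linux for a bios fakeraid set like mine.)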
Prior to the failure, the device mapper name for the 1st 500G array was /dev/mapper/nvidia_fdaacfde and the mapper for the 750G array with Arch was /dev/mapper/nvidia_fffadgic (this, unfortunately, would soon change to /dev/mapper/nvidia_ecaejfdi). Why? For starters, the nvidia raid controller wouldn't allow an already formatted disk to be placed in my second array under the same device mapper name. In its mind, it would only allow a blank disk to be added, and then it would rely on the Windows XP raid utility to do the rebuild on the fly. Currently, there isn't an equivalent utility for linux with dmraid (however, the next release of dmraid should include the -R --rebuild option).

With gparted's refusal to copy any partition from sdb to sdd, I just resorted to the dd approach and it worked. The manual sync of the data from the good disk to the new disk was done with dd and worked fine; it just took longer than I wanted because, for some reason, gparted wouldn't let me copy the partitions from /dev/sdb to /dev/sdd, which I still have yet to grasp.

The true sticky wicket in this whole conundrum was dealing with the newly renamed device mapper label when the whole system was still using the old one. The system boots off Array 1 (suse) and then passes boot control to grub on Array 2 for Arch. I had updated the /boot/grub/device.map, menu.lst and fstab files to accommodate the new mapper label, but every time during boot, when it hit the hooks for dmraid, it would puke and complain about wanting the old device mapper label to be able to find /boot, /root, etc.

The extent of my stupidity would soon be revealed. It seems that when I updated things to reflect the new label, I only updated the device.map, menu.lst and fstab entries for the first array and overlooked the fact that when control is passed from Array 1 to Array 2, the same device.map and menu.lst changes are needed there, and that (Hello) that information doesn't somehow get passed along in the chain load to the second array.

So after going through the mkinitcpio page at the wiki, which eliminated an image issue, it all came down to finding the guilty dogs. Grepping on fffadgic in /boot soon showed the problem files (a rough sketch of that hunt is at the end of this message). After updating device.map and menu.lst for the second array, life came back to normal again. So that's the missing 1/2 of the configuration I simply didn't think about at the time.

C'est la vie ... live and learn. Hopefully, this will spare some other poor soul the same surprise if the device mapper label on his (or her ... but I haven't seen any on the list yet) second array changes and he is scratching his head thinking "now I know I already updated the boot files..." ;-)

--
David C. Rankin, J.D., P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
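P.S. -- for anyone hitting the same wall, roughly what the hunt boiled down to. The mapper names are from my box, the paths assume the affected array's filesystems are mounted in the usual places, and the sed one-liner is just one way to make the swap -- editing each file by hand works just as well:

  # find every boot/config file still pointing at the old mapper label:
  grep -rl nvidia_fffadgic /boot /etc/fstab
  # then point them at the new array -- on BOTH arrays, not just the first one:
  sed -i 's/nvidia_fffadgic/nvidia_ecaejfdi/g' /boot/grub/device.map /boot/grub/menu.lst /etc/fstab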