Re: Suggestion needed for fixing RAID6

On 04/23/2010 10:47 AM, Janos Haar wrote:

----- Original Message ----- From: "Luca Berra" <bluca@xxxxxxxxxx>
To: <linux-raid@xxxxxxxxxxxxxxx>
Sent: Friday, April 23, 2010 8:51 AM
Subject: Re: Suggestion needed for fixing RAID6


another option could be using the device mapper snapshot-merge target
(writable snapshot), which iirc is a 2.6.33+ feature
look at
http://smorgasbord.gavagai.nl/2010/03/online-merging-of-cow-volumes-with-dm-snapshot/
for hints.
btw i have no clue how the scsi error will travel thru the dm layer.
L.

...or cowloop! :-)
This is a good idea! :-)
Thank you.

I have another one:
re-create the array (--assume-clean) with an external bitmap, then drop the missing drive. Then manually manipulate the bitmap file to re-sync only the last 10%, which is good enough for me...
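For the archive, the re-create idea would look roughly like this. Everything below is a sketch: device names, level, member count and order are hypothetical examples, and getting any of them wrong with --assume-clean destroys the data, so double-check against `mdadm --examine` output first.

```shell
# Sketch only: recreate the array in place with an external bitmap.
# Member devices, order and raid-devices count are hypothetical;
# they MUST match the original array exactly when using --assume-clean.
mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=6 \
      --bitmap=/root/md0-bitmap \
      /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1

# Then fail and remove the bad member; the bitmap limits what
# has to be re-synced when a replacement is added:
mdadm /dev/md0 --fail /dev/sdg1
mdadm /dev/md0 --remove /dev/sdg1
```

Editing the bitmap file by hand to mark only the last 10% dirty is the complicated part; the on-disk bitmap format is internal to md and not meant to be hand-edited.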


According to Wikipedia, cowloop is more or less deprecated in favour of device-mapper, and messing with the bitmap looks complicated to me. I think Luca's is a great suggestion. You can use three files attached to loop devices to store the COW overlays for the three faulty disks, so that writes go there and you can complete the resync. Then you would fail the COW devices one by one from mdadm and rebuild onto spares.
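Concretely, the three overlays might be set up along these lines. This is only a sketch: device names, overlay sizes and the COW file location are illustrative, and it needs root plus the dm-snapshot module loaded.

```shell
# Sketch only: one transient COW overlay per faulty member.
# /dev/sdb, sdc, sdd are hypothetical failing drives.
for d in sdb sdc sdd; do
    # Sparse file to hold the copy-on-write data (size is an example)
    dd if=/dev/zero of=/var/tmp/$d-cow.img bs=1 count=0 seek=20G

    # Attach it to a free loop device
    loop=$(losetup -f --show /var/tmp/$d-cow.img)

    # snapshot target: reads fall through to the real disk, writes go
    # to the loop device; N = non-persistent, 8 sectors chunk size
    sz=$(blockdev --getsz /dev/$d)
    echo "0 $sz snapshot /dev/$d $loop N 8" | dmsetup create cow-$d
done
# The array is then assembled from /dev/mapper/cow-sdb etc., so writes
# during the resync never touch the failing drives.
```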

But this will work ONLY if read errors are still reported up through the DM-snapshot layer. Otherwise (if it e.g. returns a block of zeroes without an error) you are eventually going to get data corruption when replacing drives.

You can check whether read errors are reported by watching dmesg during the resync. If you see many "read error corrected..." messages, it works; if dmesg stays silent, md never received the read errors, which means it doesn't work. In that case DO NOT go ahead with replacing drives, or you will get data corruption.
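The check itself is just a matter of grepping the kernel log while the resync runs. The exact message text varies by kernel version, so treat the pattern as approximate, and the log file path is distribution-dependent.

```shell
# Look for md's read-error-correction messages after (or during) the resync.
# Exact wording varies by kernel version; adjust the pattern as needed.
dmesg | grep -i "read error corrected"

# Or follow the kernel log live while the resync runs
# (path is distribution-dependent, e.g. /var/log/messages on some systems):
tail -f /var/log/kern.log | grep -i "read error"
```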

So you need an initial test that just performs a resync *without* replicating to a spare. I suggest you first remove all the spares from the array, then create the COW snapshots, then assemble the array, perform a resync, and watch dmesg. If it works: add the spares back, fail one drive, and so on.
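Put together, the dry run might look like the following. Again a sketch only: all device names are hypothetical, and it assumes the COW overlays from the earlier step already exist under /dev/mapper.

```shell
# Sketch of the test pass: resync through the COW overlays,
# with no spare present so no rebuild can start yet.

# 1. Remove the spare(s) first (sdh1 is a hypothetical spare)
mdadm /dev/md0 --remove /dev/sdh1

# 2. Stop the array and reassemble it from the COW overlays
#    in place of the three faulty members
mdadm --stop /dev/md0
mdadm --assemble /dev/md0 \
      /dev/mapper/cow-sdb /dev/mapper/cow-sdc /dev/mapper/cow-sdd \
      /dev/sde1 /dev/sdf1 /dev/sdg1

# 3. Force a full repair pass and watch the kernel log
#    for "read error corrected" messages
echo repair > /sys/block/md0/md/sync_action
cat /proc/mdstat
```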

If this technique works it would be useful for everybody, so please keep us informed!
Thank you
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
