Re: Suggestion needed for fixing RAID6

On 04/30/2010 08:17 AM, Janos Haar wrote:
> Hello,
>
> OK, MRK you are right (again).
> There was a line in the messages which escaped my attention.
> The entire log is here: http://download.netcenter.hu/bughunt/20100430/messages


Ah here we go:

Apr 29 09:50:29 Clarus-gl2k10-2 kernel: device-mapper: snapshots: Invalidating snapshot: Error reading/writing.
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1, disabling device.
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Operation continuing on 10 devices.
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: md: md3: recovery done.

Firstly, I'm not totally sure how DM passed the information about the failing device to MD; there is no error message about this from MD. If it was a read error, MD should have performed the rewrite, but that apparently did not happen (the message MD logs for a failed rewrite is, I think, "read error NOT corrected!!"). But anyway...
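A quick grep over the full log would show whether MD ever reported the read error itself (the pattern here is just my guess at the relevant messages):

grep -E 'raid5|read error' /var/log/messages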

> The dm found my cow devices invalid, but I don't know why at this point.


I have just had a brief look at the DM code. I understand maybe 1% of it right now, but I suspect that, in a not-perfectly-optimized implementation, if you specified 8-sector granularity (8 x 512 B = 4 KiB, which you did) when creating your cow and cow2 devices, then whenever you write to the COW device DM might do it in two steps:

1- copy 8 (or a multiple of 8) sectors from the HD to the cow device, enough to cover the area being written;
2- overwrite those 8 sectors with the data coming from MD.

Of course this is suboptimal when MD writes exactly 8 sectors, aligned to the chunks DM uses (both of which I think are true in your case), because then DM could skip step #1 entirely. But suppose DM is not that smart and really does not skip step #1: then I think I understand why it disables the device. Step #1 fails with a read error, and DM does not know how to handle that situation in the general case. If MD had written a smaller amount, say 512 bytes, and step #1 failed, what would you put in the other 7 sectors around it? The right semantics are not obvious, so they invalidate the device.

Firstly, you could try 1-sector granularity instead of 8 when creating the DM cow devices. This MIGHT work around the issue if DM is at least a bit smart. Right now it's not obvious to me where in the code the COW-copying logic lives. Maybe tomorrow I will understand this.

If this doesn't work, the best thing is probably to write to the DM mailing list asking why it behaves like this and whether they can suggest a workaround. You can keep me in CC, I'm interested.


> [CUT]
>
> echo 0 $(blockdev --getsize /dev/sde4) \
>        snapshot /dev/sde4 /dev/loop3 p 8 | \
>        dmsetup create cow
>
> echo 0 $(blockdev --getsize /dev/sdh4) \
>        snapshot /dev/sdh4 /dev/loop4 p 8 | \
>        dmsetup create cow2

See, you are creating them with 8-sector granularity... try with 1, along these lines:
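The snapshot table line is "<start> <length> snapshot <origin> <COW device> <persistent> <chunk size in sectors>", so the last parameter is the one to change. After removing the old mappings (dmsetup remove cow; dmsetup remove cow2), something like:

echo 0 $(blockdev --getsize /dev/sde4) \
       snapshot /dev/sde4 /dev/loop3 p 1 | \
       dmsetup create cow

echo 0 $(blockdev --getsize /dev/sdh4) \
       snapshot /dev/sdh4 /dev/loop4 p 1 | \
       dmsetup create cow2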

> I can try again if there is any new idea, but it would be really good to do some trick with bitmaps, or set the recovery's start point, or something similar, because every time I need >16 hours to get to the first point where the RAID does something interesting....
>
> Neil,
> Can you say something useful about this?


I just looked into this and it seems this feature is already there.
See if you have these files:
/sys/block/md3/md/sync_min and sync_max
Those are the starting and ending sectors.
But keep in mind you have to enter them in multiples of the chunk size, so if your chunk is e.g. 1024k you need to enter multiples of 2048 (sectors). Enter the values before starting the sync. Or stop the sync by writing "idle" to sync_action, change sync_min, then restart the sync by writing "check" to sync_action. It should work, I just tried it on my machine.
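Something along these lines (the start sector here is made up for illustration; pick the chunk-aligned offset where your interesting region actually begins):

echo idle > /sys/block/md3/md/sync_action      # stop any running sync first
echo 20971520 > /sys/block/md3/md/sync_min     # start sector, a multiple of 2048 for a 1024k chunk
echo max > /sys/block/md3/md/sync_max          # or a chunk-aligned end sector
echo check > /sys/block/md3/md/sync_action     # restart; the sync begins at sync_min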

Good luck

