On 17/07/13 02:49, Joe Lawrence wrote:
Hi Neil, Martin,
While testing patches to fix a RAID1 repair GPF crash with 3.10-rc7
( http://thread.gmane.org/gmane.linux.raid/43351 ), I encountered disk
corruption when repeatedly failing, removing, and adding MD RAID1
component disks to their array. The RAID1 was created with an internal
write bitmap and the test was run against alternating disks in the
set. I bisected this behavior back to commit 7ceb17e8 "md: Allow
devices to be re-added to a read-only array", specifically these lines
of code:
This sounds like an issue I just bumped up against in RAID-5.
I have a test box with a RAID-5 made up of 2 x 2TB drives plus six
RAID-0s, each built from 2 x 1TB drives.
root@test:/root# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid5 md20[0] md25[8] md24[7] md22[6] sdl[4] sdn[3] md23[2] md21[1]
      13673683968 blocks super 1.2 level 5, 512k chunk, algorithm 2 [8/8] [UUUUUUUU]
      bitmap: 0/15 pages [0KB], 65536KB chunk

md22 : active raid0 sdk[0] sdm[1]
      1953524736 blocks super 1.2 512k chunks

md20 : active raid0 sdj[0] sdo[1]
      1953522688 blocks super 1.2 512k chunks

md21 : active raid0 sdh[0] sdi[1]
      1953524736 blocks super 1.2 512k chunks

md25 : active raid0 sda[0] sdb[1]
      2441900544 blocks super 1.2 512k chunks

md23 : active raid0 sdd[0] sde[1]
      1953522688 blocks super 1.2 512k chunks

md24 : active raid0 sdf[0] sdg[1]
      1953524736 blocks super 1.2 512k chunks
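For anyone wanting to recreate that layout, the arrays can be built with
something along these lines (device names are taken from the mdstat above;
chunk size and metadata defaults are assumed, so treat it as a sketch
rather than the exact commands used here):

  # one of the six component RAID-0s; md21-md25 are built the same way
  mdadm --create /dev/md20 --level=0 --raid-devices=2 /dev/sdj /dev/sdo

  # the RAID-5 over the six RAID-0s plus the two bare drives,
  # with an internal write-intent bitmap
  mdadm --create /dev/md3 --level=5 --raid-devices=8 --bitmap=internal \
        /dev/md20 /dev/md21 /dev/md22 /dev/md23 /dev/md24 /dev/md25 \
        /dev/sdl /dev/sdn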
I was running a check over md3 whilst rsyncing a load of data onto it.
md20 was ejected at some point during this process (a SMART query caused
a timeout on one of its drives). I removed md20 from md3, stopped md20,
started md20 again, and re-added it to md3.
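In mdadm terms that sequence corresponds to roughly the following (a
sketch, not a cut-and-paste; md20's member disks are taken from the
mdstat above):

  mdadm /dev/md3 --remove /dev/md20              # drop the failed member from the RAID-5
  mdadm --stop /dev/md20                         # tear down the component RAID-0
  mdadm --assemble /dev/md20 /dev/sdj /dev/sdo   # bring the RAID-0 back up
  mdadm /dev/md3 --re-add /dev/md20              # re-add it, expecting a bitmap-based resync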
This should have triggered a rebuild, as the bitmap would have been way
out of sync; however, it immediately reported the rebuild complete and
left the array mostly trashed (a mismatch count of about 500,000).
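For reference, the check and the mismatch figure come from the usual md
sysfs interface, roughly:

  echo check > /sys/block/md3/md/sync_action   # start a scrub of the RAID-5
  cat /sys/block/md3/md/mismatch_cnt           # read the mismatch count afterwards

On a healthy array that count should normally come back as 0.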
The kernel at the time was from late in the 3.11-rc1 merge window:
3.10.0-09289-g9903883.
I've been meaning to try to reproduce it, but as each operation takes
about 5 hours, it's slow going.
This is a test array, so it has no data value. I'm happy to try to
reproduce this fault if it would help any.
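If a scripted run would be useful, something like the loop below (untested,
with hypothetical device names) would exercise the fail/remove/add cycle
Joe describes against a RAID1:

  # hypothetical RAID1 /dev/md0 with members /dev/sdb1 and /dev/sdc1;
  # Joe alternated disks, this just hammers one of them
  while true; do
      mdadm /dev/md0 --fail /dev/sdb1      # fail the member
      mdadm /dev/md0 --remove /dev/sdb1    # remove it from the array
      mdadm /dev/md0 --add /dev/sdb1       # add it back
      # wait for the recovery to finish before the next cycle
      while grep -q recover /proc/mdstat; do sleep 5; done
  done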
Regards,
Brad