Hi Neil,

Also, I found the same data corruption issue on RHEL 6.5. For your attention: I up-ported the md code (raid5.c + raid5.h) from the FC11 kernel to CentOS 6.4, and there is no mis-compare with the up-ported code.

Thanks,
Manibalan.

-----Original Message-----
From: Manibalan P
Sent: Monday, March 24, 2014 6:46 PM
To: 'linux-raid@xxxxxxxxxxxxxxx'
Cc: neilb@xxxxxxx
Subject: RE: raid6 - data integrity issue - data mis-compare on rebuilding RAID 6 - with 100 Mb resync speed.

Hi,

I have performed the following tests to narrow down the integrity issue.

1. RAID 6, single drive failure - NO ISSUE
   a. Run IO.
   b. mdadm: set one drive faulty and remove it.
   c. mdadm: add the drive back.
   No mis-compare happens in this path.

2. RAID 6, two drive failure - write while degraded, verify after rebuild
   a. Remove two drives to make the RAID array degraded.
   b. Run the write IO cycle and wait until it completes.
   c. Insert the drives back one by one, and wait until the rebuild completes and the RAID array becomes optimal.
   d. Perform the verification cycle.
   No mis-compare happens in this path either.

During all my tests, sync_speed_max and sync_speed_min were set to 100M.

So, as you suggested in your previous mail, the corruption seems to happen only when resync and IO run in parallel.

Also, I tested with the upstream 2.6.32 kernel from git
("http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/" - tags/v2.6.32),
and I am facing the mis-compare issue on that kernel as well: RAID 6, two drive failure, with a high sync speed.

Thanks,
Manibalan.

-----Original Message-----
From: NeilBrown [mailto:neilb@xxxxxxx]
Sent: Thursday, March 13, 2014 11:49 AM
To: Manibalan P
Cc: linux-raid@xxxxxxxxxxxxxxx
Subject: Re: raid6 - data integrity issue - data mis-compare on rebuilding RAID 6 - with 100 Mb resync speed.

On Wed, 12 Mar 2014 13:09:28 +0530 "Manibalan P" <pmanibalan@xxxxxxxxxxxxxx> wrote:

> > > Was the array fully synced before you started the test?
>
> Yes, IO is started only after the resync is completed.
> To add more info: I am facing this mis-compare only with a high resync
> speed (30M to 100M). I ran the same test with resync speed min = 10M
> and max = 30M, without any issue. So the issue is related to
> sync_speed_max / sync_speed_min.

So presumably it is an interaction between recovery and IO.  Maybe if we write to a stripe that is being recovered, or recover a stripe that is being written to, then something gets confused.

I'll have a look to see what I can find.

Thanks,
NeilBrown
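
For reference, the failing scenario discussed in this thread (high resync speed with IO running while the array rebuilds) can be reproduced along these lines. This is only a sketch of the procedure, not the exact test harness used above: /dev/md0, /dev/sdf, /dev/sdg and run_io_cycle are hypothetical placeholders for the real array, member disks and write/verify tool, and the sync_speed values are the per-array sysfs knobs in KB/s (100000 is roughly the 100M setting mentioned above).

    # Force a high per-array recovery speed (values in KB/s).
    echo 100000 > /sys/block/md0/md/sync_speed_min
    echo 100000 > /sys/block/md0/md/sync_speed_max

    # Fail and remove two members of the RAID 6 array.
    mdadm --manage /dev/md0 --fail /dev/sdf --remove /dev/sdf
    mdadm --manage /dev/md0 --fail /dev/sdg --remove /dev/sdg

    # Add the drives back so recovery starts.
    mdadm --manage /dev/md0 --add /dev/sdf
    mdadm --manage /dev/md0 --add /dev/sdg

    # Run the write/verify cycle while the rebuild is still in progress
    # (run_io_cycle is a placeholder for the actual IO/verify tool).
    run_io_cycle /dev/md0 &

    # Watch the rebuild; per the thread, the mis-compare shows up only
    # when recovery and IO overlap like this.
    watch -n 10 cat /proc/mdstat

The distinction from test 2 above is that here the write load runs concurrently with md recovery, which is the combination Neil points to as the likely trigger.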