Hi Neil,

> I don't know what kernel "CentOS 6.4" runs.  Please report the actual
> kernel version as well as distro details.

The kernel version is 2.6.32.
CentOS distribution: 2.6.32-358.23.2.el6.x86_64 #1 SMP x86_64 GNU/Linux

> I know nothing about "dit32" and so cannot easily interpret the output.
> Is it saying that just a few bytes were wrong?

It is not just a few bytes of corruption; it looks like a number of sectors
are corrupted (for example, 40 sectors).  dit32 writes a pattern of IO, and
after each write cycle it reads the data back and verifies it.  The data
written at the reported LBA is itself corrupted, so this looks like write
corruption.

> Was the array fully synced before you started the test?

Yes, IO is started only after the resync is completed.

To add more information: I see this mis-compare only with a high resync
speed (30M to 100M).  I ran the same test with resync speed min 10M and
max 30M without any issue, so the problem appears to be related to
sync_speed_max/min.

> I can't think of anything else that might cause an inconsistency.  I test
> the RAID6 recovery code from time to time and it always works flawlessly
> for me.

Can you suggest any IO tool or test to verify data integrity?  (A rough
sketch of the failure/rebuild cycle and of an md-level consistency check is
appended below the quoted mail for reference.)

One more thing I would like to bring to your attention: I ran the same IO
test on an Ubuntu 13 system (Linux ubuntu 3.8.0-19-generic #29-Ubuntu SMP
Wed Apr 17 18:16:28 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux) and hit the
same type of data corruption.

Thanks,
Manibalan.

-----Original Message-----
From: NeilBrown [mailto:neilb@xxxxxxx]
Sent: Tuesday, March 11, 2014 8:34 AM
To: Manibalan P
Cc: linux-raid@xxxxxxxxxxxxxxx
Subject: Re: raid6 - data integrity issue - data mis-compare on rebuilding RAID 6 - with 100 Mb resync speed.

On Fri, 7 Mar 2014 14:18:59 +0530 "Manibalan P" <pmanibalan@xxxxxxxxxxxxxx>
wrote:

> Hi,

Hi,
when posting to vger.kernel.org lists, please don't send HTML mail, just
plain text.  Because you did, the original email didn't get to the list.

> We are facing a data integrity issue on RAID 6, on the CentOS 6.4 kernel.

I don't know what kernel "CentOS 6.4" runs.  Please report the actual
kernel version as well as distro details.

> Details of the setup:
>
> 1. A 7-drive RAID 6 md device (md0) - capacity 25 GB
> 2. Resync speed max and min set to 100000 (100 MB/s)
> 3. A script is running to simulate drive failure; it does the following:
>    a. mdadm set faulty for two random drives on the md, then mdadm
>       remove those drives.
>    b. mdadm add one drive and wait for the rebuild to complete, then
>       insert the next one.
>    c. Wait until the md becomes optimal, then continue the disk-removal
>       cycle again.
> 4. An iSCSI target is configured on "/dev/md0".
> 5. From a Windows server, the md0 target is connected using the
>    Microsoft iSCSI initiator and formatted with NTFS.
> 6. The dit32 IO tool is run on the formatted volume.
>
> Issue:
> The dit32 tool runs IO in multiple threads; in each thread, IO is
> written and then verified.
> On the verification cycle we are getting a mis-compare.  Below is the
> log from the dit32 tool.
> Thu Mar 06 23:19:31 2014 INFO: DITNT application started
> Thu Mar 06 23:20:19 2014 INFO: Test started on Drive D:
>   Dir Sets=8, Dirs per Set=70, Files per Dir=75
>   File Size=512KB
>   Read Only=N, Debug Stamp=Y, Verify During Copy=Y
>   Build I/O Size range=1 to 128 sectors
>   Copy Read I/O Size range=1 to 128 sectors
>   Copy Write I/O Size range=1 to 128 sectors
>   Verify I/O Size range=1 to 128 sectors
> Fri Mar 07 01:28:09 2014 ERROR: Miscompare Found: File
>   "D:\dit\s6\d51\s6d51f37", offset=00048008
>   Expected Data: 06 33 25 01 0240 (dirSet, dirNo, fileNo, elementNo, sectorOffset)
>   Read Data:     05 08 2d 01 0240 (dirSet, dirNo, fileNo, elementNo, sectorOffset)
>   Read Request: offset=00043000, size=00008600
>
> The following files are attached to this mail for your reference:
>
> 1. Raid5.c and .h files - the code we are using.
> 2. RollingHotSpareTwoDriveFailure.sh - the script which simulates the
>    two-disk failure.
> 3. dit32log.sav - log file from the dit32 tool.
> 4. s6d31f37 - the file where the corruption happened (hex format).
> 5. CentOS-system-info - md and system info.

I didn't find any "CentOS-system-info" attached.

I know nothing about "dit32" and so cannot easily interpret the output.
Is it saying that just a few bytes were wrong?

Was the array fully synced before you started the test?

I can't think of anything else that might cause an inconsistency.  I test
the RAID6 recovery code from time to time and it always works flawlessly
for me.

NeilBrown

> Thanks,
>
> Manibalan.
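For reference, here is a minimal sketch of the kind of fail/remove/re-add
cycle described in the setup above.  The array name, member device names
and speed values are placeholders chosen for illustration; the actual
RollingHotSpareTwoDriveFailure.sh attached to the original mail may differ
in detail.

#!/bin/bash
# Rolling two-drive failure sketch.  /dev/md0 and the member devices
# below are assumptions, not the exact devices from the report.

MD=/dev/md0
MDNAME=md0
DISKS=(/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh)

# Per-array resync speed limits in KB/s (100000 ~= 100 MB/s, the setting
# under which the mis-compare was reported).
echo 100000 > /sys/block/$MDNAME/md/sync_speed_min
echo 100000 > /sys/block/$MDNAME/md/sync_speed_max

while true; do
    # Pick two distinct random member drives.
    a=$(( RANDOM % ${#DISKS[@]} ))
    b=$(( RANDOM % ${#DISKS[@]} ))
    [ "$a" -eq "$b" ] && continue

    # Fail and remove both drives.
    for d in "${DISKS[$a]}" "${DISKS[$b]}"; do
        mdadm --manage "$MD" --fail   "$d"
        mdadm --manage "$MD" --remove "$d"
    done

    # Re-add one drive at a time, waiting for each rebuild to finish
    # before inserting the next.
    for d in "${DISKS[$a]}" "${DISKS[$b]}"; do
        mdadm --manage "$MD" --add "$d"
        sleep 5                     # give the recovery a moment to start
        mdadm --wait "$MD"          # returns when recovery is complete
    done
done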
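Independent of any external IO tool, md's own scrubbing support can confirm
whether data and parity agree after a rebuild.  A minimal sketch, again
assuming the array is md0 and otherwise idle:

#!/bin/bash
# Trigger an md "check" pass and report the mismatch count.

MDNAME=md0

echo check > /sys/block/$MDNAME/md/sync_action    # start a parity scrub

# Poll until the scrub finishes.
while [ "$(cat /sys/block/$MDNAME/md/sync_action)" != "idle" ]; do
    sleep 10
done

# Non-zero means some stripes had data/parity that did not match.
echo "mismatch_cnt = $(cat /sys/block/$MDNAME/md/mismatch_cnt)"

A non-zero mismatch_cnt after a completed rebuild would point at the same
kind of inconsistency that dit32 is reporting.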