RE: raid6 - data integrity issue - data mis-compare on rebuilding RAID 6 - with 100 Mb resync speed.

Hi,

I have performed the following tests to narrow down the integrity issue.

1. RAID 6, single drive failure - NO ISSUE
	a. Run IO.
	b. Use mdadm to set a drive faulty and remove it.
	c. Use mdadm to add the drive back.
No mis-compare occurred in this path.
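For anyone trying to reproduce test 1, steps b and c map onto standard mdadm manage-mode options. The sketch below only builds the argument lists (a dry run). The array `/dev/md0` and member `/dev/sdb` are hypothetical placeholders, not the devices used in these tests, and the function name is illustrative only.

```python
from typing import List


def single_drive_failure_cmds(array: str, member: str) -> List[List[str]]:
    """Return the mdadm argv lists for test 1 (steps b and c); nothing is run."""
    return [
        # Step b: mark the member faulty, then remove it from the array.
        ["mdadm", array, "--fail", member],
        ["mdadm", array, "--remove", member],
        # Step c: add the same member back; md then rebuilds onto it.
        ["mdadm", array, "--add", member],
    ]


if __name__ == "__main__":
    # Hypothetical device names; step a (the IO load) runs separately.
    for argv in single_drive_failure_cmds("/dev/md0", "/dev/sdb"):
        print(" ".join(argv))
```

To execute the commands rather than print them, each list could be passed to `subprocess.run(argv, check=True)` as root, while the IO load from step a keeps running.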

2. RAID 6, two drive failure - write while degraded and verify after rebuild
	a. Remove two drives to make the RAID array degraded.
	b. Run the IO write cycle and wait until it completes.
	c. Insert the drives back one by one, and wait until the rebuild
completes and the RAID array becomes optimal.
	d. Perform the verification cycle.
No mis-compare occurred in this path either.
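The write and verification cycles in test 2 amount to writing a reproducible pattern and later checking it block by block. Below is a minimal sketch of that idea, not the tool used in these tests: the 4096-byte block size and the SHA-256 derived pattern are assumptions, and `write_cycle` / `verify_cycle` are hypothetical names. `path` would be the md device (for example a hypothetical `/dev/md0`) or a test file.

```python
import hashlib
import os
from typing import List

BLOCK_SIZE = 4096  # assumed I/O unit; not taken from the original test setup


def block_pattern(seed: int, index: int) -> bytes:
    """Deterministic BLOCK_SIZE bytes for block `index`, derived from `seed`."""
    out = bytearray()
    counter = 0
    while len(out) < BLOCK_SIZE:
        out.extend(hashlib.sha256(f"{seed}:{index}:{counter}".encode()).digest())
        counter += 1
    return bytes(out[:BLOCK_SIZE])


def write_cycle(path: str, nblocks: int, seed: int) -> None:
    """Write the pattern to the first nblocks blocks of an existing file/device."""
    with open(path, "r+b") as f:
        for i in range(nblocks):
            f.seek(i * BLOCK_SIZE)
            f.write(block_pattern(seed, i))
        f.flush()
        os.fsync(f.fileno())  # push the data past the page cache


def verify_cycle(path: str, nblocks: int, seed: int) -> List[int]:
    """Return the indices of blocks whose contents do not match the pattern."""
    bad = []
    with open(path, "rb") as f:
        for i in range(nblocks):
            f.seek(i * BLOCK_SIZE)
            if f.read(BLOCK_SIZE) != block_pattern(seed, i):
                bad.append(i)
    return bad
```

Because each block's expected contents depend only on the seed and the block index, the verification pass can run after the rebuild without keeping a copy of the written data, and a non-empty result points at the exact mis-compared blocks.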

During all my tests, sync_speed_max and sync_speed_min were set to 100M.
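The per-array resync limits live in md's sysfs attributes `sync_speed_min` and `sync_speed_max`, expressed in KiB/s (the system-wide equivalents are `/proc/sys/dev/raid/speed_limit_min` and `speed_limit_max`). The sketch below writes both for one array. The helper name and the `sysfs_root` parameter (added so it can be exercised against a scratch directory) are hypothetical, and reading "100M" as 100 MiB/s is an assumption.

```python
from pathlib import Path


def set_sync_speed(md_name: str, min_kib_s: int, max_kib_s: int,
                   sysfs_root: str = "/sys/block") -> None:
    """Write per-array resync limits, in KiB/s, to md's sysfs attributes.

    Writes <sysfs_root>/<md_name>/md/sync_speed_max and sync_speed_min.
    """
    if not 0 < min_kib_s <= max_kib_s:
        raise ValueError("expected 0 < min <= max")
    md_dir = Path(sysfs_root) / md_name / "md"
    (md_dir / "sync_speed_max").write_text(f"{max_kib_s}\n")
    (md_dir / "sync_speed_min").write_text(f"{min_kib_s}\n")


if __name__ == "__main__":
    # Assumes "100M" means 100 MiB/s = 100 * 1024 KiB/s; needs root on a real system.
    set_sync_speed("md0", 100 * 1024, 100 * 1024)
```

The same helper could set the lower range from the earlier passing run (min 10M, max 30M) by passing `10 * 1024` and `30 * 1024`, under the same unit assumption.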

So, as you suggested in your previous mail, the corruption may happen
only when resync and IO run in parallel.

I also tested with the upstream 2.6.32 kernel from git:
"http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/ -
tags/v2.6.32"
	I see the mis-compare issue on this kernel as well, with RAID 6,
two drive failures, and a high sync_speed.

Thanks,
Manibalan.

-----Original Message-----
From: NeilBrown [mailto:neilb@xxxxxxx]
Sent: Thursday, March 13, 2014 11:49 AM
To: Manibalan P
Cc: linux-raid@xxxxxxxxxxxxxxx
Subject: Re: raid6 - data integrity issue - data mis-compare on rebuilding RAID 6 - with 100 Mb resync speed.

On Wed, 12 Mar 2014 13:09:28 +0530 "Manibalan P" <pmanibalan@xxxxxxxxxxxxxx> wrote:

> >
> >Was the array fully synced before you started the test?
> 
> Yes, IO was started only after the resync completed.
> To add more info, I see this mis-compare only with a high resync speed
> (30M to 100M). I ran the same test with resync speed min 10M and max
> 30M without any issue, so the issue is related to
> sync_speed_max / min.

So presumably it is an interaction between recovery and IO.  Maybe if we
write to a stripe that is being recovered, or recover a stripe that is
being written to, something gets confused.

I'll have a look to see what I can find.

Thanks,
NeilBrown