Re: Why does one get mismatches?

On Wed, Feb 17, 2010 at 08:38:11AM +1100, Steven Haigh wrote:
> On Tue, 16 Feb 2010 16:25:25 -0500, Bill Davidsen <davidsen@xxxxxxx> wrote:
> > The issue lies with data changing between write to multiple drives. In 
> > hardware raid the data traverses the memory bus once, only once, and 
> > goes into cache in the controller, from which it is written to all 
> > mirrored drives. With software raid an individual write is done to each 
> > drive, and if the data in the buffer changes between writes to one drive
> > or the other you get different values. Neil may be convinced that the OS
> > somehow "knows" which of the mirror copies is correct, ie. most recent, 
> > and never uses the stale data, but if that information was really 
> > available reads would always return the latest value and it wouldn't be 
> > possible to read the same file multiple times and get different MD5sums.
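
To make the mechanism described above concrete, here is a minimal
simulation, not md code: the two "disks" are just byte strings, and
mirrored_write and mutate_between are names made up for illustration. It
only shows that two separate writes from one changing buffer can leave the
mirror halves different.

```python
import hashlib


def mirrored_write(buffer, mutate_between=None):
    """Write `buffer` to two simulated mirror members, one after the other.

    `mutate_between` is a hypothetical hook standing in for whatever
    changes the page while the second write is still pending.
    """
    disk_a = bytes(buffer)        # first member gets a snapshot of the buffer
    if mutate_between is not None:
        mutate_between(buffer)    # buffer changes between the two writes
    disk_b = bytes(buffer)        # second member gets whatever is there now
    return disk_a, disk_b


def md5(data):
    return hashlib.md5(data).hexdigest()


page = bytearray(b"A" * 16)
a, b = mirrored_write(page, lambda buf: buf.__setitem__(0, ord("B")))
print("members identical:", md5(a) == md5(b))
```

A read served from one member or the other would then return different
data for the same block, which is the MD5sum symptom mentioned above.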

[snip...]

> I agree Bill, there is an issue with the software RAID1 when it comes down
> to some hardware. I have one machine where the ONLY way to stop the root
> filesystem going readonly due to journal issues is to remove RAID. Having
> RAID1 enabled gives silent corruption of both data and the journal at
> seemingly random times.

Maybe I missed something earlier in this thread, and if so I apologize.
However, I was not aware of anyone reporting FS corruption due to software
RAID 1.  Needless to say, that would be a serious problem if it is occurring.

At work, we use software RAID 1 on the majority of our production servers
and have never seen problems like the ones you describe.  I'm not trying to
discredit you, just noting that we have not seen similar results.

> I can see the data corruption from running a verify between RPM and data
> on the drive. Reinstalling these packages fixes things - until something
> random gets corrupted next time.

For curiosity's sake, what kind of files did RPM report as being corrupt
after running the verify?  The reason I ask is that I would expect user
data to be corrupted before system files, since system files are typically
written to disk at install/update and never written to again.  Or maybe
there is a reason...correct me if I'm wrong ;)
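
As a rough way to answer that question, the sketch below sorts the lines
printed by an RPM verify run and keeps only non-config files whose digest
differs (a "5" in the attribute column). The function names and the sample
lines are hypothetical, and it assumes the usual verify output layout of
an attribute string, an optional one-letter marker (c, d, g, l, r) and a
path.

```python
def parse_verify_line(line):
    """Split one verify output line into (flags, marker, path)."""
    parts = line.split(None, 1)
    if len(parts) != 2:
        return None
    flags, rest = parts[0], parts[1].strip()
    marker = ""
    # An optional single-letter file marker (e.g. "c" for %config)
    # may sit between the attribute string and the path.
    if len(rest) > 2 and rest[0] in "cdglr" and rest[1] == " ":
        marker, rest = rest[0], rest[2:].strip()
    return flags, marker, rest


def suspicious_files(lines):
    """Return paths of non-config files whose digest no longer matches."""
    result = []
    for line in lines:
        parsed = parse_verify_line(line)
        if parsed is None:
            continue
        flags, marker, path = parsed
        # Config files change legitimately, and "missing" is a
        # different kind of problem, so both are skipped here.
        if flags != "missing" and "5" in flags and marker != "c":
            result.append(path)
    return result


# Hypothetical sample lines, not taken from the affected machine.
sample = [
    "S.5....T.  c /etc/ssh/sshd_config",
    "..5......    /usr/bin/example",
    "missing     /usr/share/doc/example/README",
]
print(suspicious_files(sample))
```

If the list is mostly binaries and libraries that are written once at
install time, that would be more surprising than changed config files.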

In my last post, I asked Neil if he had a patch that would indicate where
the mismatches exist on disk.  Have you found a way to correlate the
mismatches with your FS corruption?
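
For reference, what md exposes today is only a count, not a location. A
minimal sketch for collecting it per array is below; it assumes the md
sysfs file /sys/block/mdX/md/mismatch_cnt, and read_mismatch_counts and
its sysfs_root parameter are names made up here so the function can be
pointed at a test directory.

```python
import glob
import os


def read_mismatch_counts(sysfs_root="/sys"):
    """Return {array_name: mismatch_cnt} for every md array found."""
    counts = {}
    pattern = os.path.join(sysfs_root, "block", "md*", "md", "mismatch_cnt")
    for path in sorted(glob.glob(pattern)):
        # Path layout: <root>/block/<array>/md/mismatch_cnt
        array = path.split(os.sep)[-3]
        with open(path) as fh:
            counts[array] = int(fh.read().strip())
    return counts
```

The value is only meaningful after a check pass has been run on the
array, and it still says nothing about which blocks or files are
affected, which is why a patch that reports locations would help.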

Bryan


