On Wed, Feb 17, 2010 at 08:38:11AM +1100, Steven Haigh wrote:
> On Tue, 16 Feb 2010 16:25:25 -0500, Bill Davidsen <davidsen@xxxxxxx> wrote:
> > The issue lies with data changing between write to multiple drives. In
> > hardware raid the data traverses the memory bus once, only once, and
> > goes into cache in the controller, from which it is written to all
> > mirrored drives. With software raid an individual write is done to each
> > drive, and if the data in the buffer changes between writes to one drive
> > or the other you get different values. Neil may be convinced that the OS
> > somehow "knows" which of the mirror copies is correct, ie. most recent,
> > and never uses the stale data, but if that information was really
> > available reads would always return the latest value and it wouldn't be
> > possible to read the same file multiple times and get different MD5sums.

[snip...]

> I agree Bill, there is an issue with the software RAID1 when it comes down
> to some hardware. I have one machine where the ONLY way to stop the root
> filesystem going readonly due to journal issues is to remove RAID. Having
> RAID1 enabled gives silent corruption of both data and the journal at
> seemingly random times.

Maybe I missed something earlier in this thread... and if so I apologize.
However, I was not aware of anyone reporting FS corruption due to software
RAID 1. Needless to say, that is a serious problem if it is occurring. At
work, we use software RAID 1 on the majority of our production servers and
have never seen problems like the ones you describe. I'm not trying to
discredit you... just that we have not seen similar results.

> I can see the data corruption from running a verify between RPM and data
> on the drive. Reinstalling these packages fixes things - until something
> random things get corrupted next time.

For curiosity's sake, what kind of files did RPM report as being corrupt
after running the verify?
The reason I ask is that I would expect user data to be corrupted before
system files, since system files are typically written to disk at
install/update time and never written again. Or maybe there is a reason...
correct me if I'm wrong ;)

In my last post, I asked Neil if he had a patch that would indicate where
the mismatches exist on disk. Have you found a way to correlate the
mismatches with your FS corruption?

Bryan
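
To make the question about locating mismatches concrete, here is a minimal sketch of one way the offsets could be found. It is not the md driver's own mechanism and not the patch asked about above. It assumes two offline image files copied from the data areas of the two mirror members (the file names and the 4 KiB chunk size are hypothetical), and it simply reports the byte offsets at which the copies differ. Comparing the members of a live, mounted array this way would not give reliable results, since writes may be in flight.

```python
import os
import tempfile

CHUNK = 4096  # assumed comparison granularity, chosen only for illustration


def find_mismatches(path_a, path_b, chunk=CHUNK):
    """Return the byte offsets of chunks that differ between two files."""
    offsets = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        offset = 0
        while True:
            block_a = a.read(chunk)
            block_b = b.read(chunk)
            if not block_a and not block_b:
                break  # both files exhausted
            if block_a != block_b:
                offsets.append(offset)
            offset += chunk
    return offsets


if __name__ == "__main__":
    # Build two small stand-in "mirror" images that differ in one chunk.
    with tempfile.TemporaryDirectory() as tmp:
        img_a = os.path.join(tmp, "mirror_a.img")  # hypothetical name
        img_b = os.path.join(tmp, "mirror_b.img")  # hypothetical name
        data = bytearray(3 * CHUNK)
        with open(img_a, "wb") as f:
            f.write(data)
        data[CHUNK + 10] = 0xFF  # flip one byte inside the second chunk
        with open(img_b, "wb") as f:
            f.write(data)
        print(find_mismatches(img_a, img_b))
```

Offsets found this way would still have to be mapped back to files through the filesystem's own tools before they could be correlated with the files RPM flags as corrupt.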