Re: mismatch_cnt again

Martin K. Petersen wrote:
"Peter" == Peter Rabbitson <rabbit+list@xxxxxxxxx> writes:

Peter> Bingo - and according to the list archive many of us are getting
Peter> mismatches without swap anywhere near the raid in question. The
Peter> current situation is more akin to "Ok folks get in the plane,
Peter> we're deploying in 2 hours, and btw your chute is not going to
Peter> open and there is nothing you can do about it" How is that for a
Peter> threat model :)

Way back we used to lock pages down entirely for I/O submission.  At
some point the writeback bit was introduced to gate the page during the
actual (physical) write operation only.  That made locking trickier and
not all filesystems correctly adapted to this.  ext[234] in particular
have issues of varying degrees, somewhat amplified by their use of
buffer_heads to track buffers instead of pages.  See the recent thread
about corruption with ext4 in 2.6.32+ for examples of this.

It's not just RAID consistency that breaks.  In the ext4 case above we
end up with garbled blocks being written to a single drive.

Add data integrity protection to the mix (btrfs, DIX) and all hell
breaks loose if you change the buffer after the checksum has been
generated.  So while modifying pages in flight has kinda-sorta worked
for a while (i.e. the window of error is small) it's something we'll
simply have to stop doing to support new features in the storage stack.
You'll be glad to know there's discussion about merging the debug patch
(which marks pages read-only during writeback) into ext4.
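A minimal sketch of the failure described above, assuming a toy model in which the integrity tag is computed once at submission and checked again when the data reaches the device. The helper names (submit_write, device_verify) are hypothetical, and zlib.crc32 merely stands in for whatever checksum btrfs or DIX actually uses:

```python
import zlib

def submit_write(buf: bytearray) -> int:
    """Hypothetical submission step: compute the integrity tag over the buffer."""
    return zlib.crc32(bytes(buf))

def device_verify(buf: bytearray, tag: int) -> bool:
    """Hypothetical device-side check: recompute the tag and compare."""
    return zlib.crc32(bytes(buf)) == tag

page = bytearray(b"A" * 4096)   # one 4 KiB page queued for writeback
tag = submit_write(page)        # checksum generated at submission time

ok_before = device_verify(page, tag)   # buffer untouched: tag matches

page[100] = ord("B")            # page modified while the write is in flight
ok_after = device_verify(page, tag)    # tag no longer matches the data

print(ok_before, ok_after)
```

The window between generating the tag and the data landing is exactly where a modification turns a valid write into one that fails verification.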

FWIW, XFS and btrfs both use the page writeback bit correctly and never
change a page while it is undergoing I/O.

That's necessary but not sufficient. To be done correctly, the buffer must be protected by md as well, because some applications use arrays without a filesystem; swap and databases are the most common cases. Data simply can't be correct on the drive if it is allowed to change between the write system call and its arrival on the media, even more so if a CRC or mirror is involved.
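
To illustrate the mirror case, here is a rough sketch, assuming a toy two-member mirror where each member is written from the caller's buffer at a different moment. Nothing here is md's actual code; write_member, count_mismatched_sectors and the 512-byte sector size are assumptions made for the example:

```python
SECTOR = 512  # assumed sector size for this toy model

def write_member(member: bytearray, buf: bytes) -> None:
    """Hypothetical write of the buffer's current contents to one mirror member."""
    member[:len(buf)] = buf

def count_mismatched_sectors(a: bytearray, b: bytearray) -> int:
    """Count sectors that differ between two members, loosely like a check pass."""
    return sum(a[i:i + SECTOR] != b[i:i + SECTOR] for i in range(0, len(a), SECTOR))

# Unprotected: the caller changes the buffer between the two member writes.
buf = bytearray(4096)
disk0, disk1 = bytearray(4096), bytearray(4096)
write_member(disk0, buf)
buf[1000] = 0xFF                 # modification while the write is in flight
write_member(disk1, buf)
mismatches = count_mismatched_sectors(disk0, disk1)

# Protected: take a stable snapshot first, so both members see the same bytes.
buf2 = bytearray(4096)
safe0, safe1 = bytearray(4096), bytearray(4096)
snapshot = bytes(buf2)           # immutable copy taken at submission
write_member(safe0, snapshot)
buf2[1000] = 0xFF                # later change no longer affects the write
write_member(safe1, snapshot)
safe_mismatches = count_mismatched_sectors(safe0, safe1)

print(mismatches, safe_mismatches)
```

The snapshot is only one way to get a stable buffer; locking the page or marking it read-only during writeback achieves the same without the copy.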

--
Bill Davidsen <davidsen@xxxxxxx>
 "We can't solve today's problems by using the same thinking we
  used in creating them." - Einstein
