Re: Why does one get mismatches?

Bill Davidsen <davidsen@xxxxxxx> · Wed, 24 Feb 2010 09:54:17 -0500

Neil Brown wrote:
md is not in a position to lock the page - there is simply no way it can stop
the filesystem from changing it.
The only thing it could do would be to make a copy, then write the copy out.
This would incur a performance cost.

Two thoughts on that. One is that for critical data, I would like the
option, set at array start time, to make the copy, accepting slower
performance in exchange for more consistency. My second thought is that a
checksum of the page before initiating the write and after all writes are
complete might be less of a performance hit, and could still detect that
the buffer had changed.
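
The checksum idea can be sketched in user space as follows. This is only an illustration under stated assumptions, not md code: `write_with_checksum_check` and `submit_write` are hypothetical names, a `bytearray` stands in for a page-cache page, and CRC32 is an arbitrary choice of checksum.

```python
import zlib


def write_with_checksum_check(page, submit_write):
    """Checksum the page before the write is issued and after it completes.

    `page` is a mutable bytearray standing in for a page-cache page.
    `submit_write` is a hypothetical callable that performs all the
    writes and returns only once they have completed.
    Returns True if the checksum was the same before and after.
    """
    before = zlib.crc32(page)
    submit_write(page)
    after = zlib.crc32(page)
    return before == after


disk = []

# Case 1: nothing touches the page while the write is in flight.
page = bytearray(b"A" * 4096)
stable = write_with_checksum_check(page, lambda p: disk.append(bytes(p)))


def racy_write(p):
    """Simulate the filesystem changing the page during the write."""
    disk.append(bytes(p))
    p[0] = ord("B")


# Case 2: the page is modified while the write is pending.
page = bytearray(b"A" * 4096)
change_detected = not write_with_checksum_check(page, racy_write)
```

Note that this only detects that the buffer changed; it cannot say which version reached which device, and a page that is changed and then changed back before the second checksum would go unnoticed.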
It seems to me, though maybe I'm wrong, that this is not such a safe design.

I think you are wrong.

This is correct.  However it would be equally correct if you were talking
about a normal disk drive rather than a RAID1 pair.
If the filesystem changes the page (or allows it to change) while a write is
pending, then it cannot know what actual data was written.  So it must write
the block out again before it ever reads it in.
RAID1 is no different to any other device in this respect.

In other words, would it be better for the md layer
to be robust against these kinds of threats?

Possibly, but at what cost?
There are two ways that I can imagine to 'solve' this issue.

1/ always copy the page before writing.  This would incur a significant
  overhead, both in the complexity of pre-allocating memory and in the
  delay taken to perform the copy.  And it would very rarely be actually
  needed.
2/ Have the filesystem protect the page from changes while it is being
   written.  This is quite possible for the filesystem to do (while it
   is impossible for md to do).  There could be some performance
   cost with memory-mapped pages as they would need to be unmapped,
   but there would be no significant cost for reads, writes, and filesystem
   metadata operations.

Your next section somewhat mirrors my thought on md checking the data 
after write to be sure it didn't change.

   Further, any filesystem that wants to make use of the integrity checks
   that newer drives provide (where the filesystem provides a 'checksum' for
   the block which gets passed all the way down and written to storage, and
   returned on a read) will need to do this anyway.  So it is likely that in
   the near future all significant filesystems will provide all the
   guarantees md needs in order to simply do nothing different.

So my feeling is that md is doing the best thing already.
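
To make the trade-off concrete, here is a toy simulation of how a mismatch can arise on a RAID1 pair and why option 1/ (copying the page first) would avoid it. It is a sketch under assumptions, not how md is implemented: `raid1_write` and `scribble` are hypothetical names, and the two mirror writes are modelled as two sequential reads of the same buffer.

```python
def raid1_write(page, mutate_between, copy_first):
    """Simulate writing one page to two mirrors.

    `mutate_between` is called between the two mirror writes and stands
    in for the filesystem changing the page while the write is pending.
    If `copy_first` is True, a private snapshot of the page is taken and
    both mirrors are written from that snapshot (option 1/).
    Returns the data that ended up on (mirror0, mirror1).
    """
    source = bytes(page) if copy_first else page
    mirror0 = bytes(source)   # first device reads the buffer
    mutate_between(page)      # page changes while the write is in flight
    mirror1 = bytes(source)   # second device reads the buffer later
    return mirror0, mirror1


def scribble(page):
    """Overwrite the first four bytes of the page in place."""
    page[0:4] = b"XXXX"


# Without a copy, the two mirrors can see different data.
page = bytearray(b"data" + bytes(4092))
m0, m1 = raid1_write(page, scribble, copy_first=False)
mismatch_without_copy = m0 != m1

# With a copy, both mirrors are written from the same snapshot.
page = bytearray(b"data" + bytes(4092))
m0, m1 = raid1_write(page, scribble, copy_first=True)
mismatch_with_copy = m0 != m1
```

The copy removes the mismatch, but every write pays for the extra allocation and the copy, which is the cost described in 1/ above.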

I believe 'swap' will always be an issue as unmapping swap pages during write
could be a serious performance cost.  It might be that the best thing to do
with swap is to somehow mark the area of an array used for swap as "don't
care" so md never bothers to resync it, and never reports inconsistencies
there, as they really are not an issue.
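
A "don't care" marking of this kind does not exist in the discussion above beyond the suggestion itself, so the following is purely a hypothetical sketch of the reporting side: given the sectors where a check found a mismatch, drop those that fall inside ranges marked as "don't care". The function name, the sector numbers, and the half-open `(start, end)` range format are all assumptions made for illustration.

```python
def reportable_mismatches(mismatched_sectors, dont_care_ranges):
    """Return the mismatched sectors that lie outside every don't-care range.

    `dont_care_ranges` is a list of (start, end) sector ranges, with
    `end` exclusive. This format is an assumption, not an md interface.
    """
    def ignored(sector):
        return any(start <= sector < end for start, end in dont_care_ranges)

    return [s for s in mismatched_sectors if not ignored(s)]


# Hypothetical layout: sectors 1000..1999 of the array hold swap.
swap_area = [(1000, 2000)]
found = [128, 1000, 1500, 1999, 2000]
to_report = reportable_mismatches(found, swap_area)
```

Here only sectors 128 and 2000 would be reported, since the other three lie inside the swap range.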

NeilBrown

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Bill Davidsen <davidsen@xxxxxxx>
 "We can't solve today's problems by using the same thinking we
  used in creating them." - Einstein
