Re: Why does one get mismatches?

"Martin K. Petersen" <martin.petersen@xxxxxxxxxx> · Wed, 24 Feb 2010 11:12:09 -0500

>>>>> "Bill" == Bill Davidsen <davidsen@xxxxxxx> writes:

>> Absolute rubbish does seem to be a suitable phrase here.  There is no
>> question of data corruption.  When memory changes between being
>> written to one device and to another, this does not cause corruption,
>> only inconsistency.  Either the block will be written again
>> consistently soon, or it will never be read.

Bill> Just what is it that rewrites the data block? The user program
Bill> doesn't know it's needed, the filesystem, if any, doesn't know
Bill> it's needed, and as far as I can tell md doesn't compute a checksum
Bill> before issuing the write and again after the last write is done,
Bill> nor does it make a copy and write from that. So what sees that the
Bill> data has changed and rewrites it?

The filesystem updates the page, and that same update marks it dirty
again.  The VM will then eventually schedule the page to be written out
once more, this time with the new contents.  When that happens depends
on the filesystem type and on whether the page holds metadata or data.
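The redirty-then-rewrite cycle can be sketched with a toy Python model.
It assumes that a RAID-1 write copies the page to each mirror in turn and
that writeout clears a per-page dirty flag before copying.  `Page`,
`writeback` and `modify_between` are hypothetical names for illustration.
They are not md or VM interfaces, and real I/O timing is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Page:
    """Toy model of a cached page (hypothetical, not kernel code)."""
    data: bytearray = field(default_factory=lambda: bytearray(8))
    dirty: bool = False

    def update(self, offset: int, value: int) -> None:
        # A filesystem write into the page marks it dirty again.
        self.data[offset] = value
        self.dirty = True


def writeback(page: Page, mirrors: list,
              modify_between: Optional[Callable[[Page], None]] = None) -> None:
    """Write the page to each mirror in turn, like a RAID-1 write."""
    page.dirty = False  # assumed: dirty state is cleared when writeout starts
    for i, mirror in enumerate(mirrors):
        mirror[:] = page.data  # copy the page as it is *now* to this mirror
        if i == 0 and modify_between is not None:
            modify_between(page)  # page changes after mirror 0, before mirror 1


page = Page()
page.update(0, 1)
m0, m1 = bytearray(8), bytearray(8)

# First writeout races with an update: the mirrors end up inconsistent...
writeback(page, [m0, m1], lambda p: p.update(3, 7))
print(m0 == m1, page.dirty)  # False True

# ...but the update re-dirtied the page, so the VM writes it out again later.
while page.dirty:
    writeback(page, [m0, m1])
print(m0 == m1 == page.data)  # True
```

Nothing in this model compares the mirrors.  They converge only because
the racing update left the page dirty.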

In this discussion there seems to be a focus on the case where one
mirror is correct and the other is not.  However, that's usually not how
it works out.  A more realistic scenario is that both mirror copies are
incorrect because the page was being updated continuously.  That is,
each mirror holds a different mix of new and stale data within the same
4KB block.
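The mixed new/stale outcome can be sketched with a similar toy model.
It assumes that each mirror copies the page from memory one 512-byte
sector at a time and that the two copies start at different moments.
`simulate` and its parameters are hypothetical.  Real DMA ordering and
timing are not modeled.

```python
SECTORS = 8  # one 4KB page modeled as 8 x 512-byte sectors


def simulate(start_a, start_b, updates):
    """Toy model (hypothetical): two mirrors copy the same page from memory.

    Each mirror copies one sector per tick, starting at its own start tick.
    `updates` maps tick -> sector index the filesystem rewrites at that tick.
    Values are per-sector version numbers; higher means newer data.
    """
    page = [0] * SECTORS
    mirror_a = [None] * SECTORS
    mirror_b = [None] * SECTORS
    end = max(start_a, start_b) + SECTORS  # tick after both copies finish
    for tick in range(end):
        if tick in updates:
            page[updates[tick]] += 1  # page modified while writes are in flight
        for start, mirror in ((start_a, mirror_a), (start_b, mirror_b)):
            idx = tick - start
            if 0 <= idx < SECTORS:
                mirror[idx] = page[idx]  # device copies sector idx from memory
    for tick in sorted(t for t in updates if t >= end):
        page[updates[tick]] += 1  # later updates: page is simply dirty again
    return mirror_a, mirror_b, page


if __name__ == "__main__":
    a, b, final = simulate(0, 2, {1: 0, 2: 5, 4: 1, 9: 6})
    print("mirror A:", a)
    print("mirror B:", b)
    print("page    :", final)
```

With the call above, mirror A holds [0, 0, 0, 0, 0, 1, 0, 0] and mirror B
holds [1, 0, 0, 0, 0, 1, 0, 0].  The page ends as [1, 1, 0, 0, 0, 1, 1, 0].
Neither mirror matches the other or the final page, and both have a
stale sector 1.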

So realistically both disk blocks are wrong, and there's a window until
the new, correct block is written.  That window only causes problems if
there is a crash and we need to recover.  My main concern here is how
big the discrepancy between the disks can get, and whether we could end
up corrupting the filesystem during recovery by matching metadata from
one disk with journal entries from another.

-- 
Martin K. Petersen	Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html