Theodore Tso wrote:
- If I remember the details correctly, Chris Mason has demonstrated a
50% chance of corrupting directory entries in ext3, for example.
Chris Mason has a script which forces the system to be under a lot of
memory pressure, and in that scenario, it is highly likely that
without barriers, there will be filesystem corruptions if the system
is abruptly turned off while his script is running.
Andrew Morton has been resistant to making barrier=1 the default
for ext3 because (as I understand it) he disbelieves that this is an
adequate real-world example, and there is a real performance hit to
running with barriers.
If you have a battery backed write cache (say, in a high end array)
barriers can be ignored since the storage can effectively make that
write cache non-volatile, but otherwise, this is pretty key for
anyone wanting to maintain data integrity.
That's what I'm getting at, array controllers with a battery backed
write cache (BBWC). We disable the write cache on the physical
disks and provide no mechanism to re-enable the cache except in
some SATA configurations.
Well, we still need the barrier on the block I/O elevator side to
make sure that requests don't get reordered in the block layer. But
what you're saying is that once the write is posted to the array, it
is guaranteed that it is on "stable storage" (even if it is BBWC) such
that if someone hits the Big Red Switch at the exit to the data
center, and power is forcibly cut from the entire data center in case
of a fire, the battery will still keep the cache alive, at least until
the sprinklers go off, anyway, right? :-)
Yes, true....
In that case, I suspect the right thing for the cciss array to do is
to ignore the barrier, but not to return an error. If you return an
error, and refuse the write with barrier operation (which is what the
cciss driver seems to be doing starting in 2.6.29-rcX), ext4 will
retry the write without the barrier, at which point we are vulnerable
to the block layer reordering things at the I/O scheduler layer. In
effect, you're claiming that every single write to cciss is implicitly
a "barrier write" in that once it is received by the device, it is
guaranteed not to be lost even if the power to the entire system is
forcibly removed.
- Ted
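To make the difference between "reject the barrier" and "accept and ignore the barrier" concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not kernel code: ToyQueue, Request, BarrierNotSupported and commit_order are hypothetical names, and the "elevator" is modelled as the worst case, reversing every run of requests that is not separated by a barrier.

```python
from dataclasses import dataclass, field


class BarrierNotSupported(Exception):
    """Hypothetical stand-in for a driver refusing a barrier write."""


@dataclass
class Request:
    name: str
    barrier: bool = False


@dataclass
class ToyQueue:
    """Toy block queue: may reorder requests, but never across a barrier."""
    driver_accepts_barriers: bool
    pending: list = field(default_factory=list)

    def submit(self, req):
        # A driver that rejects barriers fails the request outright.
        if req.barrier and not self.driver_accepts_barriers:
            raise BarrierNotSupported(req.name)
        self.pending.append(req)

    def dispatch_order(self):
        """Return request names in worst-case dispatch order."""
        out, run = [], []
        for req in self.pending:
            if req.barrier:
                # Everything queued before the barrier goes out first,
                # in adversarial (reversed) order, then the barrier itself.
                out.extend(r.name for r in reversed(run))
                out.append(req.name)
                run = []
            else:
                run.append(req)
        out.extend(r.name for r in reversed(run))
        return out


def commit_order(driver_accepts_barriers):
    """Model a journal commit: data blocks first, then a barrier commit record."""
    q = ToyQueue(driver_accepts_barriers)
    q.submit(Request("journal-data"))
    try:
        q.submit(Request("commit", barrier=True))
    except BarrierNotSupported:
        # Assumed fallback, as described above: retry without the barrier.
        q.submit(Request("commit", barrier=False))
    return q.dispatch_order()
```

In this model, commit_order(True) keeps the commit record behind the journal data, while commit_order(False) lets the commit record overtake it, which is the reordering exposure described above.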
Aren't barriers still tied to the state of the write cache on the target
drive? In other words, if the write cache is off, we disable barriers
automatically. I think that this happens for scsi in sd_revalidate_disk().
In this case, it sounds like we have tangled the need to flush a drive's
write cache with the need to not re-order IO in the elevator code.
Ric
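One way to picture the two needs as separate decisions is the small sketch below. It is purely illustrative and does not claim to mirror what sd_revalidate_disk() does; BarrierPlan and plan_barrier are hypothetical names, and the only input assumed is whether the device has a volatile write cache.

```python
from typing import NamedTuple


class BarrierPlan(NamedTuple):
    order_in_elevator: bool   # stop the I/O scheduler reordering across the barrier
    flush_device_cache: bool  # ask the device to empty a volatile write cache


def plan_barrier(volatile_write_cache: bool) -> BarrierPlan:
    # Ordering in the block layer is needed regardless of the device.
    # A cache flush is only useful when the write cache can lose data,
    # i.e. it is enabled and not battery backed.
    return BarrierPlan(order_in_elevator=True,
                       flush_device_cache=volatile_write_cache)
```

For an array with a battery backed write cache, plan_barrier(False) keeps the ordering requirement and drops only the flush.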