On 08/17/2010 08:07 AM, Christoph Hellwig wrote:
>> The point is that we don't want to flush the disk write cache. The
>> intention of writethrough is not to make the disk cache writethrough
>> but to treat the host's cache as writethrough.
> We need to make sure data is not in the disk write cache if we want to
> provide data integrity.

When the guest explicitly flushes the emulated disk's write cache. Not
on every single write completion.

> It has nothing to do with the qemu caching
> mode - for cache=writeback or none it's committed as part of the fdatasync
> call, and for cache=writethrough it's committed as part of the O_SYNC
> write. Note that both of these paths end up calling the filesystem's
> ->fsync method, which is what's required to make writes stable. That's
> exactly what is missing in sync_file_range, and that's why that API is
> not useful at all for data integrity operations.

For normal writes from a guest, we don't need to follow the write with
an fsync(). We should only need to issue an fsync() given an explicit
flush from the guest.
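
To make that concrete, here is a rough sketch of the split I have in mind.
This is not qemu's actual block layer code: guest_write() and guest_flush()
are hypothetical names, and it assumes a plain file descriptor opened
without O_SYNC. Ordinary writes complete without any sync; only an explicit
flush request from the guest pays for the fdatasync().

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handler for an ordinary guest write: submit the data and
 * complete the request. No fsync()/fdatasync() is issued here. */
static int guest_write(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same chunk */
            return -errno;
        }
        if (n == 0)
            return -EIO;        /* no progress, give up instead of spinning */
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/* Hypothetical handler for an explicit guest cache flush: this is the
 * only place where data is forced to stable storage. */
static int guest_flush(int fd)
{
    if (fdatasync(fd) < 0)
        return -errno;
    return 0;
}
```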

> It's also what makes
> fsync slow on extN - but the fix for that is not to stop providing data
> integrity but rather to make fsync fast. There are various other
> filesystems that can already do it, and if you insist on using those
> that are slow for this operation you'll have to suffer until that
> issue is fixed for them.

fsync() being slow is orthogonal to my point. I don't see why we need
to do an fsync() on *every* write. It should only be necessary when a
guest injects an actual barrier.
Regards,
Anthony Liguori