Anthony Liguori wrote:
Avi Kivity wrote:
Anthony Liguori wrote:
Right now, it's fairly easy to understand. cache=none and
cache=writethrough guarantee that all write operations that the
guest thinks have completed are completed. cache=writeback provides
no such guarantee.
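To make the three modes concrete, here is a rough sketch of the open(2) flags a disk-image backend might pick for each one. The function name `image_open_flags` is hypothetical. Pairing cache=writethrough with O_DSYNC is an assumption made for illustration. The thread itself only says that cache=none uses O_DIRECT.

```python
import os

def image_open_flags(cache: str) -> int:
    """Hypothetical mapping from a cache= mode to open(2) flags (Linux)."""
    flags = os.O_RDWR
    if cache == "none":
        # Bypass the host page cache. A write can still land in the disk's
        # own, possibly volatile, write cache.
        flags |= os.O_DIRECT
    elif cache == "writethrough":
        # Assumed: each write returns only once the data is on stable storage,
        # as far as the filesystem and disk honour it.
        flags |= os.O_DSYNC
    elif cache == "writeback":
        # Plain buffered I/O: "completed" means "in the host page cache".
        pass
    else:
        raise ValueError(f"unknown cache mode: {cache!r}")
    return flags
```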
cache=none is partially broken as well, since O_DIRECT writes might
hit a non-battery-backed write cache. I think cache=writeback will
send the necessary flushes, if the disk and the underlying filesystem
support them.
Sure, but this likely doesn't upset people that much since O_DIRECT
has always had this behavior.
But people are not using O_DIRECT. They're using their guests, which
may or may not issue the appropriate barriers. They don't know that
we're using O_DIRECT underneath with different guarantees.
Using non-battery-backed disks with writeback enabled introduces a
larger set of possible data integrity issues. I think this case is
acceptable to ignore because it's a straightforward policy.
It isn't straightforward to me. A guest should be able to get the same
guarantees running on a hypervisor backed by such a disk as it would get
if it was running on bare metal with the same disk. Right now, that's
not the case; we're reducing the guarantees the guest gets.
cache=writeback+fsync would guarantee only that operations that
include a T_FLUSH are present on disk, which currently includes
fsyncs but not O_DIRECT writes. I guess whether O_SYNC
does a T_FLUSH also has to be determined.
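Here is a minimal sketch of what cache=writeback plus fsync could look like. It assumes the guest's flush request is translated into an fsync() on the host. The `WritebackImage` class and its methods are hypothetical and are not QEMU code.

```python
import os

class WritebackImage:
    """Hypothetical writeback backend: writes are buffered in the host page
    cache and reach stable storage only when the guest asks for a flush."""

    def __init__(self, path: str):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

    def write(self, offset: int, data: bytes) -> None:
        # Completes once the data is in the host page cache, NOT on disk.
        view = memoryview(data)
        while view:  # loop in case pwrite() writes fewer bytes than asked
            n = os.pwrite(self.fd, view, offset)
            view = view[n:]
            offset += n

    def flush(self) -> None:
        # Only a guest flush forces everything written so far to disk.
        os.fsync(self.fd)

    def close(self) -> None:
        os.close(self.fd)
```

With this design, any write the guest has seen complete but has not yet flushed can still be lost on a host crash. That is the weaker guarantee discussed in the thread.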
It seems too complicated to me. If we could provide a mode where
cache=writeback provided as strong a guarantee as
cache=writethrough, then that would be quite interesting.
I don't think we realistically can.
Maybe two fds? One open in O_SYNC and one not. Is such a thing sane?
For all I care, yes. Filesystem developers would probably have you
locked up.
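The "two fds" idea can be sketched as follows. The same image file is opened twice, once with O_SYNC and once without, and each write is routed by whether it must be durable on completion. The `DualFdImage` class and its `durable` parameter are hypothetical illustrations, not an existing interface.

```python
import os

class DualFdImage:
    """Hypothetical sketch: one image file, two descriptors (O_SYNC and not)."""

    def __init__(self, path: str):
        # Buffered descriptor: writes complete once they are in the page cache.
        self.buffered_fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        # Synchronous descriptor: each write returns only after its data has
        # reached stable storage (as far as the filesystem and disk honour it).
        self.sync_fd = os.open(path, os.O_RDWR | os.O_SYNC)

    def write(self, offset: int, data: bytes, durable: bool) -> None:
        fd = self.sync_fd if durable else self.buffered_fd
        view = memoryview(data)
        while view:  # loop in case pwrite() writes fewer bytes than asked
            n = os.pwrite(fd, view, offset)
            view = view[n:]
            offset += n

    def read(self, offset: int, length: int) -> bytes:
        # Both descriptors share the host page cache, so either sees the
        # latest data.
        return os.pread(self.buffered_fd, length, offset)

    def close(self) -> None:
        os.close(self.sync_fd)
        os.close(self.buffered_fd)
```

O_SYNC covers the writes issued through `sync_fd` itself. It is not a substitute for fsync() of data written earlier through `buffered_fd`.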
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.