Re: [PATCH, RFC] virtio_blk: add cache flush command

Avi Kivity <avi@xxxxxxxxxx> · Mon, 11 May 2009 21:00:24 +0300

Anthony Liguori wrote:
Avi Kivity wrote:
Christoph Hellwig wrote:
On Mon, May 11, 2009 at 06:45:50PM +0300, Avi Kivity wrote:

Right now it's fsync.  By the time I submit the backend change it
will still be fsync, but at least it will be called from the
posix-aio-compat thread pool.
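
For illustration, moving a blocking fsync onto a worker thread could be sketched roughly as below. This is a minimal Python sketch and not the posix-aio-compat code; the pool size and the names `submit_flush` and `flush_demo` are hypothetical.

```python
import os
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical stand-in for a worker pool; the size of 4 is arbitrary.
_pool = ThreadPoolExecutor(max_workers=4)


def submit_flush(fd: int) -> Future:
    """Queue fsync(fd) on a worker thread so the caller does not block.

    The returned Future resolves to 0 on success and raises OSError
    from result() if fsync fails.
    """
    def _do_flush() -> int:
        os.fsync(fd)  # blocks only the worker thread
        return 0

    return _pool.submit(_do_flush)


def flush_demo() -> int:
    """Write to a temporary file and flush it through the pool."""
    with tempfile.TemporaryFile() as f:
        f.write(b"guest data")
        f.flush()  # move Python's userspace buffer into the kernel
        future = submit_flush(f.fileno())
        # Wait for completion before the file is closed.
        return future.result()
```

The caller would normally keep doing other work and pick up the result from a completion callback instead of waiting on it as the demo does.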

I think if we have cache=writeback we should ignore this.

It's only needed for cache=writeback, because without that there is no
reason to flush a write cache.

Maybe we should add a fourth cache= mode then.  But
cache=writeback+fsync doesn't correspond to any real world drive.  In
the real world you're limited to power failures and a few megabytes
of cache (typically less), whereas cache=writeback+fsync can lose
hundreds of megabytes due to power loss or software failure.

Oh, and cache=writeback+fsync doesn't work on qcow2, unless we add 
fsync after metadata updates.

But how do we define the data integrity guarantees to the user of 
cache=writeback+fsync?  It seems to require a rather detailed 
knowledge of Linux's use of T_FLUSH operations.

True.  I don't think cache=writeback+fsync is useful.  Like I mentioned, 
it doesn't act like a real drive, and it doesn't work well with qcow2.

Right now, it's fairly easy to understand.  cache=none and 
cache=writethrough guarantee that all write operations that the guest 
thinks have completed are completed.  cache=writeback provides no such 
guarantee.
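
The guarantees above can be sketched as a mapping from cache mode to host open flags. This is only an illustration: the flag mapping (O_DIRECT for cache=none, O_DSYNC for cache=writethrough, nothing extra for cache=writeback) is an assumption about the backend rather than something stated in this thread, and the function names are hypothetical.

```python
import os

# Not every platform defines these flags, so fall back to 0 when missing.
O_DIRECT = getattr(os, "O_DIRECT", 0)
O_DSYNC = getattr(os, "O_DSYNC", getattr(os, "O_SYNC", 0))


def open_flags_for(cache_mode: str) -> int:
    """Return the (assumed) host open() flags for a cache= mode."""
    flags = os.O_RDWR
    if cache_mode == "none":
        flags |= O_DIRECT   # bypass the host page cache
    elif cache_mode == "writethrough":
        flags |= O_DSYNC    # write() returns only after data reaches the device
    elif cache_mode == "writeback":
        pass                # data may sit in the host page cache
    else:
        raise ValueError("unknown cache mode: %r" % cache_mode)
    return flags


def completion_is_guaranteed(cache_mode: str) -> bool:
    """True if a write the guest sees as completed is claimed to be done."""
    return cache_mode in ("none", "writethrough")
```

For example, `completion_is_guaranteed("writeback")` is False, which is the gap a flush command is meant to cover.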

cache=none is partially broken as well, since O_DIRECT writes might hit
a write cache that is not battery-backed.  I think cache=writeback will send the
necessary flushes, if the disk and the underlying filesystem support them.

cache=writeback+fsync would guarantee that only operations that 
include a T_FLUSH are present on disk which currently includes fsyncs 
but does not include O_DIRECT writes.  I guess whether O_SYNC does a 
T_FLUSH also has to be determined.
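
A toy model of that guarantee, assuming a volatile write cache that only a flush makes stable. The class and method names are hypothetical and this does not model any real device or QEMU backend.

```python
class ToyWritebackDisk:
    """Writes land in a volatile cache; only flush() makes them stable."""

    def __init__(self):
        self.stable = {}  # sector -> data that survives power loss
        self.cache = {}   # sector -> data that is lost on power loss

    def write(self, sector, data):
        # Completes immediately, like a cache=writeback write.
        self.cache[sector] = data

    def flush(self):
        # Stands in for a flush command (T_FLUSH) backed by fsync.
        self.stable.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        # Everything not yet flushed is gone.
        self.cache.clear()

    def read(self, sector):
        # Prefer cached data, then stable data; None if never written.
        return self.cache.get(sector, self.stable.get(sector))


def power_loss_demo():
    disk = ToyWritebackDisk()
    disk.write(0, b"flushed")
    disk.flush()
    disk.write(1, b"not flushed")  # reported complete, never flushed
    disk.power_loss()
    return disk.read(0), disk.read(1)
```

Only sector 0 survives in the demo, even though both writes were reported as complete to the caller.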

It seems too complicated to me.  If we could provide a mode where 
cache=writeback provided as strong a guarantee as cache=writethrough, 
then that would be quite interesting.

I don't think we realistically can.

(Or maybe ext3 actually is stupid enough to flush the whole fs even for
that case)

Sigh.

I'm also worried about ext3 here.

I'm just waiting for btrfs.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
