On Thu, Aug 27, 2009 at 08:21:55PM +0930, Rusty Russell wrote:
> > - virtio-blk needs to advertise ordered queue by default.
> >   This makes cache=writethrough safe on virtio.
>
> From a guest POV, that's "we don't know, let's say we're ordered because that
> may make us safer". Of course, it may not help: how much does it cost to
> drain the queue?
>
> The bug, IMHO, is that we *should* know. And in future I'd like to fix that,
> either by adding a VIRTIO_BLK_F_ORDERED feature, or a VIRTIO_BLK_F_UNORDERED
> feature.
>
> > Action plan for QEMU:
> >
> > - IDE needs to set the write cache enabled bit
> > - virtio needs to implement a cache flush command and advertise it
> >   (also needs a small change to the host driver)
>
> So, virtio-blk needs to be enhanced for this as well.

Really, enabling a volatile write cache without advertising a cache flush
command is a bug in the storage, and in our case qemu is the storage. So I
don't really see the need for two feature bits.

Here's my plan for virtio-blk:

 - add a new VIRTIO_BLK_F_WCACHE feature. If this feature is set, we
   (a) implement the prepare_flush queue operation to send a standalone
       cache flush, and
   (b) set a proper barrier ordering flag on the queue.

Now, I'm not entirely sure which queue ordering mode we will use. It is
not going to be QUEUE_ORDERED_TAG, as used for VIRTIO_BLK_F_BARRIER,
because that leaves all the queue draining to the host. For any backend
that uses something resembling POSIX I/O and has more than one outstanding
command at a time, that just means duplicating all the queue management we
already do in the guest, for no gain.

The easiest choice would be QUEUE_ORDERED_DRAIN_FLUSH, in which case the
cache flush command really is everything we need. As a slight optimization,
we could use QUEUE_ORDERED_DRAIN_FUA instead, which still does all the
queue draining in the guest but sends only one explicit cache flush before
the barrier and then sets the FUA bit on the actual barrier request.
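The difference between the two drain-based modes is easiest to see as the requests the guest puts on the virtqueue for a single barrier write, after it has drained its own queue. What follows is a minimal C sketch under stated assumptions. The enum and function names are hypothetical stand-ins loosely modeled on the Linux block layer's QUEUE_ORDERED_DRAIN, QUEUE_ORDERED_DRAIN_FLUSH and QUEUE_ORDERED_DRAIN_FUA modes, not the kernel's real definitions. The proposed VIRTIO_BLK_F_WCACHE bit is represented as a plain has_wcache flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, loosely modeled on Linux's QUEUE_ORDERED_* modes. */
enum ordered_mode {
    ORDERED_DRAIN,        /* drain only: no volatile cache to flush */
    ORDERED_DRAIN_FLUSH,  /* drain, pre-flush, write, post-flush */
    ORDERED_DRAIN_FUA     /* drain, pre-flush, write with FUA set */
};

/* Hypothetical request types as the guest would queue them for the host. */
enum req_type { REQ_FLUSH, REQ_WRITE, REQ_WRITE_FUA };

/* Choose an ordering mode from the (proposed) VIRTIO_BLK_F_WCACHE bit;
 * prefer_fua selects the DRAIN_FUA optimization. */
enum ordered_mode choose_ordered_mode(int has_wcache, int prefer_fua)
{
    if (!has_wcache)
        return ORDERED_DRAIN;   /* no volatile cache: draining is enough */
    return prefer_fua ? ORDERED_DRAIN_FUA : ORDERED_DRAIN_FLUSH;
}

/* Expand one barrier write into the requests sent to the host once the
 * guest has drained its queue. Returns the number of requests, i.e. the
 * protocol round trips; out must hold at least 3 entries. */
size_t expand_barrier(enum ordered_mode mode, enum req_type out[3])
{
    size_t n = 0;

    assert(out != NULL);
    switch (mode) {
    case ORDERED_DRAIN:
        out[n++] = REQ_WRITE;
        break;
    case ORDERED_DRAIN_FLUSH:
        out[n++] = REQ_FLUSH;     /* make earlier writes stable */
        out[n++] = REQ_WRITE;
        out[n++] = REQ_FLUSH;     /* make the barrier write itself stable */
        break;
    case ORDERED_DRAIN_FUA:
        out[n++] = REQ_FLUSH;     /* make earlier writes stable */
        out[n++] = REQ_WRITE_FUA; /* FUA replaces the post-flush */
        break;
    }
    return n;
}
```

With a write cache, DRAIN_FLUSH turns each barrier into three requests and DRAIN_FUA into two. That one-request difference is the saved round trip.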
In qemu we would still implement the FUA variant as an fdatasync before and
after the request, but we would save one protocol round trip.

Now the big question is when to set the VIRTIO_BLK_F_WCACHE feature. The
proper thing to do would be to set it for cache=writeback and cache=none,
because they do need the fdatasync, and not for cache=writethrough, because
it does not require it. Now, Avi is a big advocate of cache=writethrough
meaning "go fast and loose and don't care about data integrity". There's a
certain point to that, as I don't really see a good use case for that mode,
but I really hate to make something unsafe that doesn't explicitly say so in
the option name.

The complex (not to say over-engineered) version would be to split the
caching and data integrity settings into two options:

 (1) hostcache=on|off            use buffered vs. O_DIRECT I/O
 (2) integrity=osync|fsync|none  use O_SYNC, use f(data)sync, or do not
                                 care about data integrity
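The proposed split maps fairly directly onto open(2) flags plus a policy for guest flush requests. Below is a minimal C sketch, assuming the option names from the proposal above. struct image_opts, enum integrity and the handle_* helpers are hypothetical illustrations, not QEMU code. The sketch also assumes that integrity=none simply does not advertise the write cache, so the guest never sends flushes. Under this mapping, hostcache=on with integrity=fsync corresponds to cache=writeback, and hostcache=off with integrity=fsync corresponds to cache=none. Those are the two modes that would advertise VIRTIO_BLK_F_WCACHE. A FUA write is handled as the write followed by fdatasync.

```c
#define _GNU_SOURCE           /* for O_DIRECT on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical representation of the proposed options. */
enum integrity { INTEGRITY_OSYNC, INTEGRITY_FSYNC, INTEGRITY_NONE };

struct image_opts {
    int hostcache;            /* hostcache=on|off */
    enum integrity integrity; /* integrity=osync|fsync|none */
};

/* open(2) flags for the image file under the proposed options. */
int image_open_flags(const struct image_opts *o)
{
    int flags = O_RDWR;

    assert(o != NULL);
    if (!o->hostcache)
        flags |= O_DIRECT;    /* bypass the host page cache */
    if (o->integrity == INTEGRITY_OSYNC)
        flags |= O_SYNC;      /* each write is stable when it completes */
    return flags;
}

/* Advertise the proposed VIRTIO_BLK_F_WCACHE feature only when completed
 * writes may still sit in a volatile cache and guest flushes matter. */
int advertise_wcache(const struct image_opts *o)
{
    return o->integrity == INTEGRITY_FSYNC;
}

/* Guest flush request: fdatasync only in fsync mode. In osync mode the
 * data is already stable; in none mode we deliberately don't care. */
int handle_flush(int fd, const struct image_opts *o)
{
    if (o->integrity != INTEGRITY_FSYNC)
        return 0;
    return fdatasync(fd);
}

/* Guest write; with FUA set, follow the write with a flush so the data is
 * stable before completion (host side of the DRAIN_FUA scheme). */
ssize_t handle_write(int fd, const void *buf, size_t len, off_t off,
                     int fua, const struct image_opts *o)
{
    ssize_t n = pwrite(fd, buf, len, off);

    if (n < 0)
        return -1;
    if (fua && handle_flush(fd, o) < 0)
        return -1;
    return n;
}
```

The pre-flush that DRAIN_FUA sends as a separate request becomes the first fdatasync. The FUA write supplies the second one, which gives the "fdatasync before and after" behaviour with one fewer request than DRAIN_FLUSH.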