On Fri, Jul 22, 2011 at 8:06 AM, Stefan Hajnoczi <stefanha@xxxxxxxxx> wrote:
> On Thu, Jul 21, 2011 at 8:42 PM, Blue Swirl <blauwirbel@xxxxxxxxx> wrote:
>> On Thu, Jul 21, 2011 at 6:01 PM, Stefan Hajnoczi <stefanha@xxxxxxxxx> wrote:
>>> On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake <eblake@xxxxxxxxxx> wrote:
>>>> Thank you for persisting - you've found another hole that needs to be
>>>> plugged. It sounds like you are proposing that after a qemu process dies,
>>>> that libvirt re-reads the qcow2 metadata headers, and validates that the
>>>> backing file information has not changed in a manner unexpected by libvirt.
>>>> If it has, then the qemu process that just died was compromised to the
>>>> point that restarting a new qemu process from the old image is now a
>>>> security risk. So this is _yet another_ security aspect that needs to be
>>>> coded into libvirt as part of hardening sVirt.
>>>
>>> The backing file information changes when image streaming completes.
>>>
>>> Before: fedora.img <- my_vm.qed
>>> After: my_vm.qed (fedora.img is no longer referenced)
>>>
>>> The image streaming operation copies data out of fedora.img and
>>> populates my_vm.qed. When image streaming completes, the backing file
>>> is no longer needed and my_vm.qed is updated to drop the backing file.
>>>
>>> I think we need to design carefully to prevent QEMU and libvirt making
>>> incorrect assumptions about who does what. I really wish that all
>>> this image file business was outside QEMU and libvirt - that we had a
>>> separate storage management service which handled the details. QEMU
>>> would only do block device operations (no image format manipulation),
>>> and libvirt would only delegate to the storage management service.
>>> Today we seem to be sprinkling a little bit of storage management into
>>> QEMU and a little bit into libvirt :(.
>>>
>>> In that spirit it is much nicer to think of storage like a SAN
>>> appliance where you have LUNs that you access as block devices.
>>> It also provides an API for snapshotting, cloning LUNs, etc.
>>>
>>> Let's move to that model instead of worrying about how to spread
>>> storage logic across QEMU and libvirt.
>>
>> Would NBD protocol fit to this purpose, or is it too simple? Then
>> libvirt would handle the storage format completely and present an NBD
>> interface to QEMU (or give an fd to an external service) and QEMU
>> would not care about the storage format in this mode at all.
>
> NBD does not support flush (fdatasync). Therefore it only supports
> the slow cache=writethrough mode in a safe manner.

Maybe NBD could still be used in networked setups as a secondary alternative.

> It would be neat to use virtio-blk as the interface because it can be
> passed through to the guest. The guest talks directly to the storage
> management service without going through QEMU. The trick is to do
> something like vhost:
> 1. An ioeventfd for virtqueue (guest->host) kicks
> 2. An irqfd for host->guest kicks
> 3. Shared memory for vring and zero-copy data access
>
> The storage management service provides a UNIX domain socket over
> which fds can be passed to set up the vhost-like virtio-blk interface.
>
> Moving the image format code into a separate program makes it possible
> to safely write to a backing file while VMs are using it because the
> storage service can be host-wide, not per-VM. For example, streaming
> a shared backing file over NFS while running VMs using copy-on-write
> images. If we ever want to do deduplication or other global
> operations, then this approach is nice too.
>
> To summarize:
> The storage service manages image files including creation, deletion,
> snapshotting, and actual I/O. QEMU uses a vhost-like virtio-blk
> interface and can pass it directly into the guest. libvirt uses the
> storage service API without needing to parse image files or keep track
> of backing file relationships.

Excellent plan.
If one day the kernel provides built-in virtio-blk services which can be
passed via libvirt and QEMU to the guest, we'll even have zero copy all
the way.

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list