On Thu, Jul 21, 2011 at 8:42 PM, Blue Swirl <blauwirbel@xxxxxxxxx> wrote:
> On Thu, Jul 21, 2011 at 6:01 PM, Stefan Hajnoczi <stefanha@xxxxxxxxx> wrote:
>> On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake <eblake@xxxxxxxxxx> wrote:
>>> Thank you for persisting - you've found another hole that needs to be
>>> plugged.  It sounds like you are proposing that after a qemu process
>>> dies, libvirt re-reads the qcow2 metadata headers and validates that
>>> the backing file information has not changed in a manner unexpected
>>> by libvirt.  If it has, then the qemu process that just died was
>>> compromised to the point that restarting a new qemu process from the
>>> old image is now a security risk.  So this is _yet another_ security
>>> aspect that needs to be coded into libvirt as part of hardening sVirt.
>>
>> The backing file information changes when image streaming completes.
>>
>> Before: fedora.img <- my_vm.qed
>> After:  my_vm.qed (fedora.img is no longer referenced)
>>
>> The image streaming operation copies data out of fedora.img and
>> populates my_vm.qed.  When image streaming completes, the backing
>> file is no longer needed and my_vm.qed is updated to drop it.
>>
>> I think we need to design carefully to prevent QEMU and libvirt from
>> making incorrect assumptions about who does what.  I really wish that
>> all this image file business were outside QEMU and libvirt - that we
>> had a separate storage management service which handled the details.
>> QEMU would only do block device operations (no image format
>> manipulation), and libvirt would only delegate to the storage
>> management service.  Today we seem to be sprinkling a little bit of
>> storage management into QEMU and a little bit into libvirt :(.
>>
>> In that spirit it is much nicer to think of storage like a SAN
>> appliance, where you have LUNs that you access as block devices and
>> an API for snapshotting, cloning LUNs, etc.
>>
>> Let's move to that model instead of worrying about how to spread
>> storage logic across QEMU and libvirt.
>
> Would the NBD protocol fit this purpose, or is it too simple?  Then
> libvirt would handle the storage format completely and present an NBD
> interface to QEMU (or give an fd to an external service), and QEMU
> would not care about the storage format in this mode at all.

NBD does not support flush (fdatasync), so only the slow
cache=writethrough mode can be used safely over it.

It would be neat to use virtio-blk as the interface because it can be
passed through to the guest.  The guest talks directly to the storage
management service without going through QEMU.  The trick is to do
something like vhost:

1. An ioeventfd for virtqueue (guest->host) kicks
2. An irqfd for host->guest notifications
3. Shared memory for the vring and zero-copy data access

The storage management service provides a UNIX domain socket over which
fds can be passed to set up the vhost-like virtio-blk interface (rough
sketches of this wiring are at the end of this mail).

Moving the image format code into a separate program makes it possible
to write safely to a backing file while VMs are using it, because the
storage service can be host-wide rather than per-VM.  For example, a
shared backing file can be streamed over NFS while VMs run from
copy-on-write images layered on top of it.  If we ever want to do
deduplication or other global operations, this approach is nice too.

To summarize:

The storage service manages image files, including creation, deletion,
snapshotting, and the actual I/O.

QEMU uses a vhost-like virtio-blk interface and can pass it directly
into the guest.

libvirt uses the storage service API without needing to parse image
files or keep track of backing file relationships.
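By the way, the re-validation Eric describes is cheap to implement.
Here is a minimal sketch - my code, not anything in libvirt today -
covering only the qcow2 v2 header and light on error handling; QED has
an analogous backing_filename_offset/size pair:

/* Sketch: after a qemu process dies, re-read the qcow2 header and
 * check that the backing file is still the one libvirt recorded.
 * All qcow2 header fields are stored big-endian. */
#include <endian.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define QCOW2_MAGIC 0x514649fbU  /* 'Q' 'F' 'I' 0xfb */

/* expected == NULL means "no backing file expected".  Returns 1 if the
 * image still matches, 0 on mismatch or read error. */
int backing_file_unchanged(const char *path, const char *expected)
{
    unsigned char hdr[24];
    char backing[1024];
    uint32_t magic, bf_size;
    uint64_t bf_off;
    FILE *f = fopen(path, "rb");

    if (!f || fread(hdr, sizeof(hdr), 1, f) != 1)
        goto fail;
    memcpy(&magic,   hdr,      4);   /* offset  0: magic */
    memcpy(&bf_off,  hdr + 8,  8);   /* offset  8: backing_file_offset */
    memcpy(&bf_size, hdr + 16, 4);   /* offset 16: backing_file_size */
    magic   = be32toh(magic);
    bf_off  = be64toh(bf_off);
    bf_size = be32toh(bf_size);

    if (magic != QCOW2_MAGIC || bf_size >= sizeof(backing))
        goto fail;
    if (bf_size == 0) {     /* e.g. streaming dropped the backing file */
        fclose(f);
        return expected == NULL;
    }
    if (fseeko(f, (off_t)bf_off, SEEK_SET) != 0 ||
        fread(backing, bf_size, 1, f) != 1)
        goto fail;
    fclose(f);
    backing[bf_size] = '\0';
    return expected && strcmp(backing, expected) == 0;

fail:
    if (f) fclose(f);
    return 0;
}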
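To make the vhost-like wiring concrete: the eventfd(2) syscall and the
KVM_IOEVENTFD/KVM_IRQFD ioctls below are real (<linux/kvm.h>), but the
function, the notify address, and the GSI are placeholders I made up
for illustration - treat this as a sketch, not a proposal for the
actual code:

/* Sketch: QEMU creates two eventfds, wires them into KVM, and later
 * hands them to the storage service.  The vring itself lives in guest
 * RAM, which the service maps as shared memory (point 3, not shown). */
#include <linux/kvm.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>

/* vm_fd is the KVM VM file descriptor.  Returns 0 on success. */
static int wire_virtqueue(int vm_fd, uint64_t notify_addr, uint32_t gsi,
                          int *kick_fd, int *call_fd)
{
    *kick_fd = eventfd(0, EFD_CLOEXEC); /* 1. guest->host virtqueue kick */
    *call_fd = eventfd(0, EFD_CLOEXEC); /* 2. host->guest completion irq */
    if (*kick_fd < 0 || *call_fd < 0)
        return -1;

    /* Guest writes to the queue notify register now signal kick_fd
     * directly, without bouncing through QEMU's event loop. */
    struct kvm_ioeventfd io = {
        .addr = notify_addr,
        .len  = 2,
        .fd   = *kick_fd,
    };
    if (ioctl(vm_fd, KVM_IOEVENTFD, &io) < 0)
        return -1;

    /* The storage service writes to call_fd to inject the interrupt. */
    struct kvm_irqfd irq = {
        .fd  = (uint32_t)*call_fd,
        .gsi = gsi,
    };
    return ioctl(vm_fd, KVM_IRQFD, &irq);
}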
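The fd handover to the storage service is then plain SCM_RIGHTS passing
over its UNIX domain socket.  The helper below is hypothetical, and the
socket path would be up to the service:

/* Sketch: hand an fd (kick, call, or guest-memory fd) to the storage
 * service over an already-connected UNIX domain socket. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    struct msghdr msg = {
        .msg_iov        = &iov,
        .msg_iovlen     = 1,
        .msg_control    = u.buf,
        .msg_controllen = sizeof(u.buf),
    };
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_RIGHTS;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    /* One dummy payload byte carries the ancillary fd. */
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}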
Stefan