On 07.09.2010, at 15:41, Anthony Liguori wrote:

> Hi,
>
> We've got copy-on-read and image streaming working in QED and before
> going much further, I wanted to bounce some interfaces off of the
> libvirt folks to make sure our final interface makes sense.
>
> Here's the basic idea:
>
> Today, you can create images based on base images that are copy on
> write. With QED, we also support copy on read which forces a copy
> from the backing image on read requests and write requests.
>
> In addition to copy on read, we introduce a notion of streaming a
> block device which means that we search for an unallocated region of
> the leaf image and force a copy-on-read operation.
>
> The combination of copy-on-read and streaming means that you can
> start a guest based on slow storage (like over the network) and bring
> in blocks on demand while also having a deterministic mechanism to
> complete the transfer.
>
> The interface for copy-on-read is just an option within qemu-img
> create. Streaming, on the other hand, requires a bit more thought.
> Today, I have a monitor command that does the following:
>
>   stream <device> <sector offset>
>
> Which will try to stream the minimal amount of data for a single I/O
> operation and then return how many sectors were successfully streamed.
>
> The idea about how to drive this interface is a loop like:
>
>   offset = 0;
>   while offset < image_size:
>       wait_for_idle_time()
>       count = stream(device, offset)
>       offset += count
>
> Obviously, the "wait_for_idle_time()" requires wide system awareness.
> The thing I'm not sure about is 1) would libvirt want to expose a
> similar stream interface and let management software determine idle
> time 2) attempt to detect idle time on its own and provide a higher
> level interface. If (2), the question then becomes whether we should
> try to do this within qemu and provide libvirt a higher level
> interface.

I'm torn here too. Why not expose both?
Have a qemu internal daemon available that gets a sleep time as
parameter and an external "pull sectors" command. We'll see which one
is more useful, but I don't think it's too much code to justify only
having one of the two. And the internal daemon could be started using a
command line parameter, which helps non-managed users.

> A related topic is block migration. Today we support pre-copy
> migration which means we transfer the block device and then do a live
> migration. Another approach is to do a live migration, and on the
> source, run a block server using image streaming on the destination
> to move the device.
>
> With QED, to implement this one would:
>
> 1) launch qemu-nbd on the source while the guest is running
> 2) create a qed file on the destination with copy-on-read enabled and
>    a backing file using nbd: to point to the source qemu-nbd
> 3) run qemu -incoming on the destination with the qed file
> 4) execute the migration
> 5) when migration completes, begin streaming on the destination to
>    complete the copy
> 6) when the streaming is complete, shut down the qemu-nbd instance on
>    the source
>
> This is a bit involved and we could potentially automate some of this
> in qemu by launching qemu-nbd and providing commands to do some of
> this. Again though, I think the question is what type of interfaces
> would libvirt prefer? Low level interfaces + recipes on how to do
> high level things or higher level interfaces?

Is there anything keeping us from making the QMP socket multiplexable?
I was thinking of something like:

  { command = "nbd_server" ; block = "qemu_block_name" }
  { result = "done" }
  <qmp socket turns into nbd socket>

This way we don't require yet another port, don't have to care about
conflicts and get internal qemu block names for free.

Alex
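
For illustration, here is a rough sketch of how the external "pull
sectors" variant could be driven by management software, following the
loop quoted above. Everything in it is an assumption made for the
example: FakeDevice, SECTORS_PER_IO and stream_all are hypothetical
names, the device is simulated in memory, and nothing here talks to a
real qemu monitor or libvirt API.

```python
SECTORS_PER_IO = 128  # assumed: sectors copied by one stream call


class FakeDevice:
    """In-memory stand-in for a leaf image with a backing file (hypothetical)."""

    def __init__(self, image_size):
        self.image_size = image_size            # size in sectors
        self.allocated = [False] * image_size   # True once copied to the leaf

    def stream(self, offset):
        """Copy at most SECTORS_PER_IO sectors starting at offset.

        Returns the number of sectors handled, mirroring the proposed
        'stream <device> <sector offset>' monitor command.
        """
        end = min(offset + SECTORS_PER_IO, self.image_size)
        for sector in range(offset, end):
            self.allocated[sector] = True
        return end - offset


def wait_for_idle_time():
    """Placeholder: a real manager would check host or storage load here."""
    pass


def stream_all(device):
    """Call stream repeatedly until the whole image is local.

    Returns the number of stream calls that were needed.
    """
    offset = 0
    calls = 0
    while offset < device.image_size:
        wait_for_idle_time()
        count = device.stream(offset)
        if count == 0:
            # Guard against looping forever if no progress is made.
            raise RuntimeError("stream made no progress")
        offset += count
        calls += 1
    return calls


if __name__ == "__main__":
    dev = FakeDevice(image_size=1000)
    print(stream_all(dev), all(dev.allocated))
```

The internal-daemon variant would amount to the same loop running
inside qemu, with wait_for_idle_time() replaced by a sleep of the
configured length.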