On Tue, Sep 7, 2010 at 3:34 PM, Kevin Wolf <kwolf@xxxxxxxxxx> wrote:
> On 07.09.2010 15:41, Anthony Liguori wrote:
>> Hi,
>>
>> We've got copy-on-read and image streaming working in QED and before
>> going much further, I wanted to bounce some interfaces off of the
>> libvirt folks to make sure our final interface makes sense.
>>
>> Here's the basic idea:
>>
>> Today, you can create images based on base images that are copy on
>> write.  With QED, we also support copy on read which forces a copy from
>> the backing image on read requests and write requests.
>>
>> In addition to copy on read, we introduce a notion of streaming a
>> block device which means that we search for an unallocated region of the
>> leaf image and force a copy-on-read operation.
>>
>> The combination of copy-on-read and streaming means that you can start a
>> guest based on slow storage (like over the network) and bring in blocks
>> on demand while also having a deterministic mechanism to complete the
>> transfer.
>>
>> The interface for copy-on-read is just an option within qemu-img
>> create.
>
> Shouldn't it be a runtime option? You can use the very same image with
> copy-on-read or copy-on-write and it will behave the same (except for
> performance), so it's not an inherent feature of the image file.
>
> Doing it this way has the additional advantage that you need no image
> format support for this, so we could implement copy-on-read for other
> formats, too.

I agree that streaming should be generic, like block migration.  The
trivial generic implementation is:

void bdrv_stream(BlockDriverState* bs)
{
    for (sector = 0; sector < bdrv_getlength(bs); sector += n) {
        if (!bdrv_is_allocated(bs, sector, &n)) {
            bdrv_read(bs, sector, ...);
            bdrv_write(bs, sector, ...);
        }
    }
}

>
>> Streaming, on the other hand, requires a bit more thought.
>> Today, I have a monitor command that does the following:
>>
>> stream <device> <sector offset>
>>
>> Which will try to stream the minimal amount of data for a single I/O
>> operation and then return how many sectors were successfully streamed.
>>
>> The idea about how to drive this interface is a loop like:
>>
>> offset = 0;
>> while offset < image_size:
>>    wait_for_idle_time()
>>    count = stream(device, offset)
>>    offset += count
>>
>> Obviously, the "wait_for_idle_time()" requires wide system awareness.
>> The thing I'm not sure about is 1) would libvirt want to expose a
>> similar stream interface and let management software determine idle time
>> 2) attempt to detect idle time on its own and provide a higher level
>> interface.  If (2), the question then becomes whether we should try to
>> do this within qemu and provide libvirt a higher level interface.
>
> I think libvirt shouldn't have to care about sector offsets. You should
> just tell qemu to fetch the image and it should do so. We could have
> something like -drive backing_mode=[cow|cor|stream].
>
>> A related topic is block migration.  Today we support pre-copy migration
>> which means we transfer the block device and then do a live migration.
>> Another approach is to do a live migration, and on the source, run a
>> block server using image streaming on the destination to move the device.
>>
>> With QED, to implement this one would:
>>
>> 1) launch qemu-nbd on the source while the guest is running
>> 2) create a qed file on the destination with copy-on-read enabled and a
>>    backing file using nbd: to point to the source qemu-nbd
>> 3) run qemu -incoming on the destination with the qed file
>> 4) execute the migration
>> 5) when migration completes, begin streaming on the destination to
>>    complete the copy
>> 6) when the streaming is complete, shut down the qemu-nbd instance on
>>    the source
>
> Hm, that's an interesting idea. :-)
>
> Kevin
>

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
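
For anyone who wants to play with the driving loop quoted above, here is a
toy Python model of it. It is only a sketch and talks to nothing in QEMU:
ToyImage, CHUNK, stream_all() and the body of stream() are hypothetical
stand-ins, and wait_for_idle_time() is an empty placeholder. One assumption
worth flagging: stream() here returns the number of sectors it advanced
past (copied or already allocated), so the loop always makes progress. The
thread does not say what the real command returns for allocated ranges.

```python
# Toy model of the stream loop from the thread. All names are hypothetical
# stand-ins for illustration; none of this is a QEMU or libvirt API.

CHUNK = 8  # sectors handled per stream() call; an arbitrary choice


class ToyImage:
    """A leaf image with a per-sector allocation map and a backing image."""

    def __init__(self, backing):
        self.backing = list(backing)        # sector contents of the base image
        self.data = [None] * len(self.backing)  # None means "unallocated"

    def is_allocated(self, sector):
        return self.data[sector] is not None

    def write(self, sector, value):
        self.data[sector] = value


def stream(image, offset):
    """Copy unallocated sectors in [offset, offset + CHUNK) from the backing
    image. Returns how many sectors were advanced past (assumed semantics)."""
    end = min(offset + CHUNK, len(image.backing))
    for sector in range(offset, end):
        if not image.is_allocated(sector):
            image.write(sector, image.backing[sector])
    return end - offset


def wait_for_idle_time():
    """Placeholder: a real manager would block here until the host is idle."""


def stream_all(image):
    """Drive stream() over the whole image; returns the number of calls made."""
    offset = 0
    calls = 0
    while offset < len(image.backing):
        wait_for_idle_time()
        offset += stream(image, offset)
        calls += 1
    return calls


if __name__ == "__main__":
    img = ToyImage(range(20))
    img.write(3, 99)  # a guest write that landed before streaming got there
    print(stream_all(img), img.data)
```

Note that the sector written by the guest is left alone, which is the
property that makes it safe to run the loop while the guest is live. Whether
this loop belongs in management software or inside qemu is exactly the open
question above; the sketch takes no side on that.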