On Thu, Apr 07, 2011 at 04:31:58PM -0500, Adam Litke wrote:
> I've been working with Anthony Liguori and Stefan Hajnoczi to enable data
> streaming to copy-on-read disk images in qemu.  This work is working its way
> through review and I expect it to be upstream soon as part of the support for
> the new QED disk image format.
>
> Disk streaming is extremely useful when provisioning domains from a central
> repository of template images.  Currently the domain must be provisioned by
> either: 1) copying the template image to local storage before the VM can be
> started or, 2) creating a qcow2 image that backs to a base image in the remote
> repository.  Option 1 can introduce a significant delay when provisioning large
> disks.  Option 2 introduces a permanent dependency on a remote service and
> increased network load to satisfy disk reads.

So the scenario we have is a thin-provisioned disk image, with a
backing store of some kind (whether a local image or an NBD server
doesn't matter). The goal is to allocate blocks in the disk image,
changing it from thin-provisioned to less thin, or even fully
allocated. QEMU may be running while this is done (requiring an online
copy by the QEMU process via the monitor) or shut off (requiring an
offline copy with qemu-img commands).

What strikes me is that, from an API design POV, there is really no
compelling reason to restrict this to disk images with backing
stores. Any disk volume which is thin-provisioned can benefit from
this, i.e. instead of copying blocks of data from the backing store,
just write blocks of zeros into unallocated regions of the disk. So a
mgmt app can start a VM with a sparse raw file, with host storage
overcommit across all VMs, and if it later needs to provide a strong
guarantee of storage allocation to a particular VM, this API can be
used, regardless of whether a backing store is present.

> Qemu will support two streaming modes: full device and single sector.
> Full device streaming is the easiest to use because one command will cause
> the whole device to be streamed as fast as possible.  Single sector mode can
> be used if one wants to throttle streaming to reduce I/O pressure.  In this
> mode, a management tool issues individual commands to stream single sectors.

This design is needlessly restrictive IMHO: it special-cases the two
extremes and provides no intermediate capabilities. The API should
just take an offset and a length. This trivially allows for a single
sector, multiple sectors, or all sectors.

The API should also use bytes, not sectors. Sectors are a very
ill-defined unit of measurement, with lots of potential meanings. It
could be the sector size of the underlying block device, the
filesystem block size, the cluster size of the virtual disk file
format, or the sector size of the virtual block device. Using bytes to
specify the logical offset + length within the virtual disk image is
clear. In addition, all the other libvirt storage APIs use bytes, and
we want this to be consistent with them. If the internal
implementation wants to convert from bytes to sectors and round
up/down to the nearest sector boundary, then that is fine - just don't
expose it in the API.

Finally, while requesting allocation of the entire disk is pretty
trivial, to sensibly allocate partial regions or individual sectors,
applications need to be able to find out which regions are currently
allocated/missing. This will require some kind of API to query disk
allocation regions (cf the FIEMAP/FIBMAP ioctls).

> To enable this support in libvirt, I propose the following API...
>
> virDomainStreamDisk() will start or stop a full device stream or stream a
> single sector of a device.  The behavior is controlled by setting
> virDomainStreamDiskFlags.  When either starting or stopping a full device
> stream, the return value is either 0 or -1 to indicate whether the operation
> succeeded.
> For a single sector stream, a device offset is returned (or -1 on
> failure).  This value can be used to continue streaming with a subsequent call
> to virDomainStreamDisk().
>
> virDomainStreamDiskInfo() returns information about active full device streams
> (the device alias, current streaming position, and total size).

I'm finding the term 'Streaming' to be quite misleading. This is
really about allocating blocks in the disk image. Thus I would use the
word 'Allocate' in the API naming.

I'll follow up about API design in the next patch.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
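
As a footnote to the bytes-versus-sectors point above, here is a
minimal sketch of how an implementation might turn a byte offset +
length into a sector range internally, rounding outward so every byte
in the request is covered. The names sector_range and
bytes_to_sectors are hypothetical and not part of libvirt or QEMU, and
the sector size is assumed to be supplied by whatever layer knows it.

```c
#include <stdint.h>

/* Hypothetical type for illustration only; not a libvirt or QEMU type. */
typedef struct {
    uint64_t first;  /* index of the first sector touched by the request */
    uint64_t count;  /* number of sectors touched by the request */
} sector_range;

/*
 * Convert a byte range [offset, offset + length) into the smallest
 * sector range that covers it. The start is rounded down and the end
 * is rounded up to a sector boundary.
 *
 * Assumes offset + length does not overflow uint64_t. An empty request
 * or a zero sector size yields an empty range.
 */
static sector_range bytes_to_sectors(uint64_t offset, uint64_t length,
                                     uint64_t sector_size)
{
    sector_range r = { 0, 0 };

    if (length == 0 || sector_size == 0)
        return r;

    uint64_t last = (offset + length - 1) / sector_size;

    r.first = offset / sector_size;
    r.count = last - r.first + 1;
    return r;
}
```

For instance, with 512-byte sectors a request for offset 1000 and
length 100 covers bytes 1000..1099, so it maps to first = 1 and
count = 2. The caller never sees the sector size, which keeps that
detail out of the public API.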