On 12/02/2015 02:08 AM, Simon Kollberg wrote:
> Hi!

Apologies for not noticing this mail sooner.

> I'm working on supporting a new FT/HA solution for qemu called COLO
> (http://wiki.qemu.org/Features/COLO). The part that is currently being
> focused on for libvirt integration is Block Replication
> (http://wiki.qemu.org/Features/BlockReplication) which enables guest state
> synchronization for disks.

Here are some rough thoughts on the matter, although we may go through
several iterations before landing on something that everyone likes.

> Right now there are three issues that I'd like to get your input on:
>
> 1.
> As you can see on the block replication wiki-page we need to reference the
> secondary disk id.
>
> Example from the wiki:
> -drive if=none,driver=raw,file.filename=1.raw,id=colo1 \
> -drive if=xxx,driver=replication,mode=secondary,\
> ...
> file.backing.backing=colo1
>
> My initial thought was to manually set the alias of the
> disk and add a new reference element to the backingStore:
> <disk type='file' device='disk'>
>   ...
>   <alias name='colo1'/>
> </disk>
> <disk type='file' device='disk'>
>   ...
>   <backingStore type='file'>
>     ...
>     <reference name='colo1'/>
>   </backingStore>
> </disk>
>
> Though, I quickly realized that setting the alias is only done by the
> hypervisor and is therefore not an option with the current code.
>
> Would it be bad letting the user set the alias, and if so, do you have any
> ideas of how to solve the referencing?

I'm a little bit leery of letting the user set the alias; one benefit
we've had of NOT letting the user control it is that we could avoid name
collisions. It's not a strong enough reason to reject the idea, but
certainly worth thinking about.

Another consideration: if you do 'virsh dumpxml' on a running domain,
the live xml contains alias names; you can then 'virsh define' that xml,
and the aliases will be silently dropped.
This is in fact useful if we have to change the alias name we generate
under the hood when first starting a domain under a newer version of
qemu. If the user can set the alias, we are stuck with that name. On the
other hand, as long as we have an alias name and use it consistently,
and can just document that the user can't cause conflicts, making the
name persistent may be rather easy.

That said, we DO want to make the index='1' of <backingStore> something
that becomes persistent. And the <target dev='...'> attribute coupled
with the <backingStore index='...'> is sufficiently unique to reference
ANY element of the backing chain.

That is, I would lean towards something more like this:

  <disk type='file' device='disk'>
    ...
    <source file='...' index='0'/>
    <backingStore/>
    <target dev='vda' bus='virtio'/>
  </disk>
  <disk type='file' device='disk'>
    ...
    <backingStore type='replication'>
      ...
      <reference dev='vda' index='0'/>
    </backingStore>
  </disk>

A couple of things to note there: I think a new type='replication'
(rather than reusing existing type='file') will make it obvious that we
are adding new XML specifically for block replication; then in that new
type, we can add a new <reference> that refers to dev='vda' and
index='0' (we'll have to start exposing an index for the active layer,
not just the backingStore layers), as what the device will be
replicating.

> 2.
> The format of the disk and the driver type currently shares the same
> attribute in libvirt (the type attribute on driver XML element). However,
> with the new replication disk driver you need to be able to set both the
> disk format and also the driver name.
>
> Example from the wiki:
> -drive if=xxx,driver=replication,mode=secondary,\
> file.file.filename=active_disk.qcow2,\
> file.driver=qcow2,\

So we are basically stacking TWO drivers on top of a single file.
I think that means we'll want two layers of XML, something like:

  <disk type='replication'>
    <backingStore type='file'>
      <driver name='qemu' type='qcow2'/>
      <source file='/path/to/active_disk.qcow2'/>
    </backingStore>
  </disk>

Again, anywhere we have two layers of protocol in qemu to get to the
underlying file, it makes sense to have two layers of XML in libvirt.
We'll want the same sort of type='quorum' as a new disk type for
handling quorum drives, where those have 0 direct <source> elements but
instead have multiple <backingStore> child elements.

Ideally, since everything can be represented as a BDS tree in qemu, it
should also be represented as a similar tree in XML in libvirt, except
that libvirt has already taken the shortcut that a single protocol and
file layer can be combined (that is, we show qcow2 images and source
files in the same layer), due to historical usage.

> ...
>
> I saw that there was a function in libvirt called virStorageFileProbeFormat
> that could let us get the format of the disk without stating it in the XML.
> But as I'm sure you know, it's strongly advised not to be used since you can
> trick the function by modifying the disk file.

Correct, any solution that requires probing rather than explicit format
will not fly.

> 3.
> When using the replication driver the secondary disk is supposed to be added
> but not attached.
> Example from the wiki:
> -drive if=none,driver=raw,file.filename=1.raw,id=colo1 \
> -drive if=xxx,driver=replication,mode=secondary,\
> ...
>
> Clearly, trying to setup a disk without a target is not allowed at the
> moment.
> Is there any better way of doing it?

Hmm. I'm almost wondering if <disk> is the wrong element. Most of the
XML is trying to describe something the guest will see, but if we are
creating a replication driver that is NOT visible to the guest, that
almost argues that we should create an entirely new sibling element next
to <disk>.
The new element would not need a <target> (because it is not guest
visible), but would otherwise be similar to <disk>.

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
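
As an illustration of the <reference dev='vda' index='0'/> idea from
point 1, here is a minimal Python sketch of how such a reference might
be resolved to a source file by walking a disk's backing chain.
Everything in it is an assumption made for the example: the replication
XML is only the layout proposed in this thread (not part of libvirt's
schema), resolve_reference() is a hypothetical helper, and /tmp/1.raw is
a placeholder path.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain XML following the layout proposed in this thread.
# type='replication' and <reference> are NOT existing libvirt schema,
# and /tmp/1.raw is a placeholder path.
DOMAIN_XML = """
<domain>
  <devices>
    <disk type='file' device='disk'>
      <source file='/tmp/1.raw' index='0'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='disk'>
      <backingStore type='replication'>
        <reference dev='vda' index='0'/>
      </backingStore>
    </disk>
  </devices>
</domain>
"""


def resolve_reference(root, dev, index):
    """Return the source file of layer `index` in the chain of disk `dev`.

    Hypothetical helper: the index is read from <source index='N'> when
    present (the proposed active-layer form), otherwise from the enclosing
    <backingStore index='N'> element. Returns None if nothing matches.
    """
    for disk in root.iter("disk"):
        target = disk.find("target")
        if target is None or target.get("dev") != dev:
            continue
        node = disk
        while node is not None:
            src = node.find("source")
            idx = node.get("index")
            if src is not None and src.get("index") is not None:
                idx = src.get("index")
            if src is not None and idx is not None and int(idx) == index:
                return src.get("file")
            # Descend one layer; an empty <backingStore/> ends the chain.
            node = node.find("backingStore")
    return None


root = ET.fromstring(DOMAIN_XML)
ref = root.find(".//backingStore[@type='replication']/reference")
resolved = resolve_reference(root, ref.get("dev"), int(ref.get("index")))
print(resolved)
```

The point of the sketch is that the (dev, index) pair is enough to pick
out a single layer without any user-visible alias; a lookup that finds
no matching disk or layer simply returns None.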
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list