On 03/12/2014 02:42 PM, Peter Krempa wrote:
>> A backing chain of 3 files (base <- mid <- top) in the local file
>> system:
>>
>> <disk type='file' device='disk'>
>>   <driver name='qemu' type='qcow2'/>
>>   <source file='/var/lib/libvirt/images/top.qcow2'/>
>>   <backingStore type='file'>
>>     <driver name='qemu' type='qcow2'/>
>
> ... we should add an attribute with the index of the backing chain
> element in the backing chain.

Hmm.  Another feature coming down the pipe in qemu 2.0 is the ability
to give an alias to any portion of the backing chain.  Right now, we
have an <alias> element tied to the <disk> as a whole (in qemu
parlance, the device id), but some qemu operations will be easier if
we also have a name tied to each file in the chain (in qemu parlance,
a node id for the bd [block driver structure]).  Maybe we can kill two
birds with one stone, by having each <backingStore> track an <alias>
sub-element with the name of the node, when communicating with qemu
2.0 and newer.

For a specific instance, consider a quorum vs. a snapshot create
action.  There are two approaches: create a single qcow2 whose backing
file is the quorum (that is, request the snapshot on the node tied to
the quorum):

  Q[a, b, c] <- snap

or create a new quorum of three qcow2 files, with each qcow2 file
wrapping a member of the old quorum (in practice, a 'transaction'
command that creates the three files in one go):

  Q[a <- snapA, b <- snapB, c <- snapC]

or even anything in between (request a snapshot of the node tied to a
while leaving b and c alone, because node a is on the storage most
amenable to copying off the snapshot for backup purposes while nodes b
and c are remote).  The way qemu exposes this is by letting you
specify, when creating the new node for the snapshot, whether its
backing file is the node id of the overall quorum or the node id of
one of the pieces of the quorum.  So while the overall <disk> alias
remains constant, the quorum node is distinct from any of its three
backing files.  That is further evidence that the quorum itself does
not use any file resources, but instead relies on multiple
backingStores, and that taking the snapshot (or snapshots) needs
control over every possible node as the starting point that will gain
a new qcow2 node as part of the snapshot creation.

Right now, <alias> is a run-time and output-only parameter, but we
someday want to support offline block-pull and friends, where we'd
need the index to exist even when <alias> does not.  Likewise, while
each <backingStore> corresponds to a qemu node, and can thus have one
name, the top-level <disk> has the chance for BOTH a device alias
(which moves around whenever the active image changes due to snapshot,
block copy, or block commit operations) and a node index (which is
tied to the file name, even if that file later stops being the active
image in the chain).  Thanks for making me think about that!

Code-wise, I'm looking at splitting 'struct _virDomainDiskDef' into
two parts.  The outer part remains _virDomainDiskDef, which tracks
anything tied to the guest view, or to the device as a whole
(<target>, <alias>, <address>); the inner part is a new
_virDomainDiskSrcDef, which tracks anything related to the host view
(node name, <source>, <driver>, <backingStore>).  Each backingStore is
itself a _virDomainDiskSrcDef, giving a recursive structure - we just
special-case the output so that the first _virDomainDiskSrcDef feeds
the XML of the <disk> element, while every other _virDomainDiskSrcDef
feeds the XML of a <backingStore>.  A rough sketch of the split
follows.
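To make that concrete, here is roughly what I have in mind - a sketch
only, where field names like nodeid and nbacking are illustrative
rather than actual libvirt code:

/* Rough sketch, not actual libvirt code.  Guest-facing pieces stay in
 * the outer struct; everything describing one host file moves into
 * the inner struct, which recurses for backing stores (and fans out
 * for a quorum).  */

#include <stddef.h>

typedef struct _virDomainDiskSrcDef virDomainDiskSrcDef;
struct _virDomainDiskSrcDef {
    int type;                       /* file, block, network, quorum... */
    char *path;                     /* <source> */
    char *driverType;               /* <driver type='...'/> */
    unsigned int nodeid;            /* per-node id; see counter below */
    size_t nbacking;                /* 1 for a chain, 3 for a quorum */
    virDomainDiskSrcDef **backing;  /* <backingStore> children */
};

struct _virDomainDiskDef {
    /* guest view / whole device */
    char *dst;                      /* <target dev='...'/> */
    char *alias;                    /* <alias name='...'/>, the device id */
    /* <address> and other per-device info also lives here */

    unsigned int nodecounter;       /* nodes created so far; never
                                     * reused while qemu is running */

    /* host view: head of the recursion; formatted as the <disk>
     * element's own <source>/<driver>, while every deeper
     * virDomainDiskSrcDef is formatted as a <backingStore> */
    virDomainDiskSrcDef *src;
};

With that shape, the special-casing lives entirely in the XML
formatter: src emits directly under <disk>, and each backing child
emits a <backingStore>.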
For tracking node ids, I would then add a counter of nodes created so
far to the outer structure (more important for an online domain, as we
want libvirt's node names to mesh with qemu's node names, and we must
not reuse names no matter how many snapshots or block-commits happen
in between), where each inner structure grabs the next increment of
the counter.  So, revisiting various operations: on snapshot, we are
going from:

  domainDiskDef (counter 1, alias "ide0-0-0")
    + domainDiskSrcDef (node "ide0-0-0[0]", source "base")

to:

  domainDiskDef (counter 2, alias "ide0-0-0")
    + domainDiskSrcDef (node "ide0-0-0[1]", source "snap")
      + domainDiskSrcDef (node "ide0-0-0[0]", source "base")

Note that the node names grow in order of creation, which is NOT the
same as a top-down breadth-first numbering.  <alias> and the node id
would be output only (ignored on input); as long as qemu is running we
cannot reuse old node ids, but when qemu is offline, we could rename
things to start back from 0 - maybe only when passed a specific flag
(similar to the update-cpu flag forcing us to update portions of the
XML that we otherwise leave unchanged).

Do we need both a node id and a <backingStore> index?  We already
allow disk operations by <alias> name, so referring to the node id may
be sufficient.  On the other hand, having the index as an attribute
might make it easier to write XPath queries that resolve to a numbered
node regardless of depth (I'm a bit weak on XPath, but there's bound
to be a way to look up a <disk> element whose target is named "vda"
and that has a "backingStore[index=4]" sub-element; one possible query
is sketched below).

So, for a theoretical quorum with a 2/3 majority where one of the
disks is a backing chain, as in Q[a, b <- c, d], and where qemu is
running, it might look like:

<disk type='quorum' device='disk'>
  <driver name='qemu' type='quorum' threshold='2' node='[4]'/>
  <backingStore type='file' index='1'>
    <driver name='qemu' type='raw' node='[0]'/>
    <source path='/path/to/a'/>
    <backingStore/>
  </backingStore>
  <backingStore type='file' index='2'>
    <driver name='qemu' type='qcow2' node='[2]'/>
    <source path='/path/to/c'/>
    <backingStore type='file' index='3'>
      <driver name='qemu' type='raw' node='[1]'/>
      <source path='/path/to/b'/>
      <backingStore/>
    </backingStore>
  </backingStore>
  <backingStore type='file' index='4'>
    <driver name='qemu' type='raw' node='[3]'/>
    <source path='/path/to/d'/>
    <backingStore/>
  </backingStore>
  <target dev='hda' bus='ide'/>
  <alias name='ide0-0-0'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>

The node names that qemu uses would then be the concatenation of the
<disk> alias and each DiskSrcDef node ("ide0-0-0[4]" is the quorum,
"ide0-0-0[0]" is the node for file a, ...), and you can also refer to
backing stores by index ("hda" or "hda[0]" is the quorum, "hda[1]" is
file a from the quorum, "hda[2]" is the active part of the chain from
the second member of the quorum, ...).
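For that XPath lookup, something along these lines ought to work - a
rough, untested sketch assuming the index='N' attribute proposed
above, using the existing virXPathNode() helper from virxml.h:

#include "virxml.h"

/* Untested sketch: resolve backing store #4 of the disk whose target
 * is "vda", no matter how deeply the <backingStore> is nested - the
 * descendant axis (//) skips over the intermediate levels. */
static xmlNodePtr
diskLookupBackingStore(xmlXPathContextPtr ctxt)
{
    return virXPathNode("//disk[target/@dev='vda']"
                        "//backingStore[@index='4']", ctxt);
}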
-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org