[warning - long email ahead. tl;dr: this patch is not ready yet, because I found a problem in the design] On 06/12/2014 09:02 AM, Peter Krempa wrote: > Until now we were changing information about the disk source via > multiple steps of copying data. Now that we changed to a pointer to > store the disk source we might use it to change the approach to track > the data. > > Additionally this will allow proper tracking of the backing chain. > --- > src/qemu/qemu_driver.c | 108 +++++++++---------------------------------------- > 1 file changed, 19 insertions(+), 89 deletions(-) It's also a nice diffstat :) > @@ -12863,26 +12859,12 @@ qemuDomainSnapshotCreateSingleDiskActive(virQEMUDriverPtr driver, > if (virAsprintf(&device, "drive-%s", disk->info.alias) < 0) > goto cleanup; > > - /* XXX Here, we know we are about to alter disk->src->backingStore if > - * successful, so we nuke the existing chain so that future commands will > - * recompute it. Better would be storing the chain ourselves rather than > - * reprobing, but this requires modifying domain_conf and our XML to fully > - * track the chain across libvirtd restarts. */ > - virStorageSourceClearBackingStore(disk->src); > - Yay - moving the code in the direction that the comments hinted we should be moving. > if (virStorageFileInit(snap->src) < 0) > goto cleanup; > > if (qemuGetDriveSourceString(snap->src, NULL, &source) < 0) > goto cleanup; > > - if (VIR_STRDUP(newsource, snap->src->path) < 0) > - goto cleanup; > - > - if (persistDisk && > - VIR_STRDUP(persistSource, snap->src->path) < 0) > - goto cleanup; One thing this code was trying to do is hoist all allocation up front, so that if we hit OOM, we have not made any permanent changes and can therefore safely roll back. > @@ -12969,33 +12942,18 @@ qemuDomainSnapshotCreateSingleDiskActive(virQEMUDriverPtr driver, > /* Update vm in place to match changes. */ > need_unlink = false; > > - VIR_FREE(disk->src->path); > - virStorageNetHostDefFree(disk->src->nhosts, disk->src->hosts); > - > - disk->src->path = newsource; > - disk->src->format = format; > - disk->src->type = snap->src->type; > - disk->src->protocol = snap->src->protocol; > - disk->src->nhosts = snap->src->nhosts; > - disk->src->hosts = newhosts; > + if (!(tmp = virStorageSourceCopy(snap->src, false))) > + goto cleanup; But here, you've delayed the allocation until after we've already made some changes. You'll need to hoist the call to virStorageSourceCopy() to occur in the same place as the code it is replacing. > > - newsource = NULL; > - newhosts = NULL; > + tmp->backingStore = disk->src; > + disk->src = tmp; This line looks like it is in the correct location - we are merely manipulating the linked list in place to insert the newly-created wrapper. I do see a potential problem, though. Lots of graphics below, in three major demo sections: ================== In the _existing_ code, we are doing in-place modification of _portions_ of the storage source, then calling the security/audit/lease manager on the modified source. Why? Because the portions of the disk that we do NOT modify include the labeling information, and that's in part because historically we had multiple pieces of labeling information stored only once at the disk level. Graphically, our current behavior has a lifecycle something like this, when booting a domain with a chain of 2 and snapshotting it into a chain of 3: at parse time, we only learn the top level, and the fact that we need to generate multiple labels: disk: dev = vda src = +-------------------+ + path = mid + + rwlabel = NULL + + rolabel = NULL + + backing = NULL + + mid is unlabeled + +-------------------+ at domain start time, we then probe the whole chain, which learns more file names, but does not populate label information in the backing files; we also perform labeling: disk: dev = vda src = +-------------------+ +--------------------+ + path = mid + + path = base + + rwlabel = foo + + rwlabel = NULL + + rolabel = bar + + rolabel = NULL + + backing = ---------->+ backing = NULL + + mid has label foo + + base has label bar + +-------------------+ +--------------------+ Now the snapshot comes along, and we are adding a new layer: snapshot: dev = vda src = +-------------------+ + path = top + + rwlabel = NULL + + rolabel = NULL + + backing = NULL + + top is unlabeled + +-------------------+ so we label it - but with what label? The label was tied to 'mid'. Hence, the in-place swapping, so that we have: disk: dev = vda src = +-------------------+ + path = top + + rwlabel = foo + + rolabel = bar + + backing = NULL + + top has label foo + +-------------------+ long enough to give us a label. If successful, the code takes a shortcut at this point, and just reprobes the chain, finally leaving us with the desired chain (if unsuccessful, we undo the temporary changes by restoring src to mid, and backing to point to base). But note that the reprobe does NOT know what label backing elements actually have; what's more, mid is still labeled 'foo' for read-write privileges, even though after the snapshot we could have safely downgraded it to label 'bar' for read-only privileges: disk: dev = vda src = +-------------------+ +-------------------+ +--------------------+ + src = top + + src = mid + + src = base + + rwlabel = foo + + rwlabel = NULL + + rwlabel = NULL + + rolabel = bar + + rolabel = NULL + + rolabel = NULL + + backing = ---------->+ backing = ---------->+ backing = NULL + + top has label foo + + mid has label ?? + + base has label ?? + +-------------------+ +-------------------+ +--------------------+ =============== But with your code, merely inserting (a copy of) the snapshot source at the front of the chain would leave us with: disk: dev = vda src = +-------------------+ +-------------------+ +--------------------+ + src = top + + src = mid + + src = base + + rwlabel = NULL + + rwlabel = foo + + rwlabel = NULL + + rolabel = NULL + + rolabel = bar + + rolabel = NULL + + backing = ---------->+ backing = ---------->+ backing = NULL + + top has label foo + + mid has label foo + + base has label bar + +-------------------+ +-------------------+ +--------------------+ it has the benefit of remembering what we have previously labeled things, but has the absolute failure of forgetting what labels we have generated for the <disk> (since those labels are buried in the backing chain instead of the top element of the chain). For your code to work, you have to ALSO transfer the security information into the top of the chain, probably best done via a new helper function. =============== Here's what I envision that we should be doing instead. The existing code is tracking two different labels at the top level source, and none at any other layer. Instead, we should be tracking both read-write and read-only label templates at the <disk> level, then for any given virStorageSourcePtr, have a notion only of a current label. The security manager only needs access to the disk label templates (the generated read-write and read-only labels don't change over the life of the disk), as well as the current label per disk, along with the ability to swap that label (we basically have three label states: unlabeled, read-only, and read-write - and use the security manager to switch between those states). So the same lifecycle now looks more like: original parse: disk: dev = vda rwlabel = NULL rolabel = NULL src = +-------------------+ + path = mid + + backing = NULL + + mid is unlabeled + +-------------------+ probe the backing chain, and generate labels: disk: dev = vda rwlabel = foo rolabel = bar src = +-------------------+ +--------------------+ + path = mid + + path = base + + backing = ---------->+ backing = NULL + + mid has label foo + + base has label bar + +-------------------+ +--------------------+ Parse the snapshot, and pass THAT storage source pointer to the security code, but reusing the label template from the disk, so that we get the correct label in-place: snapshot: dev = vda src = +-------------------+ + path = top + + backing = NULL + + top has label foo + +-------------------+ Wire up the chain: disk: dev = vda rwlabel = foo rolabel = bar src = +-------------------+ +-------------------+ +--------------------+ + src = top + + src = mid + + src = base + + backing = ---------->+ backing = ---------->+ backing = NULL + + top has label foo + + mid has label foo + + base has label bar + +-------------------+ +-------------------+ +--------------------+ and finally revoke the read-write label on mid: disk: dev = vda rwlabel = foo rolabel = bar src = +-------------------+ +-------------------+ +--------------------+ + src = top + + src = mid + + src = base + + backing = ---------->+ backing = ---------->+ backing = NULL + + top has label foo + + mid has label bar + + base has label bar + +-------------------+ +-------------------+ +--------------------+ and if a read-only backing file is shared by more than one chain (or even across more than one <domain>), now you can see why reference counting becomes important, for knowing when it is safe to revoke read-write or even to forbid granting read-write. Furthermore, once we start allowing backing chain as user input instead of crawling the chain ourselves, the per-source label can be overridden for any element of the chain, instead of the current limitation of only having an override label at the top level. I don't think we are going to get to the idea setup of security management overnight, so in the meantime, we need some crutches to make sure that your code is not botching things. ================= So, with that explanation out of the way, I'm resuming the review. I think we can get away with your method in the short term (the 2nd of the 3 sections above) IF you also manage to transfer labeling information to the active layer of the chain, but this patch needs a respin because in its current state it does not do that. > > if (persistDisk) { > - VIR_FREE(persistDisk->src->path); > - virStorageNetHostDefFree(persistDisk->src->nhosts, > - persistDisk->src->hosts); > - > - persistDisk->src->path = persistSource; > - persistDisk->src->format = format; > - persistDisk->src->type = snap->src->type; > - persistDisk->src->protocol = snap->src->protocol; > - persistDisk->src->nhosts = snap->src->nhosts; > - persistDisk->src->hosts = persistHosts; > + if (!(tmp = virStorageSourceCopy(snap->src, false))) > + goto cleanup; Again, this is a late allocation; we risk OOM after making changes that can't be undone. I'd rather have two temporary virStorageSourcePtr objects allocated up front, and then plugged in at the end on success, rather than risk late allocation failure. > > @@ -13017,21 +12971,15 @@ qemuDomainSnapshotCreateSingleDiskActive(virQEMUDriverPtr driver, > static void > qemuDomainSnapshotUndoSingleDiskActive(virQEMUDriverPtr driver, > virDomainObjPtr vm, > - virDomainDiskDefPtr origdisk, > virDomainDiskDefPtr disk, > virDomainDiskDefPtr persistDisk, > bool need_unlink) Arggh - I ran out of time to review the rest. But hopefully this gives you some ideas to think about. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Attachment:
signature.asc
Description: OpenPGP digital signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list