Re: [PATCH v2 2/2] qemuProcessStop: Remove image metadata for running mirror jobs

Peter Krempa <pkrempa@xxxxxxxxxx> · Wed, 20 Nov 2019 15:22:58 +0100

On Wed, Nov 20, 2019 at 15:10:17 +0100, Michal Privoznik wrote:
> On 11/20/19 12:17 PM, Peter Krempa wrote:
> > On Tue, Nov 19, 2019 at 14:15:12 +0100, Michal Privoznik wrote:
> > > If user starts a blockcommit without --pivot or a blockcopy also
> > > without --pivot then we modify access for qemu on both images and
> > > leave it like that until pivot is executed. So far so good.
> > > Problem is, if user instead of issuing pivot calls destroy on the
> > > domain or if qemu dies whilst executing the block job. In this
> > > case we don't ever clear the access we granted at the beginning.
> > > To fix this, we need to mimic what block job code does for
> > > aborting a block job -> remove image metadata from disk->mirror
> > > and disk->src chains.
> > > 
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1741456#c19
> > > 
> > > Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx>
> > > ---
> > > 
> > > Diff to v2:
> > > - updated commit message
> > > - do more remove - for disk->src chain too
> > > - put a comment about the importance of code placement
> > > 
> > >   src/qemu/qemu_process.c | 12 ++++++++++++
> > >   1 file changed, 12 insertions(+)
> > > 
> > > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> > > index 209d07cfe8..b9dd433f54 100644
> > > --- a/src/qemu/qemu_process.c
> > > +++ b/src/qemu/qemu_process.c
> > > @@ -7615,6 +7615,18 @@ void qemuProcessStop(virQEMUDriverPtr driver,
> > >       for (i = 0; i < vm->def->niothreadids; i++)
> > >           vm->def->iothreadids[i]->thread_id = 0;
> > > +    /* Do this explicitly after vm->pid is reset so that security drivers don't
> > > +     * try to enter the domain's namespace which is non-existent by now as qemu
> > > +     * is no longer running. */
> > > +    for (i = 0; i < def->ndisks; i++) {
> > > +        virDomainDiskDefPtr disk = def->disks[i];
> > > +
> > > +        if (disk->mirror)
> > > +            qemuBlockRemoveImageMetadata(driver, vm, disk->dst, disk->mirror);
> > > +
> > > +        qemuBlockRemoveImageMetadata(driver, vm, disk->dst, disk->src);
> > > +    }
> > 
> > So aren't we tracking the image metadata also for the shared subsets of
> > backing chain which might be used by multiple domains?
> 
> No. We are not doing that:
> 
> static int
> virSecurityDACSetImageLabelInternal(virSecurityManagerPtr mgr,
>                                     virDomainDefPtr def,
>                                     virStorageSourcePtr src,
>                                     virStorageSourcePtr parent)
> {
>     bool remember;
>     bool is_toplevel = parent == src || parent->externalDataStore == src;
> 
>     ...
> 
>     /* We can't do restore on shared resources safely. Not even
>      * with refcounting implemented in XATTRs because if there
>      * was a domain running with the feature turned off the
>      * refcounter in XATTRs would not reflect the actual number
>      * of times the resource is in use and thus the last restore
>      * on the resource (which actually restores the original
>      * owner) might cut off access to the domain with the feature
>      * disabled.
>      * For disks, a shared resource is the whole backing chain
>      * but the top layer, or read only image, or disk explicitly
>      * marked as shared.
>      */
>     remember = is_toplevel && !src->readonly && !src->shared;
> 
>     return virSecurityDACSetOwnership(mgr, src, NULL, user, group,
> remember);
> }
> 
> 
> 
> static int
> virSecuritySELinuxSetImageLabelInternal(virSecurityManagerPtr mgr,
>                                         virDomainDefPtr def,
>                                         virStorageSourcePtr src,
>                                         virStorageSourcePtr parent)
> {
>     bool remember;
>     bool is_toplevel = parent == src || parent->externalDataStore == src;
>     int ret;
> 
>     ...
> 
>     /* We can't do restore on shared resources safely. Not even
>      * with refcounting implemented in XATTRs because if there
>      * was a domain running with the feature turned off the
>      * refcounter in XATTRs would not reflect the actual number
>      * of times the resource is in use and thus the last restore
>      * on the resource (which actually restores the original
>      * owner) might cut off access to the domain with the feature
>      * disabled.
>      * For disks, a shared resource is the whole backing chain
>      * but the top layer, or read only image, or disk explicitly
>      * marked as shared.
>      */
>     remember = is_toplevel && !src->readonly && !src->shared;
> 
>     ..
> 
>     ret = virSecuritySELinuxSetFilecon(mgr, src->path, use_label, remember);
> 
>     ...
> }
> 
> 
> > Because this
> > undoes everything and we do have some code which does this in the
> > security driver, don't we? So this either is duplicate or it's a
> > superset of what is done in the security driver.
> > 
> 
> So the problem here is that we call qemuDomainStorageSourceAccessAllow() but

Well, I was worried that we actually call the code even if
qemuDomainStorageSourceAccessAllow was never called on the image. For
example when no blockjob was ever used.

> never call the counter parts. Now, I could call
> qemuDomainStorageSourceAccessRevoke() here but:

This would be wrong and could actually cut off access for other VMs in
case when the image is shared. In case when we run a blockjob we relabel
it for write acces but by starting the job the user declares that the
chain is not shared in the first place as executing any merge would
corrupt other readers.

In addition now qemu image locks and our metadata locks should prevent
from getting there in the first place, but both of those have some
limitations.

> 1) we don't need to bother with CGroups or image locking (as in
> virtlockd/sanlock) because qemu process is gone at this point and so is its
> CGroup and image locks (both virtlockd and sanlock automatically release
> image locks if the process holding locks dies),
> 
> 2) I'd need to access top_parent and baseSource - are they available here
> easily?
> 
> 3) we are doing this code I'm suggesting already in
> qemuBlockJobEventProcessLegacyCompleted()

As mentioned above, the used declared that the chain is not shared by
calling the API in the first place but calling it all the time is ...
suspicious at best.

It feels like the labelling code should handle this if it should be
executed all the time during vm shutdown. Since the functions you've
pasted above say that it's not the correct thing to do this also must
be wrong.

If the only correct thing to do is to nuke the xattrs it still feels
like the labelling code which put the xattrs there in the first place
should deal with it when un-labelling the files and not deal with it
explicitly in many places.

I can understand that the old block job handlers require this special
handling but for the new blockjob handler we unlabel the files as they
are being removed so even that place would be fixed with a systemic
solution.

> 
> Also note that I'm calling qemuBlockRemoveImageMetadata() only after
> qemuSecurityRestoreAllLabel().
> 
> Michal
Attachment:
signature.asc

Description: PGP signature
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list