Re: RFC: New APIs for delegation of privileged operations

Martin Kletzander <mkletzan@xxxxxxxxxx> · Thu, 8 Dec 2022 10:12:22 +0100

On Wed, Dec 07, 2022 at 12:07:11PM +0000, Daniel P. Berrangé wrote:
On Wed, Dec 07, 2022 at 12:42:06PM +0100, Martin Kletzander wrote:
On Thu, Dec 01, 2022 at 10:17:49AM +0000, Daniel P. Berrangé wrote:
> The other end of the
>
>  virInternalSetProcessSetMaxMemLockHandler
>
> wouldn't have ability to validate the VM identity even if we
> passed it, since the master source of VM identity info is
> the unprivileged and untrusted component.
>
> This means it is a big challenge to do more than just a blanket
> allow/deny for the entire 'max mem lock' feature, rather than try
> to finese it per VM.
>

Exactly what I was afraid of with another approach I discussed with
someone else a while ago.  If you start providing ways to do arbitrary
privileged operations, then you are effectively giving away privileged
access.

In this case I think it was an unfortunate choice of an API.  If the API
is *designed* to provide the proper identifying information, then the
management application can then choose the action properly.

I think it is challenging no matter what because the privileged component
is placing trust on the unprivilged component to supply honest identifying
info.  This is a key reason why polkit ACL checks are done based on
process ID + permission name. Process ID can't be faked, and you're asking
the user to confirm the honesty of the permission name.

Overall, I think if you're going to allow "mem lock" to an unprivileged
VM that's fine, but the expectation should be that we're allowing this
for *any* VM, and not able to offer per-VM access control on that usage.

What I meant was something more along the lines of "place_vm_in_cgroup" where we
would offload the cgroup placement to kubevirt.  It already has to trust us with
the information we provide now (which threads are placed in which cgroups).  The
benefit of having the callback per-connection and calling it on the connection
that is starting the VM would kind of make the argument easier.

> > > The application-provided handler would be responsible for performing
> > > the privileged operation (in this case raising the memlock limit for
> > > a process). For KubeVirt, virt-launcher would have to pass the baton
> > > to virt-handler.
> > >
> > > If such an handler is installed, libvirt would invoke it (and likely
> > > go through some sanity checks afterwards); if not, it would attempt
> > > to perform the privileged operation itself, as it does today.
> >
> > I think we also need to carefully consider how to do the callbacks in
> > our RPC. Currently events don't wait for the client to respond and these
> > would need to do so.
> >
> > Also use of these should e.g. require keepalive to be used.
>
> What you're describing is no longer callbacks, but it is method
> calls operating in the reverse direction to what they do today.
> Furthermore this would tie up the RPC worker dispatch thread
> while waiting for a response. This could easily result in a
> deadlock situation if the client app servicing this request
> was busy doing something else, or had to wait on its own
> locks, but those locks were held by another thread that is
> waiting on libvirt. Then there's the complexity of tieing
> this up into thue actual RPC implementations of both cllient
> and server.
>

My idea how to avoid callback failures was to split
e.g. virDomainCreate() similarly to how virDomainMigrate...() is,
i.e. into phases which would be invisible from the caller unless they
want to do "something special" with the domain.  Of course this does not
work for actions initiated by guest/libvirt, but to start small this was
one of the ideas.

I think the experiance with migration would strongly bias me against
decomposing the startup process. We had no choice with migrate but
to decompse because we needed to co-ordinate across multiple hosts,
but it has been a horrible burden, requiring us to re-design the
internal migration steps many many times. I don't want us to get
into that trap for other APIs.

Fair point.

> This really just re-inforces the point that this doesn't belong
> in the public API / RPC layer at all. It needs to all work via
> a plugin loaded in-process. That plugin can call out to whatever
> service it wants to use, whether this uses DBus or REST or
> something else is upto the plugin author.
>

One counterargument to this was that it would require mgmt app
developers to write an .so, or at least a C-compatible plugin.  While I
don't see that as an issue (apart from it possibly hindering the
adoption) I can offer an idea which came before we even discussed this
particular topic.  It actually started from the other end, but let me
give you some idea how it might be more usable approach for mgmt apps
like kubevirt.

I don't see the need to write an .so to be a unreasonable burden.

Me neither, I suggested that if someone wants to mend the behaviour of libvirt
in a specific way (e.g. requests that the daemon does not call some function)
can already be solved (or at least tried and tested) by LD_PRELOAD-ing a
library.  The first time it was said in a bit of an anger, but thinking about it
the point is still valid.  Of course that's not fitting for any production usage
for which we'd want something at least resembling a plug-in system.

The plugin does not need to contain any significant amount of
logic. If desired it could be nothing more than a shim which talks
to a separate $LANG daemon over an RPC system. Even if wanting a
self-contained plugin, its possible to write shared libraries in
common system languages like C, Go, Rust, as they can expose shim
functions with C linkage semantics. If you really wanted it would
be possible to write a C shim that loads the python/perl/ruby/wasm/
etc runtime, and then write your plugins in dynamic languages too.

If we provide a way to initialise this "plugin" at runtime, then there
is no need to mangle with reloading the daemon.  If we provide a way to
do this over the already existing connection, then it might be even
easier to use.  However that would require a way to transfer a coded
algorithm and being able to run it (ideally with reduced access to the
system for security purposes).  A format that could be pretty usable for
this is WASM.  And before anyone rolls their eyes give me a minute to
mention few points about it:

 - there is no limit to the source code language used

 - it can be built particularly for the libvirt daemon dynamically (if
   someone wants that)

 - it can be ran safely by giving it access only to a limited set of
   predefined APIs

 - libvirt can run it itself, the runtime can be built in

I'm not saying this is the solution to the points in question, just an
idea I had few months ago which lead nowhere because I did not have
enough time to make a simple bytecode run in a C program.

Of course most of the above points can be solved without utilising WASM,
e.g. by allowing plugins to be handled by admin APIs, running them in a
sandbox, etc.

I feel like the type of functionality we need plugins to support is
not something that we need to, nor should, make dynamically configurable
at runtime. If we need to have matched logic at startup and shutdown of
a VM, or plug and unplug of a device to a running VM, it is creating
us an unecessary problem to allow this to be dynamically changed. It
should be more than sufficient to configure this at host deployment
time prior to starting libvirtd. For kubevirt usage this is certainly
trivial, since they deploy and start a new libvirt at VM startup.

If we had static plugins, someone could write a loadable .so library
that contained the WASM runtime, and exposed a RPC API for injecting
WASM code to a running libvirt, allowing the things you describe
above, without having to tie libvirt to that approach.

Once there is an interface defined (for example the plugin) then it can do
anything, of course.  The advantage was mainly with code transfer from the
client to the server, once that is out of the question (well it was never in)
the other advantages are fall apart too.

Static plug-ins are the safest (and probably easiest) choice.  The argument
about writing them, and mostly other arguments as well, are mostly constructed
until people from mgmt apps weigh in with their opinions.

With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Attachment:
signature.asc

Description: PGP signature