On Tue, May 24, 2022 at 05:35:03PM +0200, Michal Prívozník wrote:
> On 5/24/22 12:33, Daniel P. Berrangé wrote:
> > On Tue, May 24, 2022 at 11:50:50AM +0200, Michal Prívozník wrote:
> >> On 5/23/22 18:30, Daniel P. Berrangé wrote:
> >>> On Mon, May 09, 2022 at 05:02:17PM +0200, Michal Privoznik wrote:
> >>>> Since the level of trust that QEMU has is the same level of trust
> >>>> that helper processes have, there's no harm in placing all of them
> >>>> into the same group.
> >>>
> >>> This assumption feels like it might be a bit of a stretch. I
> >>> recall discussing this with Paolo to some extent a long time
> >>> back, but let me recap my understanding.
> >>>
> >>> IIUC, the attack scenario is that a guest vCPU thread is scheduled
> >>> on an SMT sibling with another thread that is NOT running guest OS
> >>> code. "Another thread" in this context refers to many things:
> >>>
> >>> - Random host OS processes
> >>> - QEMU vCPU threads from a different guest
> >>> - QEMU emulator threads from any guest
> >>> - QEMU helper process threads from any guest
> >>>
> >>> Consider, for example, that the QEMU emulator thread contains a password
> >>> used for logging into a remote RBD/Ceph server. That is a secret
> >>> credential that the guest OS should not have permission to access.
> >>>
> >>> Consider alternatively that the QEMU emulator is making a TLS connection
> >>> to some service, and there are keys negotiated for the TLS session. While
> >>> some of the data transmitted over the session is known to the guest OS,
> >>> we shouldn't assume it all is.
> >>>
> >>> Now in the case of QEMU emulator threads I think you can make a somewhat
> >>> decent case that we don't have to worry about it. Most of the keys/passwds
> >>> are used once at cold boot, so there's no attack window for vCPUs at that
> >>> point. There is a small window of risk when hotplugging. If someone is
> >>> really concerned about this though, they shouldn't have let QEMU have
> >>> these credentials in the first place, as it's already vulnerable to a
> >>> guest escape, e.g. use kernel RBD instead of letting QEMU directly log in
> >>> to RBD.
> >>>
> >>> IOW, on balance of probabilities it is reasonable to let QEMU emulator
> >>> threads be in the same core scheduling domain as vCPU threads.
> >>>
> >>> In the case of external QEMU helper processes though, I think it is
> >>> a far less clear-cut decision. There are a number of reasons why helper
> >>> processes are used, but at least one significant motivating factor is
> >>> security isolation between QEMU & the helper - they can only communicate
> >>> and share information through certain controlled mechanisms.
> >>>
> >>> With this in mind I think it is risky to assume that it is safe to
> >>> run QEMU and helper processes in the same core scheduling group. At
> >>> the same time there are likely cases where it is also just fine to
> >>> do so.
> >>>
> >>> If we separate helper processes from QEMU vCPUs this is not as wasteful
> >>> as it sounds. Since the helper processes are running trusted code, there
> >>> is no need for helper processes from different guests to be isolated.
> >>> They can all just live in the default core scheduling domain.
> >>>
> >>> I feel like I'm talking myself into suggesting the core scheduling
> >>> host knob in qemu.conf needs to be more than just a single boolean.
> >>> Either have two knobs - one to turn it on/off and one to control
> >>> whether helpers are split or combined - or have one knob and make
> >>> it an enumeration.
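For reference, the "core scheduling group" discussed above maps onto the
kernel's prctl(PR_SCHED_CORE, ...) interface (Linux >= 5.14). Below is a
minimal standalone sketch of that interface; the PID handling is purely
illustrative and not how libvirt structures it.

  /* sched-core-sketch.c: illustrative use of PR_SCHED_CORE (Linux >= 5.14).
   * Not libvirt code; just shows the underlying kernel mechanism. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/types.h>
  #include <sys/prctl.h>
  #include <linux/prctl.h>        /* PR_SCHED_CORE, PR_SCHED_CORE_* */

  int main(int argc, char **argv)
  {
      /* Give all threads of the calling process a fresh core scheduling
       * cookie; they will then only share an SMT core with threads
       * carrying the same cookie. */
      if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, 0,
                PR_SCHED_CORE_SCOPE_THREAD_GROUP, 0) < 0) {
          perror("PR_SCHED_CORE_CREATE");
          return EXIT_FAILURE;
      }

      /* Optionally push the same cookie to another process (PID given on
       * the command line), e.g. a helper that should join this group
       * rather than stay in the host's default one. */
      if (argc > 1) {
          pid_t helper = (pid_t) atoi(argv[1]);

          if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_TO, helper,
                    PR_SCHED_CORE_SCOPE_THREAD_GROUP, 0) < 0) {
              perror("PR_SCHED_CORE_SHARE_TO");
              return EXIT_FAILURE;
          }
      }

      return EXIT_SUCCESS;
  }

Note that sharing a cookie to another task requires ptrace-level permission
on the target, so only processes of equal or lower privilege can be pulled
into a group this way.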
> >>
> >> Seems reasonable. And the default should be QEMU's emulator + vCPU
> >> threads in one sched group, and all helper processes in another, right?
> >
> > Not quite. I'm suggesting that helper processes can remain in the
> > host's default core scheduling group, since the helpers are all
> > executing trusted machine code.
> >
> >>> One possible complication comes if we consider a guest that is
> >>> pinned, but not on a fine-grained per-vCPU basis.
> >>>
> >>> E.g. if a guest is set to allow floating over a subset of host CPUs,
> >>> we need to make sure that it is possible to actually execute the
> >>> guest still. I.e. if the entire guest is pinned to 1 host CPU but our
> >>> config implies use of 2 distinct core scheduling domains, we have
> >>> an unsolvable constraint.
> >>
> >> Do we? Since we're placing emulator + vCPUs into one group and helper
> >> processes into another, these would never run at the same time, but that
> >> would be the case anyway - if the emulator write()-s into a helper's socket
> >> it would be blocked because the helper isn't running. This "bottleneck"
> >> is a result of pinning everything onto a single CPU and exists regardless
> >> of scheduling groups.
> >>
> >> The only case where scheduling groups would make the bottleneck worse is
> >> if the emulator and vCPUs were in different groups, but we don't intend
> >> to allow that.
> >
> > Do we actually pin the helper processes at all?
>
> Yes, we do. Into the same CGroup as the emulator thread:
> qemuSetupCgroupForExtDevices().
>
> > I was thinking of a scenario where we implicitly pin helper processes to
> > the same CPUs as the emulator threads and/or the QEMU process-global
> > pinning mask, e.g. if we only had
> >
> >   <vcpu placement='static' cpuset="2-3" current="1">2</vcpu>
> >
> > Traditionally the emulator threads, I/O threads and vCPU threads will
> > all float across host CPUs 2 & 3. I was assuming we also placed the
> > helper processes on these same 2 host CPUs. Not sure if that's right
> > or not. Assuming we do, then...
> >
> > Let's say CPUs 2 & 3 are SMT siblings.
> >
> > We have helper processes in the default core scheduling
> > domain and QEMU in a dedicated core scheduling domain. We
> > lose 100% of concurrency between the vCPUs and helper
> > processes.
>
> So in this case users might want to have helpers and emulator in the
> same group. Therefore, in qemu.conf we should allow something like:
>
>   sched_core = "none"      // off, no SCHED_CORE
>                "emulator"  // default, place only emulator & vCPU threads
>                            // into the group
>                "helpers"   // place emulator & vCPU & helpers into the
>                            // group
>
> I agree that "helpers" is a terrible name, maybe "emulator+helpers"? Or
> something completely different? Maybe:

A scalar is nice, but we can just call it "full" or "all", as in the
opposite of "none".

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|