On Tue, May 24, 2022 at 05:35:03PM +0200, Michal Prívozník wrote:
> On 5/24/22 12:33, Daniel P. Berrangé wrote:
> > On Tue, May 24, 2022 at 11:50:50AM +0200, Michal Prívozník wrote:
> >> On 5/23/22 18:30, Daniel P. Berrangé wrote:
> >>> On Mon, May 09, 2022 at 05:02:17PM +0200, Michal Privoznik wrote:
> >>>> Since the level of trust that QEMU has is the same level of trust
> >>>> that helper processes have, there's no harm in placing all of them
> >>>> into the same group.
> >>>
> >>> This assumption feels like it might be a bit of a stretch. I
> >>> recall discussing this with Paolo to some extent a long time
> >>> back, but let me recap my understanding.
> >>>
> >>> IIUC, the attack scenario is that a guest vCPU thread is scheduled
> >>> on an SMT sibling with another thread that is NOT running guest OS
> >>> code. "Another thread" in this context refers to many things:
> >>>
> >>> - Random host OS processes
> >>> - QEMU vCPU threads from a different guest
> >>> - QEMU emulator threads from any guest
> >>> - QEMU helper process threads from any guest
> >>>
> >>> Consider, for example, that the QEMU emulator thread contains a password
> >>> used for logging into a remote RBD/Ceph server. That is a secret
> >>> credential that the guest OS should not have permission to access.
> >>>
> >>> Consider alternatively that the QEMU emulator is making a TLS connection
> >>> to some service, and there are keys negotiated for the TLS session. While
> >>> some of the data transmitted over the session is known to the guest OS,
> >>> we shouldn't assume it all is.
> >>>
> >>> Now in the case of QEMU emulator threads I think you can make a somewhat
> >>> decent case that we don't have to worry about it. Most of the keys/passwds
> >>> are used once at cold boot, so there's no attack window for vCPUs at that
> >>> point. There is a small window of risk when hotplugging. If someone is
> >>> really concerned about this though, they shouldn't have let QEMU have
> >>> these credentials in the first place, as it's already vulnerable to a
> >>> guest escape, e.g. use kernel RBD instead of letting QEMU directly log in
> >>> to RBD.
> >>>
> >>> IOW, on balance of probabilities it is reasonable to let QEMU emulator
> >>> threads be in the same core scheduling domain as vCPU threads.
> >>>
> >>> In the case of external QEMU helper processes though, I think it is
> >>> a far less clear-cut decision. There are a number of reasons why helper
> >>> processes are used, but at least one significant motivating factor is
> >>> security isolation between QEMU & the helper - they can only communicate
> >>> and share information through certain controlled mechanisms.
> >>>
> >>> With this in mind I think it is risky to assume that it is safe to
> >>> run QEMU and helper processes in the same core scheduling group. At
> >>> the same time there are likely cases where it is also just fine to
> >>> do so.
> >>>
> >>> If we separate helper processes from QEMU vCPUs this is not as wasteful
> >>> as it sounds. Since the helper processes are running trusted code, there
> >>> is no need for helper processes from different guests to be isolated.
> >>> They can all just live in the default core scheduling domain.
> >>>
> >>> I feel like I'm talking myself into suggesting the core scheduling
> >>> host knob in qemu.conf needs to be more than just a single boolean.
> >>> Either have two knobs - one to turn it on/off and one to control
> >>> whether helpers are split or combined - or have one knob and make
> >>> it an enumeration.
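For reference, the "core scheduling group" discussed above maps onto the
kernel's prctl(PR_SCHED_CORE, ...) interface (Linux >= 5.14). Below is a
minimal standalone sketch of that interface; the PID handling is purely
illustrative and not how libvirt structures it.

  /* sched-core-sketch.c: illustrative use of PR_SCHED_CORE (Linux >= 5.14).
   * Not libvirt code; just shows the underlying kernel mechanism. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/types.h>
  #include <sys/prctl.h>
  #include <linux/prctl.h>        /* PR_SCHED_CORE, PR_SCHED_CORE_* */

  int main(int argc, char **argv)
  {
      /* Give all threads of the calling process a fresh core scheduling
       * cookie; they will then only share an SMT core with threads
       * carrying the same cookie. */
      if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, 0,
                PR_SCHED_CORE_SCOPE_THREAD_GROUP, 0) < 0) {
          perror("PR_SCHED_CORE_CREATE");
          return EXIT_FAILURE;
      }

      /* Optionally push the same cookie to another process (PID given on
       * the command line), e.g. a helper that should join this group
       * rather than stay in the host's default one. */
      if (argc > 1) {
          pid_t helper = (pid_t) atoi(argv[1]);

          if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_TO, helper,
                    PR_SCHED_CORE_SCOPE_THREAD_GROUP, 0) < 0) {
              perror("PR_SCHED_CORE_SHARE_TO");
              return EXIT_FAILURE;
          }
      }

      return EXIT_SUCCESS;
  }

Note that sharing a cookie to another task requires ptrace-level permission
on the target, so only processes of equal or lower privilege can be pulled
into a group this way.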
> >>
> >> Seems reasonable. And the default should be QEMU's emulator + vCPU
> >> threads in one sched group, and all helper processes in another, right?
> >
> > Not quite. I'm suggesting that helper processes can remain in the
> > host's default core scheduling group, since the helpers are all
> > executing trusted machine code.
> >
> >>> One possible complication comes if we consider a guest that is
> >>> pinned, but not on a fine-grained per-vCPU basis.
> >>>
> >>> E.g. if a guest is set to allow floating over a subset of host CPUs,
> >>> we need to make sure that it is possible to actually execute the
> >>> guest still. I.e. if the entire guest is pinned to 1 host CPU but our
> >>> config implies use of 2 distinct core scheduling domains, we have
> >>> an unsolvable constraint.
> >>
> >> Do we? Since we're placing emulator + vCPUs into one group and helper
> >> processes into another, these would never run at the same time, but that
> >> would be the case anyway - if the emulator write()-s into a helper's socket
> >> it would be blocked because the helper isn't running. This "bottleneck"
> >> is a result of pinning everything onto a single CPU and exists regardless
> >> of scheduling groups.
> >>
> >> The only case where scheduling groups would make the bottleneck worse is
> >> if the emulator and vCPUs were in different groups, but we don't intend
> >> to allow that.
> >
> > Do we actually pin the helper processes at all?
>
> Yes, we do. Into the same CGroup as the emulator thread:
> qemuSetupCgroupForExtDevices().
>
> > I was thinking of a scenario where we implicitly pin helper processes to
> > the same CPUs as the emulator threads and/or the QEMU process-global
> > pinning mask, e.g. if we only had
> >
> >   <vcpu placement='static' cpuset="2-3" current="1">2</vcpu>
> >
> > Traditionally the emulator threads, I/O threads and vCPU threads will
> > all float across host CPUs 2 & 3. I was assuming we also placed the
> > helper processes on these same 2 host CPUs. Not sure if that's right
> > or not. Assuming we do, then...
> >
> > Let's say CPUs 2 & 3 are SMT siblings.
> >
> > We have helper processes in the default core scheduling
> > domain and QEMU in a dedicated core scheduling domain. We
> > lose 100% of concurrency between the vCPUs and helper
> > processes.
>
> So in this case users might want to have helpers and emulator in the
> same group. Therefore, in qemu.conf we should allow something like:
>
>   sched_core = "none"      // off, no SCHED_CORE
>                "emulator"  // default, place only emulator & vCPU threads
>                            // into the group
>                "helpers"   // place emulator & vCPU & helpers into the
>                            // group
>
> I agree that "helpers" is a terrible name, maybe "emulator+helpers"? Or
> something completely different? Maybe:

A scalar is nice, but we can just call it "full" or "all", as in the
opposite of "none".

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|