On 5/23/22 18:13, Daniel P. Berrangé wrote:
> On Mon, May 09, 2022 at 05:02:07PM +0200, Michal Privoznik wrote:
>> The Linux kernel offers a way to mitigate side channel attacks on
>> Hyper Threads (e.g. MDS and L1TF). Long story short, userspace can
>> define groups of processes (aka trusted groups) and only processes
>> within one group can run on sibling Hyper Threads. The group
>> membership is automatically preserved on fork() and exec().
>>
>> Now, there is one scenario which I don't cover in my series and I'd
>> like to hear proposals for: if there are two guests with an odd
>> number of vCPUs they can no longer run on sibling Hyper Threads,
>> because my patches create a separate group for each QEMU. This is a
>> performance penalty. Ideally, we would have a knob inside the domain
>> XML that would place two or more domains into the same trusted
>> group. But since there's no pre-existing example (of sharing a piece
>> of information between two domains) I've failed to come up with
>> something usable.
>
> Right now users have two choices:
>
>  - Run with SMT enabled. 100% of CPUs available. VMs are vulnerable.
>  - Run with SMT disabled. 50% of CPUs available. VMs are safe.
>
> What core scheduling gives us is somewhere in between, depending on
> the vCPU count. If we assume all guests have an even number of vCPUs,
> then:
>
>  - Run with SMT enabled + core scheduling. 100% of CPUs available.
>    100% of CPUs are used. VMs are safe.
>
> This is the ideal scenario, and probably a fairly common one too, as
> IMHO even vCPU counts are likely to be typical.
>
> If we assume the worst case, of entirely 1 vCPU guests, then we have:
>
>  - Run with SMT enabled + core scheduling. 100% of CPUs available.
>    50% of CPUs are used. VMs are safe.
>
> This feels highly unlikely though, as all except tiny workloads want
> more than one vCPU.
>
> With entirely 3 vCPU guests we have:
>
>  - Run with SMT enabled + core scheduling. 100% of CPUs available.
>    75% of CPUs are used. VMs are safe.
>
> With entirely 5 vCPU guests we have:
>
>  - Run with SMT enabled + core scheduling. 100% of CPUs available.
>    83% of CPUs are used. VMs are safe.
>
> (A 3 vCPU guest has to monopolize two whole cores, i.e. four threads,
> so 3 of 4 threads do useful work; a 5 vCPU guest occupies three
> cores, using 5 of 6 threads, i.e. ~83%.)
>
> If we have a mix of even and odd numbered vCPU guests, with mostly
> even numbered, then I think utilization will be high enough that
> almost no one will care about the last few %.
>
> While we could try to come up with a way to express sharing of cores
> between VMs, I don't think it's worth it in the absence of someone
> presenting compelling data on why it'll be needed in a non-niche use
> case. Bear in mind that users can also resort to pinning VMs
> explicitly to get sharing.
>
> In terms of defaults I'd very much like us to default to enabling
> core scheduling, so that we have a secure deployment out of the box.
> The only caveat is that this does have the potential to be
> interpreted as a regression for existing deployments in some cases.
> Perhaps we should make it a meson option, for distros to decide
> whether to ship with it turned on out of the box or not?

Alternatively, distros can just patch qemu_conf.c to enable the option
in the cfg (see virQEMUDriverConfigNew()).

Michal
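
P.S. For anyone who hasn't played with the kernel interface yet, here
is a minimal, self-contained sketch (not libvirt code; the constants
are copied from the kernel headers because <sys/prctl.h> may not
define them yet) of creating a trusted group and letting exec()
inherit it:

/* core-sched-demo.c: put ourselves into a fresh core scheduling
 * "trusted group", then exec a payload inside it.
 * Build: cc -o core-sched-demo core-sched-demo.c
 * Needs Linux >= 5.14. */
#include <stdio.h>
#include <unistd.h>
#include <sys/prctl.h>

/* Fallbacks copied from linux/prctl.h for older toolchain headers. */
#ifndef PR_SCHED_CORE
# define PR_SCHED_CORE        62
# define PR_SCHED_CORE_CREATE 1
#endif

/* From linux/pid.h; the enum is not exported to userspace. */
enum { PIDTYPE_PID = 0, PIDTYPE_TGID = 1, PIDTYPE_PGID = 2 };

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }

    /* Create a new trusted group (a fresh cookie) containing just
     * this process. The kernel will never co-schedule us on a
     * sibling Hyper Thread with a task from a different group. */
    if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, getpid(),
              PIDTYPE_PID, 0) < 0) {
        perror("PR_SCHED_CORE_CREATE");
        return 1;
    }

    /* The cookie survives exec(), so the payload (e.g. QEMU) ends
     * up in the group automatically. */
    execvp(argv[1], argv);
    perror("execvp");
    return 1;
}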
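
And the distro patch mentioned above would be a one-liner along these
lines (the schedCore field name is hypothetical here, standing in for
whatever knob the series ends up adding to virQEMUDriverConfig):

--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ virQEMUDriverConfigNew @@
+    /* ship with core scheduling enabled out of the box */
+    cfg->schedCore = true;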