Re: [PATCH v2 1/6] KVM: x86: Clear all supported MPX xfeatures if they are not all set

Aaron Lewis <aaronlewis@xxxxxxxxxx> · Fri, 13 Jan 2023 14:43:55 +0000

On Fri, Jan 13, 2023 at 12:26 AM Mingwei Zhang <mizhang@xxxxxxxxxx> wrote:
>
> On Thu, Jan 12, 2023 at 1:33 PM Chang S. Bae <chang.seok.bae@xxxxxxxxx> wrote:
> >
> > On 1/12/2023 1:21 PM, Mingwei Zhang wrote:
> > >
> > > The only comment I would have is that it seems not following the least
> > > privilege principle as host process (QEMU) may not have the motivation
> > > to do any matrix multiplication. But this is a minor one.
> > >
> > > Since this enabling once per-process, I am wondering when after
> > > invocation of arch_prctl(2), all of the host threads will have a larger
> > > fp_state? If so, that might be a sizeable overhead since host userspace
> > > may have lots of threads doing various of other things, i.e., they may
> > > not be vCPU threads.
> >
> > No, the permission request does not immediately result in the kernel's
> > XSAVE buffer expansion, but only when the state is about used. As
> > XFD-armed, the state use will raise #NM. Then, it will reallocate the
> > task's fpstate via this call chain:
> >
> > #NM --> handle_xfd_event() --> xfd_enable_feature() --> fpstate_realloc()
> >
> > Thanks,
> > Chang
>
> Thanks for the info. But I think you are talking about host level AMX
> enabling. This is known to me. I am asking about how AMX was enabled
> by QEMU and used by vCPU threads in the guest. After digging a little
> bit, I think I understand it now.
>
> So, it should be the following: (in fact, the guest fp_state is not
> allocated lazily but at the very beginning at KVM_SET_CPUID2 time).
>
>   kvm_set_cpuid() / kvm_set_cpuid2() ->
>     kvm_check_cpuid() ->
>       fpu_enable_guest_xfd_features() ->
>         __xfd_enable_feature() ->
>           fpstate_realloc()
>
> Note that KVM does intercept #NM for the guest, but only for the
> handling of XFD_ERR.
>
> Prior to the kvm_set_cpuid() or kvm_set_cpuid2() call, the QEMU thread
> should ask for permission via arch_prctl(REQ_XCOMP_GUEST_PERM) in
> order to become a vCPU thread. Otherwise, the above call sequence will
> fail. Fortunately, asking-for-guest-permission is only needed once per
> process (per-VM).
>
> Because of the above, the non-vCPU threads do not need to create a
> larger fp_state unless/until they invoke kvm_set_cpuid() or
> kvm_set_cpuid2().
>
> Now, I think that closes the loop for me.
>
> Thanks.
>
> -Mingwei

I'd still like to clean up CPUID.(EAX=0DH,ECX=0):EAX.XTILECFG[17] by
keeping it consistent with CPUID.(EAX=0DH,ECX=0):EAX.XTILEDATA[18] in
the guest, but it's not clear to me what the best way to do that is.
The crux of the issue is that xstate_get_guest_group_perm() returns
partial support for AMX when userspace doesn't call
prctl(ARCH_REQ_XCOMP_GUEST_PERM), I.e. the guest CPUID will report
XTILECFG=1 and XTILEDATA=0 in that case.  In that situation, XTILECFG
should be cleared for it to be consistent.  I can see two ways of
potentially doing that:

1. We can ensure that perm->__state_perm never stores partial support.

2. We can sanitize the bits in xstate_get_guest_group_perm() before
they are returned, to ensure KVM never sees partial support.

I like the idea of #1, but if that has negative effects on the host or
XFD I'm open to #2.  Though, XFD has its own field, so I thought that
wouldn't be an issue.  Would it work to set __state_perm and/or
default_features (what originally sets __state_perm) to a consistent
view, so partial support is never returned from
xstate_get_guest_group_perm()?