On Fri, Jan 13, 2023 at 12:26 AM Mingwei Zhang <mizhang@xxxxxxxxxx> wrote: > > On Thu, Jan 12, 2023 at 1:33 PM Chang S. Bae <chang.seok.bae@xxxxxxxxx> wrote: > > > > On 1/12/2023 1:21 PM, Mingwei Zhang wrote: > > > > > > The only comment I would have is that it seems not following the least > > > privilege principle as host process (QEMU) may not have the motivation > > > to do any matrix multiplication. But this is a minor one. > > > > > > Since this enabling once per-process, I am wondering when after > > > invocation of arch_prctl(2), all of the host threads will have a larger > > > fp_state? If so, that might be a sizeable overhead since host userspace > > > may have lots of threads doing various of other things, i.e., they may > > > not be vCPU threads. > > > > No, the permission request does not immediately result in the kernel's > > XSAVE buffer expansion, but only when the state is about used. As > > XFD-armed, the state use will raise #NM. Then, it will reallocate the > > task's fpstate via this call chain: > > > > #NM --> handle_xfd_event() --> xfd_enable_feature() --> fpstate_realloc() > > > > Thanks, > > Chang > > Thanks for the info. But I think you are talking about host level AMX > enabling. This is known to me. I am asking about how AMX was enabled > by QEMU and used by vCPU threads in the guest. After digging a little > bit, I think I understand it now. > > So, it should be the following: (in fact, the guest fp_state is not > allocated lazily but at the very beginning at KVM_SET_CPUID2 time). > > kvm_set_cpuid() / kvm_set_cpuid2() -> > kvm_check_cpuid() -> > fpu_enable_guest_xfd_features() -> > __xfd_enable_feature() -> > fpstate_realloc() > > Note that KVM does intercept #NM for the guest, but only for the > handling of XFD_ERR. > > Prior to the kvm_set_cpuid() or kvm_set_cpuid2() call, the QEMU thread > should ask for permission via arch_prctl(REQ_XCOMP_GUEST_PERM) in > order to become a vCPU thread. Otherwise, the above call sequence will > fail. Fortunately, asking-for-guest-permission is only needed once per > process (per-VM). > > Because of the above, the non-vCPU threads do not need to create a > larger fp_state unless/until they invoke kvm_set_cpuid() or > kvm_set_cpuid2(). > > Now, I think that closes the loop for me. > > Thanks. > > -Mingwei I'd still like to clean up CPUID.(EAX=0DH,ECX=0):EAX.XTILECFG[17] by keeping it consistent with CPUID.(EAX=0DH,ECX=0):EAX.XTILEDATA[18] in the guest, but it's not clear to me what the best way to do that is. The crux of the issue is that xstate_get_guest_group_perm() returns partial support for AMX when userspace doesn't call prctl(ARCH_REQ_XCOMP_GUEST_PERM), I.e. the guest CPUID will report XTILECFG=1 and XTILEDATA=0 in that case. In that situation, XTILECFG should be cleared for it to be consistent. I can see two ways of potentially doing that: 1. We can ensure that perm->__state_perm never stores partial support. 2. We can sanitize the bits in xstate_get_guest_group_perm() before they are returned, to ensure KVM never sees partial support. I like the idea of #1, but if that has negative effects on the host or XFD I'm open to #2. Though, XFD has its own field, so I thought that wouldn't be an issue. Would it work to set __state_perm and/or default_features (what originally sets __state_perm) to a consistent view, so partial support is never returned from xstate_get_guest_group_perm()?