On 1/12/2023 1:21 PM, Mingwei Zhang wrote:
> The only comment I would have is that it does not seem to follow the principle of least privilege, as the host process (QEMU) may have no reason to do any matrix multiplication. But this is a minor one. Since this is enabled once per process, I am wondering whether, after arch_prctl(2) is invoked, all of the host threads will have a larger fp_state. If so, that might be a sizeable overhead, since host userspace may have lots of threads doing various other things, i.e., they may not be vCPU threads.
No, the permission request does not immediately expand the kernel's XSAVE buffer; the expansion happens only when the state is about to be used. Since XFD is armed, the first use of the state raises #NM, and the kernel then reallocates that task's fpstate via this call chain:
#NM --> handle_xfd_event() --> xfd_enable_feature() --> fpstate_realloc()

Thanks,
Chang
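For illustration, here is a toy user-space model (not kernel code) of the lazy, per-thread expansion described above. Every identifier prefixed `model_` is hypothetical. The three kernel functions in the call chain are collapsed into one. The buffer sizes are illustrative, apart from the 8 KiB XTILEDATA size. The sketch assumes that a missing permission simply makes the trap handler fail, and it does not model the signal the real kernel would deliver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative sizes only: real sizes come from CPUID and the enabled
 * XSAVE features. XTILEDATA itself is 8 KiB (8 tiles x 1 KiB). */
#define MODEL_DEFAULT_FPSTATE_SIZE 2048u
#define MODEL_XTILEDATA_SIZE       8192u

/* Hypothetical per-process state: the permission request only records a flag. */
struct model_process {
    bool amx_permitted;
};

/* Hypothetical per-thread state: each thread owns its own buffer. */
struct model_thread {
    struct model_process *proc;
    size_t fpstate_size;
    bool xfd_armed;          /* while armed, the first AMX use traps (#NM) */
    unsigned char *fpstate;
};

static void model_thread_init(struct model_thread *t, struct model_process *p)
{
    t->proc = p;
    t->fpstate_size = MODEL_DEFAULT_FPSTATE_SIZE;
    t->xfd_armed = true;
    t->fpstate = calloc(1, t->fpstate_size);
}

/* Models the arch_prctl(2) permission request: no thread buffer is touched. */
static void model_request_perm(struct model_process *p)
{
    p->amx_permitted = true;
}

/* Models #NM -> handle_xfd_event() -> xfd_enable_feature() -> fpstate_realloc()
 * for one thread. Returns 0 on success, -1 if permission is missing. */
static int model_handle_xfd_trap(struct model_thread *t)
{
    if (!t->proc->amx_permitted)
        return -1;
    size_t new_size = MODEL_DEFAULT_FPSTATE_SIZE + MODEL_XTILEDATA_SIZE;
    unsigned char *buf = realloc(t->fpstate, new_size);
    if (!buf)
        return -1;
    t->fpstate = buf;
    t->fpstate_size = new_size;
    t->xfd_armed = false;    /* later uses by this thread no longer trap */
    return 0;
}

/* A thread touching AMX state: it traps only on first use while XFD is armed. */
static int model_use_amx(struct model_thread *t)
{
    if (t->xfd_armed)
        return model_handle_xfd_trap(t);
    return 0;
}
```

In this model, two threads of one process are initialised, and then permission is requested. At that point both threads still have the default buffer size. Only a thread that calls `model_use_amx()` (e.g. a vCPU thread) grows, while the other threads keep the small buffer.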