> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Sent: Saturday, December 11, 2021 9:29 PM
>
> Kevin,
>
> On Sat, Dec 11 2021 at 03:07, Kevin Tian wrote:
> >> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> >> #NM in the guest is slow path, right? So why are you trying to optimize
> >> for it?
> >
> > This is really good information. The current logic is obviously
> > based on the assumption that #NM is frequently triggered.
>
> More context.
>
> When an application wants to use AMX, it invokes the prctl() which
> grants permission. If permission is granted then still the kernel FPU
> state buffers are default size and XFD is armed.
>
> When a thread of that process issues the first AMX (tile) instruction,
> then #NM is raised.
>
> The #NM handler does:
>
>   1) Read MSR_XFD_ERR. If 0, goto regular #NM
>
>   2) Write MSR_XFD_ERR to 0
>
>   3) Check whether the process has permission granted. If not,
>      raise SIGILL and return.
>
>   4) Allocate and install a larger FPU state buffer for the task.
>      If allocation fails, raise SIGSEGV and return.
>
>   5) Disarm XFD for that task
>
> That means one thread takes at max. one AMX/XFD related #NM during its
> lifetime, which means two VMEXITs.
>
> If there are other XFD controlled facilities in the future, then it will
> be NR_USED_XFD_CONTROLLED_FACILITIES * 2 VMEXITs per thread which uses
> them. Not the end of the world either.
>
> Looking at the targeted application space it's pretty unlikely that
> tasks which utilize AMX are going to be so short lived that the overhead
> of these VMEXITs really matters.
>
> This of course can be revisited when there is a sane use case, but
> optimizing for it prematurely does not buy us anything else than
> pointless complexity.

I get all of the above. I guess the original open question is also about
the frequency of #NM not caused by XFD. For a Linux guest it looks like
this is not a problem, since CR0.TS is no longer set when math emulation
is not required:

DEFINE_IDTENTRY(exc_device_not_available)
{
	...
	/* This should not happen. */
	if (WARN(cr0 & X86_CR0_TS, "CR0.TS was set")) {
		/* Try to fix it up and carry on. */
		write_cr0(cr0 & ~X86_CR0_TS);
	} else {
		/*
		 * Something terrible happened, and we're better off trying
		 * to kill the task than getting stuck in a never-ending
		 * loop of #NM faults.
		 */
		die("unexpected #NM exception", regs, 0);
	}
}

It may affect a guest which still uses CR0.TS to do lazy save. But
modern OSes have likely all moved to the eager save approach, so always
trapping #NM should be fine.

Is this understanding correct?

Thanks
Kevin
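
For reference, the five #NM handler steps quoted above can be modeled in
user space roughly as below. This is only a sketch to make the "at most
one XFD-related #NM per thread" argument concrete, not the kernel's
implementation: struct model_task, model_handle_nm(), model_amx_insn(),
the nm_result values and the buffer size are all hypothetical names and
numbers made up for illustration, and realloc() merely stands in for the
kernel's FPU state buffer reallocation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical outcome codes for the model; not kernel definitions. */
enum nm_result { NM_NONE, NM_REGULAR, NM_SIGILL, NM_SIGSEGV, NM_HANDLED };

/* Hypothetical per-thread state; not a kernel structure. */
struct model_task {
	bool amx_permitted;	/* permission granted via prctl() */
	bool xfd_armed;		/* XFD still armed for this task */
	void *fpu_buf;		/* stand-in for the FPU state buffer */
	size_t fpu_buf_size;
	unsigned int nm_count;	/* XFD-related #NM taken so far */
};

#define MODEL_LARGE_FPU_SIZE 16384u	/* illustrative size only */

/* Models steps 1) to 5); xfd_err stands in for MSR_XFD_ERR. */
static enum nm_result model_handle_nm(struct model_task *t, uint64_t *xfd_err)
{
	void *buf;

	/* 1) Read MSR_XFD_ERR. If 0, this is a regular #NM. */
	if (*xfd_err == 0)
		return NM_REGULAR;

	/* 2) Write MSR_XFD_ERR to 0. */
	*xfd_err = 0;
	t->nm_count++;

	/* 3) No permission granted: SIGILL. */
	if (!t->amx_permitted)
		return NM_SIGILL;

	/* 4) Install a larger buffer; allocation failure: SIGSEGV. */
	buf = realloc(t->fpu_buf, MODEL_LARGE_FPU_SIZE);
	if (!buf)
		return NM_SIGSEGV;
	t->fpu_buf = buf;
	t->fpu_buf_size = MODEL_LARGE_FPU_SIZE;

	/* 5) Disarm XFD for that task. */
	t->xfd_armed = false;
	return NM_HANDLED;
}

/* Models a tile instruction: it only faults while XFD is armed. */
static enum nm_result model_amx_insn(struct model_task *t)
{
	uint64_t xfd_err;

	if (!t->xfd_armed)
		return NM_NONE;

	xfd_err = 1ULL << 18;	/* assumed: tile data state component bit */
	return model_handle_nm(t, &xfd_err);
}
```

In this model, calling model_amx_insn() twice on a task with permission
returns NM_HANDLED the first time and NM_NONE afterwards, with nm_count
staying at 1, which matches the one-#NM-per-thread reasoning above.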