On 5/21/21 9:19 AM, Florian Weimer wrote:
>> On 5/21/21 7:44 AM, Florian Weimer wrote:
>>> * Dave Hansen via Libc-alpha:
>>>> Our system calls are *REALLY* fast.  We can even do a vsyscall for this
>>>> if we want to get the overhead down near zero.  Userspace can also cache
>>>> the "I did the prctl()" state in thread-local storage if it wants to
>>>> avoid the syscall.
>>> Why can't userspace look at XCR0 to make the decision?
>>
>> The thing we're trying to avoid is a #NM exception from XFD (the new
>> first-use detection feature) that occurs on the first use of AMX.
>> XCR0 will have XCR0[AMX]=1, even if XFD is "armed" and ready to
>> generate the #NM.
>
> I see.  So essentially the hardware wants to offer transparent
> initialize-on-use, but Linux does not seem to want to implement it this
> way.

I don't quite see it that way.  The hardware wants to offer the OS a
guarantee that it will know *BEFORE* an application tried to establish
specific register state.  An OS could implement relatively transparent
XSAVE backing resizing with it, like the earlier AMX patches did.

Or, the OS could use it to implement a nice, immediate thwack if the app
misbehaves and violates the ABI, like we're moving toward now.

> Is there still a chance to bring the hardware and Linux into alignment?

I think they're aligned just fine.  XFD might be a bit overblown as a
feature for how Linux will use it, but other OSes might get some mileage
out of it.
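
For reference, here is a minimal sketch of the XCR0 check asked about in the quoted exchange. It assumes GCC or Clang on x86 (for <cpuid.h> and inline asm). The helper names read_xcr0() and xcr0_amx_enabled() are made up for illustration and are not from any patch in this thread. The bit positions (17 for XTILECFG, 18 for XTILEDATA) are the AMX state components in XCR0.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>              /* GCC/Clang CPUID helper */
#endif

/* AMX state-component bits in XCR0: 17 = XTILECFG, 18 = XTILEDATA. */
#define XCR0_XTILECFG   (UINT64_C(1) << 17)
#define XCR0_XTILEDATA  (UINT64_C(1) << 18)
#define XCR0_AMX_MASK   (XCR0_XTILECFG | XCR0_XTILEDATA)

/* Hypothetical helper: 1 if both AMX bits are set in the given XCR0 value. */
int xcr0_amx_enabled(uint64_t xcr0)
{
    return (xcr0 & XCR0_AMX_MASK) == XCR0_AMX_MASK;
}

/*
 * Hypothetical helper: read XCR0 with XGETBV.
 * Returns 0 if the CPU/OS does not expose XGETBV (CPUID.1:ECX.OSXSAVE
 * clear) or if this is not an x86 build.
 */
uint64_t read_xcr0(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    uint32_t lo, hi;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & (1u << 27)))    /* OSXSAVE: XGETBV usable from userspace */
        return 0;

    /* XGETBV with ECX = 0 returns XCR0 in EDX:EAX. */
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;
#endif
}
```

A caller would test xcr0_amx_enabled(read_xcr0()). As the quoted text explains, a true result only says that the OS enabled the AMX state components in XCR0. It says nothing about whether XFD is still armed, so the first AMX instruction may still raise #NM. That is why this check alone cannot replace an explicit request to the kernel.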