Len,

On Sun, May 02 2021 at 11:27, Len Brown wrote:
> Here is how it works:
>
> 1. The kernel boots and sees the feature in CPUID.
>
> 2. If the kernel supports that feature, it sets XCR0[feature].
>
> For some features, there may be a bunch of kernel support,
> while simple features may require only state save/restore.
>
> 2a. If the kernel doesn't support the feature, XCR0[feature] remains cleared.
>
> 3. user-space sees the feature in CPUID
>
> 4. user-space sees for the feature via xgetbv[XCR0]
>
> 5. If the feature is enabled in XCR0, the user happily uses it.
>
> For AMX, Linux implements "transparent first use"
> so that it doesn't have to allocate 8KB context switch
> buffers for tasks that don't actually use AMX.
> It does this by arming XFD for all tasks, and taking a #NM
> to allocate a context switch buffer only for those tasks
> that actually execute AMX instructions.

I thought more about this and it's absolutely the wrong way to go for
several reasons.

AMX (or whatever comes next) is nothing other than a device and it
should just be treated as such. The fact that it is not exposed via a
driver and a device node does not matter at all.

Not doing so requires this awkward buffer allocation via #NM with all
its downsides; it's just wrong to force the kernel to manage resources
of a user space task without being able to return a proper error code.

It also prevents fine-grained control over access to this
functionality. As AMX is clearly a shared resource which is not per HT
thread (maybe not even per core) and it has impact on power/frequency,
it is important to be able to restrict access on a per process/cgroup
scope.

Having a proper interface (syscall, prctl) which user space can use to
ask for permission and allocation of the necessary buffer(s) clearly
avoids these downsides and provides the necessary mechanisms for proper
control and failure handling.
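
As a rough illustration only: the user space side of such an interface
could look roughly like the sketch below. All names in it
(request_feature_permission(), FEATURE_AMX, amx_permitted()) are made
up for this example and the "syscall" is a stub which simulates the
kernel's answer. The points it tries to show are that the request can
fail with a proper error code and that each thread has to ask only
once.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical feature id, made up for this sketch. */
#define FEATURE_AMX 1

/* Knobs standing in for the kernel's decision (simulation only). */
static int simulated_result;	/* 0 on success, negative errno on refusal */
static int simulated_calls;	/* how often the "syscall" was issued */

/*
 * Stand-in for the proposed syscall/prctl. Hypothetical: a real
 * implementation would enter the kernel, which would check the access
 * policy and allocate the buffer(s), or fail with an error code.
 */
static int request_feature_permission(int feature)
{
	(void)feature;
	simulated_calls++;
	return simulated_result;
}

/* Per-thread cache, so each thread issues the request at most once. */
static _Thread_local bool amx_perm_cached;
static _Thread_local int amx_perm_result;

/* Returns 0 if AMX may be used, a negative errno otherwise. */
static int amx_permitted(void)
{
	if (!amx_perm_cached) {
		amx_perm_result = request_feature_permission(FEATURE_AMX);
		amx_perm_cached = true;
	}
	return amx_perm_result;
}
```

A library or application would call amx_permitted() in its detection
path and fall back to a non-AMX code path when it returns an error such
as -EPERM or -ENOMEM.
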

It's not the end of the world if something which wants to utilize this
has to issue a syscall during detection. It does not matter whether
that's a library or just the application code itself.

That's a one-off operation and every involved entity can cache the
result in TLS.

AVX512 has already proven that XSTATE management is fragile and error
prone, so we really have to stop this instead of creating yet another
half-baked solution.

Thanks,

        tglx