On Monday, 28 June 2021 10:11:16 PDT Peter Zijlstra wrote:
> > Consequence: CPU feature checking is done *very* early, often before
> > main().
>
> For the linker based ones, yes. IIRC the ifunc() attribute is
> particularly useful here.

Exactly. ifunc was designed for this exact purpose. And hence the fact that
CPUID initialisation will be done very, very early.

Anyway, if the AMX state is a sticky "set once per process", it's likely going
to get set early for every process that *may* use AMX. And this is assuming we
do the library right and only set it if it has AMX code at all, instead of all
the time.

On the other hand, if it's not set once and for all, we'll have to contend
with the size changing. TBH, this is a lot more complicated to deal with. Take
the hypothetical example of a preemptive user-space task scheduler that
interrupts an AMX routine (let's say for the sake of the argument that it is
an on-stack signal; I don't see why a scheduler would need to be alt-stack).
It will record the state and then transition to another routine. And this
routine may be resumed in another thread of the same process.

Will the kernel understand that the new routine does not need the AMX state?
Will it understand that the *other* routine, in the other thread, will? If
this is not done automatically by the kernel, then the task scheduler will
need to know to ask the kernel what the reference count for the AMX state is
and will need a syscall to set it (not just increment/decrement, though one
could implement that with a loop).

This applies differently in the case of cooperative scheduling. The SysV ABI
will probably say that the AMX state is caller-save, so the function call from
the AMX-using routine implies all its state has been saved somewhere. But what
about the kernel-side AMX refcount? Is that part of the ABI?

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering