On Mon, Jun 28, 2021 at 02:40:32PM +0200, Enrico Weigelt, metux IT consult wrote:
> Going back to AMX - just had a quick look at the spec (*1). Sorry, but
> this thing is really weird and horrible to use. Come on, these chips
> already have billions of transistors, it really can't hurt so much
> spending a few more to provide a clean and easy to use machine code
> interface. Grmmpf! (This is a general problem we've got with so many
> HW folks, why can't them just talk to us SW folks first so we can find
> a good solution for both sides, before that goes into the field ?)
>
> And one point that immediately jumps into my mind (w/o looking deeper
> into it): it introduces completely new registers - do we now need extra
> code for tasks switching etc ?

No, but because it's register state and part of XSAVE, it has an
immediate impact on the ABI. In particular, the signal stack layout
includes the XSAVE area (as does ptrace()).

At the same time, 'legacy' applications (up until _very_ recently) had a
minimum signal stack size of 2K, which is already violated by the
addition of AVX512 (there's actual breakage due to that).

Adding the insane AMX state (8k+) on top of that is a complete trainwreck
waiting to happen.

Not to mention that having !INIT AMX state has direct consequences for
P-state selection and thus performance.

For these reasons, we OS folks will mandate that you do a prctl() to
request/release AMX (and we get to say: no). If you use AMX without
this, the instruction will fault (because AMX is not set in XCR0) and
we'll SIGBUS or something.

Userspace will have to do something like:

 - check CPUID, if !AMX -> fail
 - issue prctl(), if error -> fail
 - issue XGETBV and check that the AMX bit is set, if not -> fail
 - request the signal stack size / spawn threads
 - use AMX

Spawning threads prior to enabling AMX will result in them using the
wrong signal stack size and thus malfunctioning; you get to keep the
pieces.
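The userspace sequence above could be sketched in C roughly like this. It is a minimal sketch under stated assumptions: the prctl() is only being proposed here, so the CPUID test, the request call, the XGETBV read and the signal stack size query are all modelled as hypothetical hooks in `struct amx_probes` rather than real calls, and `amx_enable`, `amx_status` and the stub functions are illustrative names. The one hardware fact it relies on is that XCR0 bits 17 (XTILECFG) and 18 (XTILEDATA) are the AMX state components. What it tries to show is the ordering: the signal stack size is only queried once AMX is confirmed enabled, and threads should only be spawned after that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* XCR0 bits for the AMX state components: 17 = XTILECFG, 18 = XTILEDATA. */
#define XCR0_XTILECFG  (UINT64_C(1) << 17)
#define XCR0_XTILEDATA (UINT64_C(1) << 18)
#define XCR0_AMX       (XCR0_XTILECFG | XCR0_XTILEDATA)

enum amx_status {
    AMX_OK = 0,
    AMX_ERR_CPUID,   /* CPU does not enumerate AMX */
    AMX_ERR_REQUEST, /* kernel refused the request ("we get to say: no") */
    AMX_ERR_XCR0,    /* request returned success but XCR0 lacks the AMX bits */
};

/*
 * Hypothetical probe hooks (assumption): real code would execute CPUID,
 * issue the proposed prctl(), execute XGETBV and ask the kernel for the
 * signal stack size. Function pointers let the ordering be tested anywhere.
 */
struct amx_probes {
    bool (*cpu_has_amx)(void);
    int (*request_amx)(void);      /* 0 on success, like prctl() */
    uint64_t (*read_xcr0)(void);
    size_t (*sigstack_size)(void); /* required signal stack size in bytes */
};

/*
 * Runs the steps in the order given in the mail. On AMX_OK, *stack_size
 * holds the signal stack size to use for threads spawned afterwards;
 * on any failure it is left untouched.
 */
enum amx_status amx_enable(const struct amx_probes *p, size_t *stack_size)
{
    assert(p != NULL && stack_size != NULL);

    if (!p->cpu_has_amx())
        return AMX_ERR_CPUID;
    if (p->request_amx() != 0)
        return AMX_ERR_REQUEST;
    if ((p->read_xcr0() & XCR0_AMX) != XCR0_AMX)
        return AMX_ERR_XCR0;

    /* Only now can the size account for AMX state; query it before spawning threads. */
    *stack_size = p->sigstack_size();
    return AMX_OK;
}

/* Demo stubs (hypothetical values) simulating different systems. */
static bool stub_has_amx(void) { return true; }
static bool stub_no_amx(void) { return false; }
static int stub_grant(void) { return 0; }
static int stub_deny(void) { return -1; }
static uint64_t stub_xcr0_amx(void) { return UINT64_C(0x7) | XCR0_AMX; } /* x87|SSE|AVX + AMX */
static uint64_t stub_xcr0_legacy(void) { return UINT64_C(0x7); }        /* x87|SSE|AVX only */
static size_t stub_stack(void) { return 16384; }

static const struct amx_probes demo_granted = { stub_has_amx, stub_grant, stub_xcr0_amx, stub_stack };
static const struct amx_probes demo_no_cpu  = { stub_no_amx, stub_grant, stub_xcr0_amx, stub_stack };
static const struct amx_probes demo_denied  = { stub_has_amx, stub_deny, stub_xcr0_amx, stub_stack };
static const struct amx_probes demo_no_xcr0 = { stub_has_amx, stub_grant, stub_xcr0_legacy, stub_stack };
```

Keeping the probes as function pointers lets the failure paths be exercised without AMX hardware; a real program would replace them with CPUID/XGETBV intrinsics and whatever request interface the kernel ends up providing, and would size each thread's sigaltstack() from the value returned.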