* Len Brown:

> A. per-task. If we do it this way, then we will likely wind up
> mandating a GET at the start of every routine in every library that
> touches AMX, and potentially also a PUT. This is because the library
> has no idea what thread called it. The plus is that this will address
> the "used once and sits on a buffer for the rest of the process
> lifetime" scenario. The minus is that high performance users will be
> executing thousands of unnecessary system calls that have zero value.

We could revive the KTLS proposal (userspace donates memory for use by
the kernel & vDSO), and the thread could reserve (on-stack) buffer space
for kernel use for the duration of the AMX computation. There would be
a pointer to that space in the KTLS area, set upon entry of the AMX
region, and cleared upon exit. It's not extremely cheap (unbounded
alloca has a stack probing loop nowadays). But no system call is
required.

Thanks,
Florian
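
PS: A rough sketch of the pointer-in-KTLS idea, in case it helps the
discussion. Everything here is an assumption for illustration only: no
KTLS kernel interface exists, so a thread-local struct (ktls_area)
stands in for the donated area; amx_region_enter(), amx_region_exit()
and amx_compute() are made-up names; and AMX_SAVE_BYTES is a
placeholder size. A fixed-size stack array is used instead of alloca
just to keep the sketch short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed KTLS area: per-thread memory
 * that userspace donates and the kernel could read. A plain
 * thread-local struct models it; this is not an existing ABI. */
struct ktls_area {
    void  *amx_save_buf;  /* NULL whenever the thread is outside an AMX region */
    size_t amx_save_len;  /* size of the buffer in bytes, 0 outside a region */
};

static _Thread_local struct ktls_area ktls;

/* Placeholder size (assumption). Real code would have to learn the
 * required state size from the CPU or the kernel. */
#define AMX_SAVE_BYTES 8192

/* Publish the caller's buffer before any AMX state is touched. */
void amx_region_enter(void *buf, size_t len)
{
    ktls.amx_save_len = len;
    ktls.amx_save_buf = buf;
    /* Compiler barrier: keep the stores ahead of the AMX code that follows. */
    atomic_signal_fence(memory_order_seq_cst);
}

/* Withdraw the buffer; the stack frame holding it is about to go away. */
void amx_region_exit(void)
{
    atomic_signal_fence(memory_order_seq_cst);
    ktls.amx_save_buf = NULL;
    ktls.amx_save_len = 0;
}

/* Example library routine: reserves the buffer on its own stack for the
 * duration of the computation. Returns 1 if the pointer was published
 * while inside the region, 0 otherwise. No system call is made. */
int amx_compute(void)
{
    _Alignas(64) unsigned char save_area[AMX_SAVE_BYTES];
    assert(((uintptr_t)save_area % 64) == 0);

    amx_region_enter(save_area, sizeof save_area);
    int published = (ktls.amx_save_buf == (void *)save_area) &&
                    (ktls.amx_save_len == sizeof save_area);
    /* ... AMX tile instructions would run here ... */
    amx_region_exit();

    return published;
}
```

The library routine does not need to know which thread called it: the
pointer lives in per-thread storage, and the buffer lives in the
caller's own stack frame, so both go away when the routine returns.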