With this proposed API, we seem to be combining two requirements, and I wonder if we should be treating them independently.

Requirement 1: "Fine grained control". We want the kernel to be able to prohibit a program from using AMX. The foundation for this is a system call to which the kernel can say "No". It may deny access for whatever reason it wants, including inability to allocate a buffer, or some TBD administrator-invoked hook in the system call, say membership or lack of membership of the process in an empowered cgroup.

Requirement 2: Ability to synchronously fail upon buffer allocation. I agree that pthread_create() returning an error code is a more friendly way to kill a program than a SIGSEGV when touching AMX state for the first time. But the reality is, that program is almost certainly going to exit either way.

So the 1st question is if the system call requesting permission should be on a per-process basis, or a per-task basis.

A. per-task. If we do it this way, then we will likely wind up mandating a GET at the start of every routine in every library that touches AMX, and potentially also a PUT. This is because the library has no idea what thread called it. The plus is that this will address the "used once and sits on a buffer for the rest of the process lifetime" scenario. The minus is that high performance users will be executing thousands of unnecessary system calls that have zero value.

B. per-process. If we do it this way, then the run time linker can do a single system call on behalf of the entire process, and there is no need to sprinkle system calls throughout the library. Presumably the startup code would query CPUID, query XCR0, query this system call, and set a global variable to be accessed by all threads going forward. The plus is that permission makes more sense on a process basis than on a task basis.
Why would the kernel give one thread in a process permission, and not another thread -- and if that happened, would a process actually be able to figure out what to do? If we do per-process, I don't see that the PUT call would be useful, and I would skip it.

Neither A nor B has an advantage in the situation where a thread is created long after initialization and faces memory allocation failure. A synchronously fails in the new system call, and B synchronously fails in pthread_create().

The 2nd question is if "successful permission" implies synchronous allocation, or perhaps it allows "please enable on-demand dynamic allocation".

X. Synchronous allocation. Allocation failures return a synchronous error code, explaining why the program needs to exit. The downside is that it is likely that in both case A and B, every thread in the program will allocate a buffer, whether it ever uses it or not. Indeed, it is possible that the API we have invented to manage AMX buffer use will actually *increase* AMX buffer use...

Y. Enable on-demand allocation. Here the system call enables XFD to not kill the process, but on first use to allocate a buffer for a thread that is actually touching AMX. The benefit is if you have a program with many threads, only the ones that actually use AMX will allocate buffers. Of course the down side is that this program is exposed to a SIGSEGV if vmalloc fails in that run-time allocation, rather than a friendly pthread_create() error return killing the program.

And, of course, we can have our cake and eat it too, by having the syscall tell the kernel if it wants (X) or (Y). The question is if it is worth the complexity of having two options.

Thoughts?

-Len
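
For concreteness, the per-process startup sequence described under (B) could look roughly like the sketch below. It is only an illustration under stated assumptions: request_amx_permission() is a hypothetical stand-in for the proposed system call (no real interface is implied, and the stub always says "No"), and the names amx_allowed, amx_decide() and amx_startup_check() are made up for this example. The CPUID and XCR0 bit positions are the architectural ones; the probing code assumes GCC or Clang on x86.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Architectural bits: CPUID.(EAX=7,ECX=0):EDX[24] = AMX-TILE,
 * XCR0[17] = XTILECFG, XCR0[18] = XTILEDATA. */
#define CPUID7_EDX_AMX_TILE (1u << 24)
#define XCR0_AMX_MASK       ((1ull << 17) | (1ull << 18))

/* Set once at startup, before other threads exist; read-only afterwards. */
static bool amx_allowed;

/* HYPOTHETICAL stand-in for the proposed permission system call.
 * This stub always answers "No" (non-zero return). */
static int request_amx_permission(void)
{
    return -1;
}

/* Pure decision: AMX is usable only if the CPU reports it, the OS has
 * enabled both tile state components in XCR0, and the kernel granted
 * permission (perm_ret == 0). */
static bool amx_decide(uint32_t cpuid7_edx, uint64_t xcr0, int perm_ret)
{
    if (!(cpuid7_edx & CPUID7_EDX_AMX_TILE))
        return false;
    if ((xcr0 & XCR0_AMX_MASK) != XCR0_AMX_MASK)
        return false;
    return perm_ret == 0;
}

/* Returns EDX of CPUID leaf 7, subleaf 0, or 0 if unavailable. */
static uint32_t read_cpuid7_edx(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int a, b, c, d;
    if (__get_cpuid_count(7, 0, &a, &b, &c, &d))
        return d;
#endif
    return 0;
}

/* Returns XCR0, or 0 if XGETBV cannot be used. */
static uint64_t read_xcr0(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int a, b, c, d;
    /* XGETBV is only legal when CPUID.1:ECX.OSXSAVE[27] is set. */
    if (__get_cpuid(1, &a, &b, &c, &d) && (c & (1u << 27))) {
        uint32_t lo, hi;
        __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
        return ((uint64_t)hi << 32) | lo;
    }
#endif
    return 0;
}

/* One-time check, e.g. from the run time linker or library init. */
static void amx_startup_check(void)
{
    amx_allowed = amx_decide(read_cpuid7_edx(), read_xcr0(),
                             request_amx_permission());
}
```

Library routines would then test amx_allowed instead of making a system call on every entry, which is the point of doing this per-process rather than per-task.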