Linux guarantees the stability of its userspace API, but the API itself is only informally described, primarily with English prose. I want to add an explicit, authoritative machine-readable definition of the Linux userspace API.

As background, in a conventional libc like glibc, read(2) calls the Linux system call read, passing arguments in an architecture-specific way according to the specific details of read.

The details of these syscalls are at best documented in manpages, and often defined only by the implementation. Anyone else who wants to work with a syscall, in any way, needs to duplicate all those details.

So the most basic definition of the API would just represent the information already present in SYSCALL_DEFINE macros: the C types of arguments and return values.

More usefully, it would describe the formats of those arguments and return values: that the first argument to read is a file descriptor rather than an arbitrary integer, and what flags are valid in the flags argument of openat, and that open returns a file descriptor. A step beyond that would be describing, in some limited way, the effects of syscalls; for example, that read writes into the passed buffer the number of bytes that it returned.

Even a basic machine-readable definition of the Linux userspace API would have numerous benefits:

* Debugging tools which need to understand the format of syscalls and their arguments in great detail, such as strace, are currently primarily hand-written with great duplication of effort. Even a basic description of syscalls would allow much of this code to be generated instead.
* It often takes a long time for newly-added syscalls to be usable in userspace. With an explicit definition of the Linux userspace API, it would be easy to automatically generate functions for new syscalls, which could be deployed quickly either as part of libc or in a separate syscall library.
* Implementers of new languages currently almost always make syscalls by going through libc. Supporting interoperability with C in this way is a major burden, and the resulting interfaces are typically highly unidiomatic for the new language. With an explicit definition of the Linux API, it would be much easier for new languages to make syscalls directly (rather than through libc) by automatically generating syscall functions which are idiomatic to the new language; for example, functions which preserve memory-safety and type-safety in Rust.
* Reimplementers of the Linux API, such as Linuxulator, WSL1, and gVisor, would be able to generate stubs for the interfaces they need to implement automatically, reducing duplicated code and making them conform better to the Linux API.
* Changes to Linux behavior that require a change in the API definition would deserve greater scrutiny by maintainers, since such a change might break userspace. This certainly could never catch all possible API breaks, but it would be one more way to prevent regressions.
* Any other tool which needs to understand the Linux API would benefit, such as more esoteric projects to batch syscalls, intercept and rewrite syscalls, forward syscalls to remote hosts, or any other syscall manipulations.

To write this definition, a new Linux-specific format for the definition might need to be created. At a minimum, it will need to be able to describe bit-level data formats, complex pointer-based data structures, tagged unions, "overloaded" syscalls such as ioctl, and architecture-specific divergences. Most existing formats and languages for describing interfaces like this unfortunately lack these capabilities.

Whatever the format of the definition, the most important feature is that it must be maintainable by existing Linux developers. One way to achieve that might be to integrate it into the C code in some way, building on top of SYSCALL_DEFINE.
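
As a rough illustration of the kind of information involved, here is a small sketch in plain C of what a description of read and openat might carry beyond their C types. Every name in it (val_kind, arg_desc, syscall_desc, describe_syscall) is made up for this example; nothing like it exists in the kernel today, and a real format would have to cover far more (flag sets, tagged unions, ioctl, per-architecture differences).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical value kinds: what an argument or return value means,
 * beyond its C type. None of these names exist in the kernel. */
enum val_kind { VAL_FD, VAL_OUT_BUF, VAL_SIZE, VAL_PATH, VAL_FLAGS, VAL_MODE };

static const char *const kind_names[] = {
	[VAL_FD] = "fd",     [VAL_OUT_BUF] = "out_buf", [VAL_SIZE] = "size",
	[VAL_PATH] = "path", [VAL_FLAGS] = "flags",     [VAL_MODE] = "mode",
};

#define MAX_ARGS 6

struct arg_desc {
	const char *ctype;   /* C type, as already given to SYSCALL_DEFINE */
	const char *name;
	enum val_kind kind;  /* the extra information the C type does not carry */
};

struct syscall_desc {
	const char *name;
	enum val_kind ret;   /* meaning of a successful return value */
	int nargs;
	struct arg_desc args[MAX_ARGS];
};

/* Illustrative descriptions; types mirror the SYSCALL_DEFINE signatures. */
static const struct syscall_desc desc_read = {
	.name = "read", .ret = VAL_SIZE, .nargs = 3,
	.args = {
		{ "unsigned int", "fd",    VAL_FD },
		{ "char *",       "buf",   VAL_OUT_BUF },
		{ "size_t",       "count", VAL_SIZE },
	},
};

static const struct syscall_desc desc_openat = {
	.name = "openat", .ret = VAL_FD, .nargs = 4,
	.args = {
		{ "int",          "dfd",      VAL_FD },
		{ "const char *", "filename", VAL_PATH },
		{ "int",          "flags",    VAL_FLAGS },
		{ "umode_t",      "mode",     VAL_MODE },
	},
};

/* Render "name(arg: kind, ...) -> kind" into out.
 * Returns 0 on success, -1 if out is too small. */
static int describe_syscall(const struct syscall_desc *d, char *out, size_t outlen)
{
	size_t used;
	int n;

	assert(d->nargs >= 0 && d->nargs <= MAX_ARGS);

	n = snprintf(out, outlen, "%s(", d->name);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	used = (size_t)n;

	for (int i = 0; i < d->nargs; i++) {
		n = snprintf(out + used, outlen - used, "%s%s: %s",
			     i ? ", " : "", d->args[i].name,
			     kind_names[d->args[i].kind]);
		if (n < 0 || (size_t)n >= outlen - used)
			return -1;
		used += (size_t)n;
	}

	n = snprintf(out + used, outlen - used, ") -> %s", kind_names[d->ret]);
	if (n < 0 || (size_t)n >= outlen - used)
		return -1;
	return 0;
}
```

Calling describe_syscall(&desc_read, buf, sizeof buf) fills buf with "read(fd: fd, buf: out_buf, count: size) -> size". A generator for strace-style decoders or for language bindings would walk the same table, which is the point: the table is written once and every consumer reads it.
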
The API description can then be automatically extracted from the C code into a more-easily-reusable format, which can be used as input for other tools.

One step in this direction is Documentation/ABI, which specifies the stability guarantees for different userspace APIs in a semi-formal way. But it doesn't specify the actual content of those APIs, and it doesn't cover individual syscalls at all.

Another related effort is system call tables such as https://marcin.juszkiewicz.com.pl/download/tables/syscalls.html, but these don't contain any more information than is already in SYSCALL_DEFINE.

Hopefully this sounds like a reasonable thing to do. I'm looking for any comments or suggestions, or related projects I don't know about.