Hi!
> Linux guarantees the stability of its userspace API, but the API
> itself is only informally described, primarily with English prose. I
> want to add an explicit, authoritative machine-readable definition of
> the Linux userspace API.

My background is in kernel testing; I have maintained the Linux Test
Project for more than a decade now. Over the years we have created many
"unit tests" for kernel syscalls that watch over the syscall API and
make sure that we get the right results for both valid and invalid
inputs. These tests can also be considered a form of documentation. The
same goes for some of the selftests that have been added to the kernel
repo in recent years. In a sense these are the most detailed
descriptions of the interfaces we have.

The main problem is that the kernel-userspace boundary is large; we have
thousands of tests and I'm pretty sure that we don't cover even half of
it. Also, some of the interfaces are too complex to be described in any
formal system, mostly the modern stuff such as io_uring or BPF. I have
had a hard time even understanding how to use these, and I doubt I would
be able to build a formal system to describe them. That is especially
true for io_uring, which is mostly syscall-less: we talk to the kernel
through shared buffers and atomic data updates.

> As background, in a conventional libc like glibc, read(2) calls the
> Linux system call read, passing arguments in an architecture-specific
> way according to the specific details of read.
>
> The details of these syscalls are at best documented in manpages, and
> often defined only by the implementation. Anyone else who wants to
> work with a syscall, in any way, needs to duplicate all those details.
>
> So the most basic definition of the API would just represent the
> information already present in SYSCALL_DEFINE macros: the C types of
> arguments and return values. More usefully, it would describe the
> formats of those arguments and return values: that the first argument
> to read is a file descriptor rather than an arbitrary integer, and
> what flags are valid in the flags argument of openat, and that open
> returns a file descriptor. A step beyond that would be describing, in
> some limited way, the effects of syscalls; for example, that read
> writes into the passed buffer the number of bytes that it returned.

Having this would be awesome; it is just one step away from actually
generating automated tests for the syscalls. However, my estimate is
that even if you started working on this now, it would take a decade to
get somewhere, but maybe I'm too pessimistic. Still, fingers crossed.

--
Cyril Hrubis
chrubis@xxxxxxx