Hi Jens, Hi Pavel,

This is v1 of the RFC to implement the kernel style return value.

Motivation:
Currently liburing depends on libc. We want to make it possible to
build liburing without libc.

This idea was first posted as an issue on the liburing GitHub
repository here:

  https://github.com/axboe/liburing/issues/443

The subject of the issue is: "An option to use liburing without libc?".

On Mon, Sep 27, 2021 at 4:18 PM Mahdi Rakhshandehroo <notifications@xxxxxxxxxx> wrote:
> There are a couple of issues with liburing's libc dependency:
>
> 1) libc implementations of errno, malloc, pthread etc. tend to
>    pollute the binary with unwanted global/thread-local state.
>    This makes reentrancy impossible and relocations expensive.
> 2) libc doesn't play nice with non-POSIX threading models, like
>    green threads with small stack sizes, or direct use of the
>    clone() system call. This makes interop with other
>    languages/runtimes difficult.
>
> One could use the raw syscall interface to io_uring to address these
> concerns, but that would be somewhat painful, so it would be nice
> for liburing to support this use case out of the box. Perhaps
> something like a NOLIBC macro could be added which, if defined,
> would patch out libc constructs and replace them with non-libc
> wrappers where applicable. A few API changes might be necessary for
> the non-libc case (e.g. io_uring_get_probe/io_uring_free_probe), but
> it shouldn't break existing applications as long as it's opt-in.

----------------------------------------------------------------
### 1) Introduction
We want to make the changes incrementally, starting with making it
possible to remove the `errno` variable dependency. So this RFC aims
to make it possible to remove the `errno` variable dependency from
the liburing sources by implementing the kernel style return value.

What we mean by "kernel style return value" is that we wrap the
syscall API so that it returns a negative error code when an error
happens, as we usually do in kernel space code. The caller then
doesn't have to check the `errno` variable.

If we can land this "kernel style return value" in liburing, we will
start working on a series to support building without libc.

These changes will not break userland, and no functional changes
will be visible to users (they only affect liburing internal
sources).

### 2) How to deal with __sys_io_uring_{register,setup,enter2,enter}
Currently we expose these functions (**AAA**) to userland:

**AAA**:
  1) `__sys_io_uring_register`
  2) `__sys_io_uring_setup`
  3) `__sys_io_uring_enter2`
  4) `__sys_io_uring_enter`

These functions are used by several tests. Userland has to check the
`errno` value to use them properly, which means those functions
always depend on libc, so we cannot change their behavior. As such,
we remove those functions (**AAA**) only in the **no libc**
environment.

We then introduce new functions (**BBB**) with the same names plus
extra leading underscores (4 underscores in total). These functions
do not require the caller to use the `errno` variable (they use the
kernel style return value) and always exist, regardless of whether
libc is present.

**BBB**:
  1) `____sys_io_uring_register`
  2) `____sys_io_uring_setup`
  3) `____sys_io_uring_enter2`
  4) `____sys_io_uring_enter`

Summary
  1) **AAA** will only exist in the libc environment.
  2) **BBB** always exists.
  3) Do not use **AAA** in liburing internals (it exists only for
     userland backward compatibility).
  4) In the libc environment, **BBB** may use `syscall(2)` and the
     `errno` variable, but only to emulate the kernel style return
     value.
  5) In the no libc environment, **BBB** will use an Assembly
     interface to perform the syscall (arch dependent).
  6) Tests should not be affected, because (1) and (4) preserve
     compatibility.

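To make point (4) of the summary concrete, here is a minimal sketch of how a libc build might emulate the kernel style return value on top of `syscall(2)` and `errno`. It is not the code from the patch series: the names `kernel_style_ret()` and `sketch_close()` are hypothetical, and `close(2)` is used only as a stand-in because it is easy to exercise. The same conversion would presumably apply to the io_uring syscalls.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for the syscall(2) declaration */
#endif
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the series).
 *
 * libc's syscall(2) returns -1 and sets errno on failure. Fold that
 * into a single negative error code, like a kernel function would
 * return. Successful return values are passed through unchanged.
 */
static inline long kernel_style_ret(long ret)
{
	return (ret < 0) ? -errno : ret;
}

/*
 * Hypothetical wrapper, for illustration only.
 *
 * Returns 0 on success, or a negative error code (e.g. -EBADF) on
 * failure. The caller never has to look at errno.
 */
static inline int sketch_close(int fd)
{
	return (int) kernel_style_ret(syscall(SYS_close, fd));
}
```

With a wrapper like this, a caller can write `ret = sketch_close(fd); if (ret < 0) return ret;` and propagate the error code directly, which is the calling convention the **BBB** functions are meant to offer in both environments.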
### 3) How to deal with syscalls
We have 3 patches in this series to wrap the syscalls:

  - Add `liburing_mmap()` and `liburing_munmap()`
  - Add `liburing_madvise()`
  - Add `liburing_getrlimit()` and `liburing_setrlimit()`

`liburing_{munmap,madvise,getrlimit,setrlimit}` return a negative
error code on failure. They just return an int, so there is nothing
special to handle.

The special case is a pointer return value, as in `liburing_mmap()`.
For this case we take the `include/linux/err.h` file from the Linux
kernel source tree and use `IS_ERR()`, `PTR_ERR()` and `ERR_PTR()`
to deal with it. This is implemented in the patch:

  - Add kernel error header `src/kernel_err.h`

### 4) How can this help to support no libc environment?
Once this kernel style return value is adopted in liburing, we will
start working on raw syscalls written directly in Assembly (arch
dependent).

I (Ammar Faizi) will start kicking the tires on the x86-64 arch.
Hopefully we will get support for other architectures as well.

An example of a liburing syscall wrapper may look like this:
```c
void *liburing_mmap(void *addr, size_t length, int prot, int flags,
		    int fd, off_t offset)
{
#ifdef LIBURING_NOLIBC
	/*
	 * This is when we build without libc.
	 *
	 * Assume __raw_mmap is the syscall written in ASM.
	 *
	 * The return value is directly taken from the syscall
	 * return value.
	 */
	return __raw_mmap(addr, length, prot, flags, fd, offset);
#else
	/*
	 * This is when libc exists.
	 */
	void *ret;

	ret = mmap(addr, length, prot, flags, fd, offset);
	if (ret == MAP_FAILED)
		ret = ERR_PTR(-errno);

	return ret;
#endif
}
```

----------------------------------------------------------------
The following changes since commit ce10538688b93dafd257ebfed7faf18844e0052d:

  test: Fix endianess issue on `bind()` and `connect()` (2021-09-27 07:45:03 -0600)

based on:

  git://git.kernel.dk/liburing.git master

are available as 6 patches in this series, all of which will be
posted as replies to this one.
If you want to pull the git tag instead, it is available in the Git
repository at:

  git://github.com/ammarfaizi2/liburing.git tags/nolibc-support-rfc-v1

Please review!

----------------------------------------------------------------
Ammar Faizi (6):
  src/syscall: Implement the kernel style return value
  Add kernel error header `src/kernel_err.h`
  Add `liburing_mmap()` and `liburing_munmap()`
  Add `liburing_madvise()`
  Add `liburing_getrlimit()` and `liburing_setrlimit()`
  src/{queue,register,setup}: Remove `#include <errno.h>`

 src/kernel_err.h |  75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/queue.c      |  28 +++++++++----------------
 src/register.c   | 189 +++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------------------------------------------------------------------------
 src/setup.c      |  60 ++++++++++++++++++++++++++++------------------------
 src/syscall.c    |  92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 src/syscall.h    |  18 ++++++++++++++++
 6 files changed, 284 insertions(+), 178 deletions(-)
 create mode 100644 src/kernel_err.h

-- 
Ammar Faizi