* Thomas Gleixner:

> On Tue, Aug 02 2022 at 15:59, Jason A. Donenfeld wrote:
>> On Tue, Aug 02, 2022 at 03:46:27PM +0200, Thomas Gleixner wrote:
>>> Right now the Linux VDSO functions are 1:1 replacements for system calls
>>> and not adding a magic pile of functionality which is otherwise not
>>> available.
>>>
>>> What you are proposing is to have an implementation which is not
>>> available via a regular syscall. Which means you are creating a VDSO
>>> only syscall which still has the same problem as any other syscall in
>>> terms of API design and functionality which needs to be supported
>>> forever.
>>
>> Wait, what? That's not correct. The WHOLE point is that vdso getrandom()
>> will generate bytes in the same way as the ordinary syscall, without
>> differences. Same function name, same algorithm. But just faster,
>> because vDSO. I explicitly don't want to dip into introducing something
>> different. That's the big selling point: that vDSO getrandom() and
>> syscall getrandom() are the same thing. If you trust one, you can trust
>> the other. If you expect properties of one, you get that from the other.
>> If you know the API of one, you can use the other.
>
> Seriously no. All existing VDSO functions have exactly the same function
> signature and semantics as their syscall counterparts. So they are drop
> in equivalent.
>
> But:
>
>   ssize_t getrandom(void *, void *, size_t, unsigned int);
>
> is very much different than
>
>   ssize_t getrandom(void *, size_t, unsigned int);
>
> Different signature and different semantics.

Just use

  ssize_t getrandom(size_t, unsigned int, void *);

then and have the system call ignore the argument.  There is recent
precedent for adding additional arguments to system calls, see
membarrier.

If we want to be super-conservative, we could add a new flag and have
the vDSO version always call into the kernel if the flag isn't set.

*This* part is far less problematic compared to the approach to
per-thread memory allocation.

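
To make the flag idea a bit more concrete, here is a rough userspace
sketch of the dispatch I have in mind.  Everything named in it is an
assumption for illustration only: the flag GRND_SKETCH_STATE and its
value, the function name getrandom_sketch, and the choice to keep the
usual buffer argument and append the extra pointer at the end.  None of
this is existing kernel or vDSO ABI, and the fast path is only a
placeholder that still calls the real getrandom system call.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical flag value, chosen only for this sketch; not kernel ABI. */
#define GRND_SKETCH_STATE 0x8000u

/*
 * Hypothetical entry point with one extra trailing argument.
 * If the caller did not opt in via the flag (or passed no state),
 * always go to the kernel, so old callers see unchanged behavior.
 */
static ssize_t getrandom_sketch(void *buf, size_t len,
                                unsigned int flags, void *state)
{
    /* Never hand the made-up flag to the real system call. */
    unsigned int kernel_flags = flags & ~GRND_SKETCH_STATE;

    if (!(flags & GRND_SKETCH_STATE) || state == NULL)
        return (ssize_t)syscall(SYS_getrandom, buf, len, kernel_flags);

    /*
     * Placeholder for the fast path: a real vDSO implementation would
     * produce bytes from `state` here.  The sketch falls back to the
     * system call so that it stays correct.
     */
    return (ssize_t)syscall(SYS_getrandom, buf, len, kernel_flags);
}
```

A caller that passes flags of 0 and a NULL state gets exactly the
existing system call behavior, which is the conservative property the
new flag would be there to guarantee.
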

We now have:

* Explicit donation of memory areas to the kernel (set_robust_list,
  rseq).

* This getrandom_alloc vDSO call which does something unspecified and
  may return pointers which are or are not abstract.  (How is CRIU
  expected to handle this?)

* There's also userspace shadow stack coming.  I think the kernel moved
  away from implicit allocation, to something mmap-based.  It's not
  clear to me why that would be okay here, but not for shadow stacks.

Does io_uring have to handle a similar problem, too?

As long as the vDSO doesn't use private system calls, I don't expect any
practical problems, but this optimization doesn't really look to me like
something that intrinsically benefits from a completely new way of
allocating userspace memory for use by the kernel.

Thanks,
Florian