Thomas Gleixner <tglx@xxxxxxxxxxxxx> writes:

> On Mon, Oct 21 2024 at 09:46, Björn Töpel wrote:
>> Celeste Liu <coelacanthushex@xxxxxxxxx> writes:
>>> 1. syscall_enter_from_user_mode() will do two things:
>>>    1) the return value is only to inform whether the syscall should be skipped.
>>>    2) regs will be modified by filters (seccomp or ptrace and so on).
>>> 2. for common entry user, there is two informations: syscall number and
>>>    the return value of syscall_enter_from_user_mode() (called is_skipped below).
>>>    so there is three situations:
>>>    1) if syscall number is invalid, the syscall should not be performed, and
>>>       we set a0 to -ENOSYS to inform userspace the syscall doesn't exist.
>>>    2) if syscall number is valid, is_skipped will be used:
>>>       a) if is_skipped is -1, which means there are some filters reject this syscall,
>>>          so the syscall should not performed. (Of course, we can use bool instead to
>>>          get better semantic)
>>>       b) if is_skipped != -1, which means the filters approved this syscall,
>>>          so we invoke syscall handler with modified regs.
>>>
>>> In your design, the logical condition is not obvious. Why syscall_enter_from_user_mode()
>>> informed the syscall will be skipped but the syscall handler will be called
>>> when syscall number is invalid? The users need to think two things to get result:
>>>    a) -1 means skip
>>>    b) -1 < 0 in signed integer, so the skip condition is always a invalid syscall number.
>>>
>>> In may way, the users only need to think one thing: The syscall_enter_from_user_mode()
>>> said -1 means the syscall should not be performed, so use it as a condition of reject
>>> directly. They just need to combine the informations that they get from API as the
>>> condition of control flow.
>>
>> I'm all-in for simpler API usage! Maybe massage the
>> syscall_enter_from_user_mode() (or a new one), so that additional
>> syscall_get_nr() call is not needed?
>
> It's completely unclear to me what the actual problem is. The flow how
> this works on all architectures is:
>
>        regs->orig_a0 = regs->a0
>        regs->a0 = -ENOSYS;
>
>        nr = syscall_enter_from_user_mode(....);
>
>        if (nr >= 0)
>           regs->a0 = nr < MAX_SYSCALL ? syscall(nr) : -ENOSYS;
>
> If syscall_trace_enter() returns -1 to skip the syscall, then regs->a0
> is unmodified, unless one of the magic operations modified it.
>
> If syscall_trace_enter() was not active (no tracer, no seccomp ...) then
> regs->a0 already contains -ENOSYS.
>
> So what's the exact problem?

It's a mix of calling convention, and UAPI:

  * RISC-V uses a0 for arg0 *and* return value (like arm64).
  * RISC-V does not expose orig_a0 to userland, and cannot easily start
    doing that w/o breaking UAPI.

Now, when setting a0 to -ENOSYS, it's clobbering arg0, and the ptracer
will have an incorrect arg0 (-ENOSYS).
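
To make the clobbering concrete, here is a minimal userspace sketch of the
flow quoted above. It is not kernel code: struct sketch_regs, the fake_*
helpers, seen_arg0 and the MAX_SYSCALL value are all hypothetical names made
up for illustration. The only assumption it models is that a tracer stopped
at syscall entry can read a0 but has no access to orig_a0.

```c
#include <errno.h>

#define MAX_SYSCALL 4           /* hypothetical syscall table size */

/* Hypothetical, simplified register file: a0 is both arg0 and return value. */
struct sketch_regs {
	long a0;
	long a7;                /* syscall number */
	long orig_a0;           /* saved arg0, not visible to the tracer here */
};

/* What a ptracer would read as "arg0" at the syscall-entry stop. */
static long seen_arg0;

/* Stand-in for syscall_enter_from_user_mode(): returns nr, or -1 to skip. */
static long fake_syscall_enter(struct sketch_regs *regs, int skip)
{
	seen_arg0 = regs->a0;   /* the tracer can only look at a0 */
	return skip ? -1 : regs->a7;
}

/* Stand-in syscall handler: uses the preserved arg0. */
static long fake_syscall(long nr, const struct sketch_regs *regs)
{
	return regs->orig_a0 + nr;
}

/* The flow from the quoted mail, applied to the simplified registers. */
static void fake_do_syscall(struct sketch_regs *regs, int skip)
{
	long nr;

	regs->orig_a0 = regs->a0;
	regs->a0 = -ENOSYS;     /* arg0 in a0 is overwritten at this point */

	nr = fake_syscall_enter(regs, skip);

	if (nr >= 0)
		regs->a0 = nr < MAX_SYSCALL ? fake_syscall(nr, regs) : -ENOSYS;
}
```

With a0 = 42 and a valid syscall number, the handler still gets the right
argument through orig_a0, but seen_arg0 ends up as -ENOSYS rather than 42,
which is the incorrect arg0 the ptracer observes.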