On Wed, Sep 09, 2020 at 11:53:42PM +1000, Michael Ellerman wrote:
> Hi Thomas,
>
> Sorry if this was discussed already somewhere, but I didn't see anything ...
>
> Thomas Gleixner <tglx@xxxxxxxxxxxxx> writes:
> > On Wed, Aug 19 2020 at 10:14, Kyle Huey wrote:
> >> tl;dr: after 27d6b4d14f5c3ab21c4aef87dd04055a2d7adf14 ptracer
> >> modifications to orig_ax in a syscall entry trace stop are not honored
> >> and this breaks our code.
> ...
> > diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> > index 9852e0d62d95..fcae019158ca 100644
> > --- a/kernel/entry/common.c
> > +++ b/kernel/entry/common.c
> > @@ -65,7 +65,8 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
>
> Adding context:
>
> 	/* Do seccomp after ptrace, to catch any tracer changes. */
> 	if (ti_work & _TIF_SECCOMP) {
> 		ret = __secure_computing(NULL);
> 		if (ret == -1L)
> 			return ret;
> 	}
>
> 	if (unlikely(ti_work & _TIF_SYSCALL_TRACEPOINT))
> 		trace_sys_enter(regs, syscall);
>
> >  	syscall_enter_audit(regs, syscall);
> >
> > -	return ret ? : syscall;
> > +	/* The above might have changed the syscall number */
> > +	return ret ? : syscall_get_nr(current, regs);
> >  }
> >
> >  noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)
>
> I noticed if the syscall number is changed by seccomp/ptrace, the
> original syscall number is still passed to trace_sys_enter() and audit.
>
> The old code used regs->orig_ax, so any change to the syscall number
> would be seen by the tracepoint and audit.

Ah! That's no good.

> I can observe the difference between v5.8 and mainline, using the
> raw_syscall trace event and running the seccomp_bpf selftest which turns
> a getpid (39) into a getppid (110).
>
> With v5.8 we see getppid on entry and exit:
>
>      seccomp_bpf-1307  [000] .... 22974.874393: sys_enter: NR 110 (7ffff22c46e0, 40a350, 4, fffffffffffff7ab, 7fa6ee0d4010, 0)
>      seccomp_bpf-1307  [000] .N.. 22974.874401: sys_exit: NR 110 = 1304
>
> Whereas on mainline we see an enter for getpid and an exit for getppid:
>
>      seccomp_bpf-1030  [000] ....    21.806766: sys_enter: NR 39 (7ffe2f6d1ad0, 40a350, 7ffe2f6d1ad0, 0, 0, 407299)
>      seccomp_bpf-1030  [000] ....    21.806767: sys_exit: NR 110 = 1027
>
>
> I don't know audit that well, but I think it saves the syscall number on
> entry eg. in __audit_syscall_entry(). So it will record the wrong
> syscall happening in this case I think.
>
> Seems like we should reload the syscall number before calling
> trace_sys_enter() & audit ?

Agreed. I wonder what the best way to build a regression test for this
is... hmmm.

-- 
Kees Cook
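
For reference, the ordering problem can be pictured with a small userspace
toy model. This is only a sketch, not kernel code: struct toy_regs,
toy_tracer_rewrite() and the two toy_enter_*() helpers are hypothetical
names made up for illustration, and the 39 -> 110 rewrite just mirrors the
getpid/getppid case from the selftest mentioned above.

```c
/* Toy stand-in for struct pt_regs: only the syscall-number slot. */
struct toy_regs {
	long orig_ax;
};

/* Hypothetical ptrace/seccomp step: rewrite getpid (39) to getppid (110). */
static void toy_tracer_rewrite(struct toy_regs *regs)
{
	if (regs->orig_ax == 39)
		regs->orig_ax = 110;
}

/*
 * Number a tracepoint/audit hook would see if it is handed the local
 * copy that was read before the tracer ran.
 */
static long toy_enter_stale(struct toy_regs *regs, long nr)
{
	toy_tracer_rewrite(regs);
	return nr;		/* still the pre-rewrite number */
}

/* Number it would see if the value is reloaded from regs first. */
static long toy_enter_reloaded(struct toy_regs *regs, long nr)
{
	toy_tracer_rewrite(regs);
	nr = regs->orig_ax;	/* pick up the tracer's change */
	return nr;
}
```

Starting from orig_ax == 39, the stale variant reports 39 while the regs
already hold 110, which matches the mismatched enter/exit pair in the
mainline trace; the reloaded variant reports 110.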