On Thu, Jul 16, 2020 at 12:50 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > > From: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > > On syscall entry certain work needs to be done: > > - Establish state (lockdep, context tracking, tracing) > - Conditional work (ptrace, seccomp, audit...) > > This code is needlessly duplicated and different in all > architectures. > > + > +/** > + * arch_syscall_enter_seccomp - Architecture specific seccomp invocation > + * @regs: Pointer to currents pt_regs > + * > + * Returns: The original or a modified syscall number > + * > + * Invoked from syscall_enter_from_user_mode(). Can be replaced by > + * architecture specific code. > + */ > +static inline long arch_syscall_enter_seccomp(struct pt_regs *regs); Ick. I'd rather see arch_populate_seccomp_data() and kill this hook. But we can clean this up later. > +/** > + * arch_syscall_enter_audit - Architecture specific audit invocation > + * @regs: Pointer to currents pt_regs > + * > + * Invoked from syscall_enter_from_user_mode(). Must be replaced by > + * architecture specific code if the architecture supports audit. > + */ > +static inline void arch_syscall_enter_audit(struct pt_regs *regs); > + Let's pass u32 arch here. > +/** > + * syscall_enter_from_user_mode - Check and handle work before invoking > + * a syscall > + * @regs: Pointer to currents pt_regs > + * @syscall: The syscall number > + * > + * Invoked from architecture specific syscall entry code with interrupts > + * disabled. The calling code has to be non-instrumentable. When the > + * function returns all state is correct and the subsequent functions can be > + * instrumented. > + * > + * Returns: The original or a modified syscall number > + * > + * If the returned syscall number is -1 then the syscall should be > + * skipped. In this case the caller may invoke syscall_set_error() or > + * syscall_set_return_value() first. If neither of those are called and -1 > + * is returned, then the syscall will fail with ENOSYS. > + * > + * The following functionality is handled here: > + * > + * 1) Establish state (lockdep, RCU (context tracking), tracing) > + * 2) TIF flag dependent invocations of arch_syscall_enter_tracehook(), > + * arch_syscall_enter_seccomp(), trace_sys_enter() > + * 3) Invocation of arch_syscall_enter_audit() > + */ > +long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall); This should IMO also take u32 arch. I'm also uneasy about having this do the idtentry/irqentry stuff as well as the syscall stuff. Is there a particular reason you did it this way instead of having callers do: idtentry_enter(); instrumentation_begin(); syscall_enter_from_user_mode(); FWIW, I think we could make this even better -- couldn't this get folded together with syscall *exit* and become: idtentry_enter(); instrumentation_begin(); generic_syscall(); instrumentation_end(); idtentry_exit(); and generic_syscall() would call arch_dispatch_syscall(regs, arch, syscall_nr); > + > +/** > + * irqentry_enter_from_user_mode - Establish state before invoking the irq handler > + * @regs: Pointer to currents pt_regs > + * > + * Invoked from architecture specific entry code with interrupts disabled. > + * Can only be called when the interrupt entry came from user mode. The > + * calling code must be non-instrumentable. When the function returns all > + * state is correct and the subsequent functions can be instrumented. > + * > + * The function establishes state (lockdep, RCU (context tracking), tracing) > + */ > +void irqentry_enter_from_user_mode(struct pt_regs *regs); Unless the rest of the series works differently from what I expect, I don't love this name. How about normal_entry_from_user_mode() or ordinary_entry_from_user_mode()? After all, this seems to cover IRQ, all the non-horrible exceptions, and (internally) syscalls. --Andy