On Thu, Nov 23, 2023 at 4:09 PM Hengqi Chen <hengqi.chen@xxxxxxxxx> wrote:
>
> On Thu, Nov 23, 2023 at 2:13 PM Huacai Chen <chenhuacai@xxxxxxxxxx> wrote:
> >
> > Hi, Hengqi,
> >
> > On Thu, Nov 23, 2023 at 1:49 PM Hengqi Chen <hengqi.chen@xxxxxxxxx> wrote:
> > >
> > > On Wed, Nov 22, 2023 at 3:58 PM Huacai Chen <chenhuacai@xxxxxxxxxx> wrote:
> > > >
> > > > Hi, Hengqi,
> > > >
> > > > On Wed, Nov 22, 2023 at 3:34 PM Hengqi Chen <hengqi.chen@xxxxxxxxx> wrote:
> > > > >
> > > > > Hi, Huacai,
> > > > >
> > > > > On Wed, Nov 22, 2023 at 2:32 PM Huacai Chen <chenhuacai@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > Hi, Hengqi,
> > > > > >
> > > > > > On Wed, Nov 22, 2023 at 1:14 PM Hengqi Chen <hengqi.chen@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > Currently, we store syscall number in pt_regs::regs[11] and it may be
> > > > > > > changed during syscall execution. Take `execve` as an example:
> > > > > > >
> > > > > > >     sys_execve
> > > > > > >       -> do_execve
> > > > > > >         -> do_execveat_common
> > > > > > >           -> bprm_execve
> > > > > > >             -> exec_binprm
> > > > > > >               -> search_binary_handler
> > > > > > >                 -> load_elf_binary
> > > > > > >                   -> ELF_PLAT_INIT
> > > > > > >
> > > > > > > ELF_PLAT_INIT reset regs[11] to 0, later in syscall_exit_to_user_mode
> > > > > > > we get a wrong syscall nr.
> > > > > > >
> > > > > > > Known affected syscalls includes execve/execveat/rt_sigreturn. Tools
> > > > > > > like execsnoop do not work properly because the sys_exit_* tracepoints
> > > > > > > does not trigger at all.
> > > > > > >
> > > > > > > Let's store syscall nr in thread_info instead.
> > > > > > Can we just modify ELF_PLAT_INIT and not clear regs[11]?
> > > > > >
> > > > >
> > > > > I am uncertain about the side effects of changing ELF_PLAT_INIT.
> > > > > From a completeness perspective, changing ELF_PLAT_INIT is suboptimal,
> > > > > rt_sigreturn is affected in another code path, and there may be other
> > > > > syscalls that I am unaware of.
> > > > Save syscall number in thread_info has more side effects, because
> > > > ptrace allows us to change the number during syscall, then we should
> > > > keep consistency between syscall and regs[11].
> > > >
> > >
> > > How about the change below:
> > >
> > > diff --git a/arch/loongarch/include/asm/syscall.h
> > > b/arch/loongarch/include/asm/syscall.h
> > > index e286dc58476e..954ba53bcc9a 100644
> > > --- a/arch/loongarch/include/asm/syscall.h
> > > +++ b/arch/loongarch/include/asm/syscall.h
> > > @@ -23,7 +23,9 @@ extern void *sys_call_table[];
> > >  static inline long syscall_get_nr(struct task_struct *task,
> > >                                    struct pt_regs *regs)
> > >  {
> > > -        return regs->regs[11];
> > > +        long nr = task_thread_info(task)->syscall;
> > > +
> > > +        return nr ? : regs->regs[11];
> > >  }
> > >
> > >  static inline void syscall_rollback(struct task_struct *task,
> > > diff --git a/arch/loongarch/kernel/syscall.c b/arch/loongarch/kernel/syscall.c
> > > index b4c5acd7aa3b..553ab0d624cb 100644
> > > --- a/arch/loongarch/kernel/syscall.c
> > > +++ b/arch/loongarch/kernel/syscall.c
> > > @@ -53,6 +53,7 @@ void noinstr do_syscall(struct pt_regs *regs)
> > >          regs->regs[4] = -ENOSYS;
> > >
> > >          nr = syscall_enter_from_user_mode(regs, nr);
> > > +        current_thread_info()->syscall = nr;
> > >
> > >          if (nr < NR_syscalls) {
> > >                  syscall_fn = sys_call_table[nr];
> > > @@ -61,4 +62,5 @@ void noinstr do_syscall(struct pt_regs *regs)
> > >          }
> > >
> > >          syscall_exit_to_user_mode(regs);
> > > +        current_thread_info()->syscall = 0;
> > >  }
> > >
> > > * allow ptrace to change syscall nr
> > > * sys_exit_* will also see the right syscall nr
> > > * this works even if rt_sigreturn clobbers all pt_regs::regs
> > No, I still prefer to modify ELF_PLAT_INIT, we can wait Arnd's comments.
> >
>
> OK, I am not eager, anyway, we know the root cause. :)
>
> > And, do you mean modifying ELF_PLAT_INIT cannot solve the
> > rt_sigreturn's problem?
> >
>
> Right, see https://elixir.bootlin.com/linux/latest/source/arch/loongarch/kernel/signal.c#L807
Is this the expected behavior for rt_sigreturn()? Otherwise I think
RISC-V has the same problem. And if we really need the 'correct'
syscall number, we can overwrite regs[11] in sys_rt_sigreturn().

And another question: do you have any updates about the BPF system hang
problem? :)

Huacai

>
> > Huacai
> >
> > >
> > > > And about ELF_PLAT_INIT, maybe Arnd can give us some more information.
> > > >
> > > > Hi, Arnd,
> > > >
> > > > I found some new architectures, such as ARM64 and RISC-V, just do
> > > > nearly nothing in ELF_PLAT_INIT, while some old architectures, such as
> > > > x86 and MIPS, clear most of the registers, do you know why?
> > > >
> > > > Huacai
> > > >
> > > > >
> > > > > > Huacai
> > > > > >
> > > > > > >
> > > > > > > Fixes: be769645a2aef ("LoongArch: Add system call support")
> > > > > > > Signed-off-by: Hengqi Chen <hengqi.chen@xxxxxxxxx>
> > > > > > > ---
> > > > > > >  arch/loongarch/include/asm/syscall.h | 2 +-
> > > > > > >  arch/loongarch/kernel/syscall.c      | 1 +
> > > > > > >  2 files changed, 2 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/arch/loongarch/include/asm/syscall.h b/arch/loongarch/include/asm/syscall.h
> > > > > > > index e286dc58476e..2317d674b92a 100644
> > > > > > > --- a/arch/loongarch/include/asm/syscall.h
> > > > > > > +++ b/arch/loongarch/include/asm/syscall.h
> > > > > > > @@ -23,7 +23,7 @@ extern void *sys_call_table[];
> > > > > > >  static inline long syscall_get_nr(struct task_struct *task,
> > > > > > >                                    struct pt_regs *regs)
> > > > > > >  {
> > > > > > > -        return regs->regs[11];
> > > > > > > +        return task_thread_info(task)->syscall;
> > > > > > >  }
> > > > > > >
> > > > > > >  static inline void syscall_rollback(struct task_struct *task,
> > > > > > > diff --git a/arch/loongarch/kernel/syscall.c b/arch/loongarch/kernel/syscall.c
> > > > > > > index b4c5acd7aa3b..2783e33cf276 100644
> > > > > > > --- a/arch/loongarch/kernel/syscall.c
> > > > > > > +++ b/arch/loongarch/kernel/syscall.c
> > > > > > > @@ -52,6 +52,7 @@ void noinstr do_syscall(struct pt_regs *regs)
> > > > > > >          regs->orig_a0 = regs->regs[4];
> > > > > > >          regs->regs[4] = -ENOSYS;
> > > > > > >
> > > > > > > +        task_thread_info(current)->syscall = nr;
> > > > > > >          nr = syscall_enter_from_user_mode(regs, nr);
> > > > > > >
> > > > > > >          if (nr < NR_syscalls) {
> > > > > > > --
> > > > > > > 2.42.0
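
For readers following the thread, the failure mode and the fallback idea from the second diff can be modelled in a few lines of user-space C. This is only a sketch, not kernel code: `model_regs`, `model_thread`, `clobber_regs` and `nr_seen_at_exit` are hypothetical stand-ins for pt_regs, thread_info, the register reset done by ELF_PLAT_INIT (or the register restore in rt_sigreturn), and the exit-side lookup. The value 221 is assumed to be execve's number in the generic syscall table.

```c
#include <string.h>

#define SYSCALL_NR_REG 11   /* regs[11] carries the syscall number at entry */

/* Hypothetical stand-ins for the kernel's pt_regs and thread_info. */
struct model_regs   { unsigned long regs[32]; };
struct model_thread { long syscall; };

/* Behaviour before the patch: trust the saved register. */
static long get_nr_from_regs(const struct model_regs *r)
{
    return (long)r->regs[SYSCALL_NR_REG];
}

/* Behaviour sketched in the second diff: prefer the per-thread copy,
 * fall back to the register when no copy has been recorded (copy == 0). */
static long get_nr_with_fallback(const struct model_thread *t,
                                 const struct model_regs *r)
{
    return t->syscall ? t->syscall : (long)r->regs[SYSCALL_NR_REG];
}

/* Simplified model of a syscall body that rewrites the saved registers,
 * as described for execve and rt_sigreturn in the thread. */
static void clobber_regs(struct model_regs *r)
{
    memset(r, 0, sizeof(*r));
}

/* Simulate entry, a clobbering syscall body, and the exit-side lookup.
 * Returns the syscall number the exit path would observe. */
static long nr_seen_at_exit(long nr, int use_thread_copy)
{
    struct model_regs regs;
    struct model_thread ti = { 0 };

    memset(&regs, 0, sizeof(regs));
    regs.regs[SYSCALL_NR_REG] = (unsigned long)nr;  /* syscall entry */
    if (use_thread_copy)
        ti.syscall = nr;                            /* recorded at entry */

    clobber_regs(&regs);                            /* syscall body runs */

    return use_thread_copy ? get_nr_with_fallback(&ti, &regs)
                           : get_nr_from_regs(&regs);
}
```

Calling `nr_seen_at_exit(221, 0)` models the current behaviour, where the exit path reads a zeroed register, while `nr_seen_at_exit(221, 1)` models the per-thread copy surviving the clobber. The sketch does not model ptrace rewriting the number mid-syscall, which is the consistency concern raised in the thread.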