Sven Schnelle <svens@xxxxxxxxxxxxx> writes: > Hi Andy, > > Andy Lutomirski <luto@xxxxxxxxxx> writes: > >> On Wed, Mar 24, 2021 at 10:39 AM Vasily Gorbik <gor@xxxxxxxxxxxxx> wrote: >>> >>> Hi Andy, >>> >>> On Sat, Mar 20, 2021 at 08:48:34PM -0700, Andy Lutomirski wrote: >>> > Hi all- >>> > >>> > I'm working on my kentry patchset, and I encountered: >>> > >>> > commit 56e62a73702836017564eaacd5212e4d0fa1c01d >>> > Author: Sven Schnelle <svens@xxxxxxxxxxxxx> >>> > Date: Sat Nov 21 11:14:56 2020 +0100 >>> > >>> > s390: convert to generic entry >>> > >>> > As part of this work, I was cleaning up the generic syscall helpers, >>> > and I encountered the goodies in do_syscall() and __do_syscall(). >>> > >>> > I'm trying to wrap my head around the current code, and I'm rather confused. >>> > >>> > 1. syscall_exit_to_user_mode_work() does *all* the exit work, not just >>> > the syscall exit work. So a do_syscall() that gets called twice will >>> > do the loopy part of the exit work (e.g. signal handling) twice. Is >>> > this intentional? If so, why? >>> > >>> > 2. I don't understand how this PIF_SYSCALL_RESTART thing is supposed >>> > to work. Looking at the code in Linus' tree, if a signal is pending >>> > and a syscall returns -ERESTARTSYS, the syscall will return back to >>> > do_syscall(). The work (as in (1)) gets run, calling do_signal(), >>> > which will notice -ERESTARTSYS and set PIF_SYSCALL_RESTART. >>> > Presumably it will also push the signal frame onto the stack and aim >>> > the return address at the svc instruction mentioned in the commit >>> > message from "s390: convert to generic entry". Then __do_syscall() >>> > will turn interrupts back on and loop right back into do_syscall(). >>> > That seems incorrect. >>> > >>> > Can you enlighten me? My WIP tree is here: >>> > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/kentry >>> > >>> >>> For all the details to that change we'd have to wait for Sven, who is back >>> next week. >>> >>> > Here are my changes to s390, and I don't think they're really correct: >>> > >>> > >>> > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/diff/arch/s390/kernel/syscall.c?h=x86/kentry&id=58a459922be0fb8e0f17aeaebcb0ac8d0575a62c >>> >>> Couple of things: syscall_exit_to_user_mode_prepare is static, >>> and there is another code path in arch/s390/kernel/traps.c using >>> enter_from_user_mode/exit_to_user_mode. >>> >>> Anyhow I gave your branch a spin and got few new failures on strace test >>> suite, in particular on restart_syscall test. I'll try to find time to >>> look into details. >> >> I refreshed the branch, but I confess I haven't compile tested it. :) >> >> I would guess that the new test case failures are a result of the >> buggy syscall restart logic. I think that all of the "restart" cases >> except execve() should just be removed. Without my patch, I suspect >> that signal delivery with -ERESTARTSYS would create the signal frame, >> do an accidental "restarted" syscall that was a no-op, and then >> deliver the signal. With my patch, it may simply repeat the original >> interrupted signal forever. > > PIF_SYSCALL_RESTART is set in arch_do_signal_or_restart(), but only if > there's no signal handler registered. In that case we don't need a > signal frame, so that should be fine. > > The problem why your branch is not working is that arch_do_signal_or_restart() > gets called from exit_to_user_mode_prepare(), and that is after the > check whether PIF_SYSCALL_RESTART is set in __do_syscall(). > > So i'm wondering how to fix that. x86 simply rewinds the pc, so the > system call instruction gets re-executed when returning to user > space. For s390 that doesn't work, as the s390 svc instruction might > have the syscall number encoded. If we would have to restart a system > call with restart_systemcall(), we need to change the syscall number to > __NR_restart_syscall. As we cannot change the hardcoded system call > number, we somehow need to handle that inside of the kernel. > > So i wonder whether we should implement the PIF_SYSCALL_RESTART check in > entry.S after all the return to user space entry code was run but before > doing the real swtch back to user space. If PIF_SYSCALL_RESTART is set > we would then just jump back to the entry code and pretend we came from > user space. > > That would have the benefit that the entry C code looks the same like > other architectures and that amount of code to add in entry.S shouldn't > be much. Thinking about this again i guess this idea won't work because the exit loop might have scheduled the old process away already...