On Fri, Oct 29, 2021 at 10:50 AM Catalin Marinas <catalin.marinas@xxxxxxx> wrote: > > First of all, a uaccess in interrupt should not force such signal as it > had nothing to do with the interrupted context. I guess we can do an > in_task() check in the fault handler. Yeah. It ends up being similar to the thread flag in that you still end up having to protect against NMI and other users of asynchronous page faults. So the suggestion was more of a "mindset" difference and modified version of the task flag rather than anything fundamentally different. > Second, is there a chance that we enter the fault-in loop with a SIGSEGV > already pending? Maybe it's not a problem, we just bail out of the loop > early and deliver the signal, though unrelated to the actual uaccess in > the loop. If we ever run in user space with a pending per-thread SIGSEGV, that would already be a fairly bad bug. The intent of "force_sig()" is not only to make sure you can't block the signal, but also that it targets the particular thread that caused the problem: unlike other random "send signal to process", a SIGSEGV caused by a bad memory access is really local to that _thread_, not the signal thread group. So somebody else sending a SIGSEGV asynchronsly is actually very different - it goes to the thread group (although you can specify individual threads too - but once you do that you're already outside of POSIX). That said, the more I look at it, the more I think I was wrong. I think the "we have a SIGSEGV pending" could act as the per-thread flag, but the complexity of the signal handling is probably an argument against it. Not because a SIGSEGV could already be pending, but because so many other situations could be pending. In particular, the signal code won't send new signals to a thread if that thread group is already exiting. So another thread may have already started the exit and core dump sequence, and is in the process of killing the shared signal threads, and if one of those threads is now in the kernel and goes through the copy_from_user() dance, that whole "thread group is exiting" will mean that the signal code won't add a new SIGSEGV to the queue. So the signal could conceptually be used as the flag to stop looping, but it ends up being such a complicated flag that I think it's probably not worth it after all. Even if it semantically would be fairly nice to use pre-existing machinery. Could it be worked around? Sure. That kernel loop probably has to check for fatal_signal_pending() anyway, so it would all work even in the presense of the above kinds of issues. But just the fact that I went and looked at just how exciting the signal code is made me think "ok, conceptually nice, but we take a lot of locks and we do a lot of special things even in the 'simple' force_sig() case". > Third is the sigcontext.pc presented to the signal handler. Normally for > SIGSEGV it points to the address of a load/store instruction and a > handler could disable MTE and restart from that point. With a syscall we > don't want it to point to the syscall place as it shouldn't be restarted > in case it copied something. I think this is actually independent of the whole "how to return errors". We'll still need to return an error from the system call, even if we also have a signal pending. Linus