On 25/10/2024 13:33, Dave Martin wrote: > On Fri, Oct 25, 2024 at 10:24:41AM +0200, Kevin Brodsky wrote: >>>>>>> @@ -907,6 +964,7 @@ SYSCALL_DEFINE0(rt_sigreturn) >>>>>>> { >>>>>>> struct pt_regs *regs = current_pt_regs(); >>>>>>> struct rt_sigframe __user *frame; >>>>>>> + struct user_access_state ua_state; >>>>>>> >>>>>>> /* Always make any pending restarted system calls return -EINTR */ >>>>>>> current->restart_block.fn = do_no_restart_syscall; >>>>>>> @@ -923,12 +981,14 @@ SYSCALL_DEFINE0(rt_sigreturn) >>>>>>> if (!access_ok(frame, sizeof (*frame))) >>>>>>> goto badframe; >>>>>>> >>>>>>> - if (restore_sigframe(regs, frame)) >>>>>>> + if (restore_sigframe(regs, frame, &ua_state)) >>>>>>> goto badframe; >>>>>>> >>>>>>> if (restore_altstack(&frame->uc.uc_stack)) >>>>>>> goto badframe; >>>>>>> >>>>>>> + restore_user_access_state(&ua_state); >>>>>>> + >>>>>>> return regs->regs[0]; >>>>>>> >>>>>>> badframe: >>>>>> The saving part I'm fine with. For restoring, I was wondering whether we >>>>>> can get a more privileged POR_EL0 if reading the frame somehow failed. >>>>>> This is largely theoretical, there are other ways to attack like >>>>>> writing POR_EL0 directly than unmapping/remapping the signal stack. >>>>>> >>>>>> What I'd change here is always restore_user_access_state() to >>>>>> POR_EL0_INIT. Maybe just initialise ua_state above and add the function >>>>>> call after the badframe label. >>>>> I'm not sure I understand. When we enter this function, POR_EL0 is set >>>>> to whatever the signal handler set it to (POR_EL0_INIT by default). >>>>> There are then two cases: >>>>> 1) Everything succeeds, including reading the saved POR_EL0 from the >>>>> frame. We then call restore_user_access_state(), setting POR_EL0 to the >>>>> value we've read, and return to userspace. >>>>> 2) Any uaccess fails (for instance reading POR_EL0). In that case we >>>>> leave POR_EL0 unchanged and deliver SIGSEGV. >>>>> >>>>> In case 2 POR_EL0 is most likely already set to POR_EL0_INIT, or >>>>> whatever the signal handler set it to. It's not clear to me that forcing >>>>> it to POR_EL0_INIT helps much. Either way it's doubtful that the SIGSEGV >>>>> handler will be able to recover, since the new signal frame we will >>>>> create for it may be a mix of interrupted state and signal handler state >>>>> (depending on exactly where we fail). >>>> If the SIGSEGV delivery succeeds, returning would restore the POR_EL0 >>>> set up by the previous signal handler, potentially more privileged. Does >>>> it matter? Can it return all the way to the original context? >> What we store into the signal frame when delivering that SIGSEGV is a >> mixture of the original state (up to the point of failure) and the >> signal handler's state (what we couldn't restore). It's hard to reason >> about how that SIGSEGV handler could possibly handle this, but in any >> case it would have to massage its signal frame so that the next >> sigreturn does the right thing. Restoring only part of the frame records >> is bound to cause trouble and that's true for POR_EL0 as well - I doubt >> there's much value in special-casing it. > This feels like a simplification? > > We can leave a mix of restored and unrestored state when generating the > SIGSEGV signal frame, providing that those changes will make no > difference when the rt_sigreturn is replayed. I'm not sure I understand what this means. If the SIGSEGV handler were to sigreturn without touching its signal frame, things are likely to explode: it may be returning to the point where the original handler called sigreturn, for instance (if the first uaccess failed during that sigreturn call). > POR_EL0 will make a difference, though. > > The POR_EL0 image in the SIGSEGV signal frame needs be the same value > that caused the original rt_sigreturn to barf (if this is what caused > the barf). It should be up to the SIGSEGV handler to decide what (if > anything) to do about that. The kernel can't know what userspace > intended. Unless I'm missing something this is exactly what happens now: what we store in the SIGSEGV frame is the POR_EL0 value the original handler was using. > Note that for this to work, the SIGSEGV stack (whether main or > alternate) must be accessible with POR_EL0_INIT permissions, or the > SIGSEGV handler must start with a (gross) asm shim to establish a > usable POR_EL0. But that's not really our problem here. This is indeed orthogonal - the SIGSEGV handler will be run with POR_EL0_INIT, like any other handler. The value we store in the frame is unrelated. > (I'm not saying that the kernel necessarily fails to do this -- I > haven't checked -- but just trying to understand the problem here.) > > > The actual problem here is that if the SIGSEGV handler wants to bail > out with a siglongjmp(), there is no way to determine the correct value > of POR_EL0 to restore. Correct, but again this is true of any other record - for instance TPIDR2. > I wonder whether POR_EL0 should be saved in sigjmp_buf (depending on > whether sigjmp_buf is horribly inextensible and also full up, or merely > horribly inextensible). It very much feels that this is the case - if a handler relies on longjmp() or setcontext() to restore a known state, then POR_EL0 should be part of that state. > > Does anyone know whether PKRU in sigjmp_buf on x86? I can't say for sure but I don't see PKRU being handled in setjmp/longjmp in glibc at least. Kevin