Re: [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE

Jeff Xu <jeffxu@xxxxxxxxxxxx> · Mon, 10 Feb 2025 14:46:37 -0800



Hi Dmitry

On Thu, Feb 6, 2025 at 10:06 AM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>
> On Tue, 4 Feb 2025 at 11:02, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >
> > Re commit 70044df250d022572e26cd301bddf75eac1fe50e:
> > https://lore.kernel.org/all/20240802061318.2140081-4-aruna.ramakrishna@xxxxxxxxxx/
> >
> > > If the alternate signal stack is protected by a different pkey than the
> > > current execution stack, copying xsave data to the sigaltstack will fail
> > > if its pkey is not enabled in the PKRU register.
> > >
> > > We do not know which pkey was used by the application for the altstack,
> > > so enable all pkeys before xsave.
> > >
> > > But this updated PKRU value is also pushed onto the sigframe, which
> > > means the register value restored from sigcontext will be different from
> > > the user-defined one, which is unexpected. Fix that by overwriting the
> > > PKRU value on the sigframe with the original, user-defined PKRU.
> >
> > Hi,
> >
> > This unfortunatly seems to be broken for rseq user-space writes.
> > If the signal is caused by rseq struct being inaccessible due to PKEYs,
> > we try to write to rseq again at setup_rt_frame->rseq_signal_deliver,
> > which happens _before_ sig_prepare_pkru and won't succeed
> > (PKEY is still inaccessible, hard kills the process).
> > Any PKEY sandbox would want to restict untrusted access to rseq
> > as well (otherwise allows easy sandbox escapes).
> >
> > If we do sig_prepare_pkru before rseq_signal_deliver (and generally
> > before any copy_to_userpace), then user-space handler gets SIGSEGV
> > and could unregister rseq and retry.
> >
> > However, I am not sure if it's the best solution performance-
> > and complexity-wise (for user-space). A better solution may be to
> > change __rseq_handle_notify_resume to temporary switch to default
> > PKEY if user accesses fail.
> > Rseq is similar to signals in this respect. Since rseq updates
> > happen asynchronously with respect to user-space control flow,
> > if a program uses rseq and ever makes rseq inaccessible with PKEYs,
> > it's in trouble and will be randomly killed.
> > Since rseq updates are asynchronous as signals, they shouldn't
> > assume PKEY is set to default value that allows access
> > to rseq descriptor.
> >
> > Thoughts?
>
> Another question about switching to pkey 0 and not switching back on all errors.
> Can it create security problems by allowing sandboxed code to escape?
>
Sandbox escape would be bad , we wouldn't want the calling thread to
get PKRU = 0 in any error path.

> Namely, here:
>
> +        /* Update PKRU to enable access to the alternate signal stack. */
> +        pkru = sig_prepare_pkru();
>          /* save i387 and extended state */
> -        if (!copy_fpstate_to_sigframe(*fpstate, (void __user
> *)buf_fx, math_size, pkru))
> +        if (!copy_fpstate_to_sigframe(*fpstate, (void __user
> *)buf_fx, math_size, pkru)) {
> +                /*
> +                 * Restore PKRU to the original, user-defined value; disable
> +                 * extra pkeys enabled for the alternate signal stack, if any.
> +                 */
> +                write_pkru(pkru);
>                  return (void __user *)-1L;
> +        }
>
> we restore to the original pkru on this error, but there are other
> failure paths later, e.g.:
> https://elixir.bootlin.com/linux/v6.13.1/source/arch/x86/kernel/signal_64.c#L199
>
> on these errors paths we will eventually get here to force_sig(SIGSEGV):
> https://elixir.bootlin.com/linux/v6.13.1/source/kernel/signal.c#L1685
> which just sends SIGSEGV and is not fatal.
>
> So hypothetically, if there is a SIGUSR1 handler without SA_ONSTACK,
> which fails, but SIGSEGV handler has SA_ONSTACK and doesn't fail, this
> will result in resetting PKRU to 0 without restoring it back.
> Or sandboxed code somehow arranges for the first signal setup for other reasons.
>
Can you walk me through the setup and steps that led to this situation?

> This is, of course, a tricky attack vector, and the program must
> resume after SIGSEGV somehow (there are some such cases, e.g. mmaping
> something lazily and retrying), but with security you never know how
> creative an attacker can get and what you are missing that they are
> not missing. So it looks safer to restore to the original PKRU on all
> errors.

Thanks
-Jeff