Re: [PATCH v9 5/6] signal: define the field siginfo.si_xflags

Peter Collingbourne <pcc@xxxxxxxxxx> · Tue, 25 Aug 2020 13:08:35 -0700

On Tue, Aug 25, 2020 at 7:47 AM Dave Martin <Dave.Martin@xxxxxxx> wrote:
>
> On Mon, Aug 24, 2020 at 06:27:51PM -0700, Peter Collingbourne wrote:
> > On Mon, Aug 24, 2020 at 7:03 AM Dave Martin <Dave.Martin@xxxxxxx> wrote:
> > >
> > > On Wed, Aug 19, 2020 at 06:37:25PM -0700, Peter Collingbourne wrote:
> > > > On Wed, Aug 19, 2020 at 8:40 AM Dave Martin <Dave.Martin@xxxxxxx> wrote:
> > > > >
> > > > > On Mon, Aug 17, 2020 at 08:33:50PM -0700, Peter Collingbourne wrote:
> > > > > > This field will contain flags that may be used by signal handlers to
> > > > > > determine whether other fields in the _sigfault portion of siginfo are
> > > > > > valid. An example use case is the following patch, which introduces
> > > > > > the si_addr_ignored_bits{,_mask} fields.
> > > > > >
> > > > > > A new sigaction flag, SA_XFLAGS, is introduced in order to allow
> > > > > > a signal handler to require the kernel to set the field (but note
> > > > > > that the field will be set anyway if the kernel supports the flag,
> > > > > > regardless of its value). In combination with the previous patches,
> > > > > > this allows a userspace program to determine whether the kernel will
> > > > > > set the field.
> > > > > >
> > > > > > Ideally this field could have just been named si_flags, but that
> > > > > > name was already taken by ia64, so a different name was chosen.
> > > > > >
> > > > > > Alternatively, we may consider making ia64's si_flags a generic field
> > > > > > and having it appear at the end of _sigfault (in the same place as
> > > > > > this patch has si_xflags) on non-ia64, keeping it in the same place
> > > > > > on ia64. ia64's si_flags is a 32-bit field with only one flag bit
> > > > > > allocated, so we would have 31 bits to use if we do this.
> > > > >
> > > > > For clarity, is the new si_xflags field supposed to be valid for all
> > > > > signal types, or just certain signals and si_codes?
> > > >
> > > > It is intended to be valid for all signal types that use the _sigfault
> > > > union member of siginfo. As listed in siginfo.h these are: SIGILL,
> > > > SIGFPE, SIGSEGV, SIGBUS, SIGTRAP, SIGEMT.
> > >
> > > SIGSYS is similar to SIGILL, is that included also?
> >
> > I think that SIGSYS is covered by a separate _sigsys union member.
> >
> > > > > What happens for things like a rt_sigqueueinfo() from userspace?
> > > >
> > > > Hmm. Let's enumerate each of these things, which I believe are all of
> > > > the call sites of the function copy_siginfo_from_user and related
> > > > functions (correct me if I'm wrong):
> > > >
> > > > - ptrace(PTRACE_SETSIGINFO)
> > > > - pidfd_send_signal
> > > > - rt_sigqueueinfo
> > > > - rt_tgsigqueueinfo
> > > >
> > > > We can handle the last three by observing that the kernel forbids
> > > > sending a signal with these syscalls if si_code >= 0, so we can say
> > > > that the value of si_xflags is only valid if si_code >= 0.
> > >
> > > Hmmm, that's what the code says (actually >= 0 or SI_TKILL), but it's
> > > illogical.  Those are user signals, so there's no obvious reason why
> > > userspace shouldn't be allowed to generate their siginfo.  It would
> > > probably be better for the kernel to police si_pid etc. in the SI_USER
> > > and SI_TKILL cases rather than flatly refusing, but I guess that's a
> > > discussion for another day.
> > >
> > > I guess the combination of SI_FROMKERNEL() and the signal number being a
> > > known fault signal is probably sufficient for now.
> >
> > In v10 I ended up adding a comment saying that si_xflags is only valid
> > if 0 <= si_code < SI_KERNEL (the SI_KERNEL part was due to my
> > discovery of kernel code that was calling force_sig(SIGSEGV) where
> > force_sig uses the _kill union member). Your comment about SI_USER
>
> Although it's been there a long time, is this a bug?
>
> sigaction(2) says that SI_KERNEL can be reported for any signal, but
> doesn't say how/why.  It also says that si_addr is [unconditionally]
> valid for [kernel-generated] SIGSEGV.  ([] are my insertions).
>
> While it may be reasonable to expect userspace code to filter out user
> signals before assuming that siginfo fields are valid, requiring user
> code to check for specific si_codes is a bit nastier.
>
> I rather suspect that little or no code out there is explicitly checking
> for SI_KERNEL before assuming that si_addr etc. are valid today.

Right, but maybe that can be attributed to poor documentation (in the
man page), so maybe the right thing to do here is to make the
documentation more explicit. The kernel code itself is fairly clear
that SI_KERNEL does not use the _sigfault layout:
https://github.com/torvalds/linux/blob/6a9dc5fd6170d0a41c8a14eb19e63d94bea5705a/kernel/signal.c#L3173

And note that force_sig does not make the si_addr field valid either,
it sets it to 0 (on most architectures, as a result of si_addr
overlapping si_pid/si_uid which get set to 0 by that function), which
is not necessarily the correct value. For example, on 64-bit x86,
executing this code:

  volatile auto ptr = (char *)0xfedcba9876543210;
  *ptr = 42;

(i.e. accessing outside of the TASK_SIZE limit) will result in a call
to force_sig(SIGSEGV) setting si_addr=0. But this is clearly not an
accurate fault address. I don't know how x86 reports the fault address
to the kernel in this case but maybe it simply isn't available for
addresses larger than TASK_SIZE, so the right thing for the kernel to
do would be to indicate that the address is unavailable (for example,
by setting si_code=SI_KERNEL, as it is already doing). Then through
documentation updates, userspace can know that si_code=SI_KERNEL means
that the address is unavailable.
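
To illustrate the overlap described above, here is a rough userspace
sketch. It is not kernel code: fill_like_force_sig() and
si_addr_overlaps_si_pid() are hypothetical names of mine, and the fill
routine only models my reading of what force_sig() does (zero the
siginfo, set si_code=SI_KERNEL, then set si_pid/si_uid to 0). It
assumes the glibc siginfo_t layout on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model (an assumption, not the kernel implementation) of how
 * force_sig() fills in siginfo: everything zeroed, si_code = SI_KERNEL,
 * and the _kill members si_pid/si_uid set to 0.
 */
void fill_like_force_sig(siginfo_t *si, int sig)
{
    assert(si != NULL);
    memset(si, 0, sizeof(*si));
    si->si_signo = sig;
    si->si_errno = 0;
    si->si_code = SI_KERNEL;
    si->si_pid = 0;
    si->si_uid = 0;
}

/* Returns 1 if si_addr and si_pid start at the same address in siginfo_t. */
int si_addr_overlaps_si_pid(void)
{
    siginfo_t si;

    /* Only the member addresses are compared; nothing is read from si. */
    return (const char *)&si.si_addr == (const char *)&si.si_pid;
}
```

A handler that reads si_addr from a siginfo filled in this way sees a
null pointer, whatever address actually faulted, which is why
si_code=SI_KERNEL is the only reliable hint that the address is not
meaningful.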

> > made me realize that is not exactly true (since kill and
> > pidfd_send_signal can send a fault signal with si_code == SI_USER). I
> > was not aware of the SI_FROMKERNEL() macro. In v11 I will update the
> > comment to say that SI_FROMKERNEL(si) && si->si_code != SI_KERNEL must
> > be true in order for si_xflags to be valid.
>
> Given the above, maybe it would be better to say nothing explicit about
> SI_KERNEL, but make sure that the additional siginfo fields are sanely
> zeroed anyway.

I think that for si_addr this happens as a result of setting
si_pid/si_uid, and for the other fields this happens as a result of
zeroing the padding between fields. I know that we'd prefer not to
rely on zeroing padding, but perhaps the zero padding can be seen more
as a last resort for keeping things working in case userspace fails to
check for SI_KERNEL.

> For kernel-generated signals we can guarantee this, so I
> think that requiring userspace to check explicitly for SI_KERNEL is too
> unrealistic (i.e., 90% of the time, people will forget ... and 99% of
> the time they will get away with it).

It's unfortunate that the conditions for accessing these fields are so
complex, but again this seems like part of the hand that we've been
dealt with this API. Fortunately the requirement to check for
SI_KERNEL should only really apply in practice to code accessing our
new fields. We can make it retroactively apply to existing fields, but
since that wouldn't be a change to the kernel code, just the
documentation, existing code will continue to operate in the same way
as it did before.
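
For what it's worth, the check that I have in mind for userspace could
look roughly like the sketch below. The helper names
(is_fault_signal(), sigfault_fields_valid()) are hypothetical and not
part of the patch series. SI_FROMKERNEL() is defined locally in case
the libc headers do not provide it; the fallback definition is my
assumption of what the UAPI header has. The list of signals is the one
from siginfo.h quoted earlier in the thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Fallback if libc does not expose the UAPI macro (assumed definition). */
#ifndef SI_FROMKERNEL
#define SI_FROMKERNEL(siptr) ((siptr)->si_code > 0)
#endif

/* Signals whose siginfo uses the _sigfault union member. */
int is_fault_signal(int sig)
{
    switch (sig) {
    case SIGILL:
    case SIGFPE:
    case SIGSEGV:
    case SIGBUS:
    case SIGTRAP:
#ifdef SIGEMT
    case SIGEMT:
#endif
        return 1;
    default:
        return 0;
    }
}

/*
 * Returns 1 if the _sigfault fields (si_addr and anything after it) can
 * be trusted: a fault signal, generated by the kernel, and not via the
 * force_sig() path that reports SI_KERNEL with the _kill layout.
 */
int sigfault_fields_valid(const siginfo_t *si)
{
    assert(si != NULL);
    return is_fault_signal(si->si_signo) &&
           SI_FROMKERNEL(si) &&
           si->si_code != SI_KERNEL;
}
```

A handler would call sigfault_fields_valid(info) before reading
si_addr or any of the new fields, and treat the address as unavailable
otherwise.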

> I'm not sure that SI_FROMKERNEL() is standard btw, but I observed that
> it is present in the Linux UAPI header.
>
> The "si_code < 0 means userspace" convention seems well established, but
> I haven't found anywhere that this is clearly specified.
>
> >
> > > It might be helpful to have a helper to identify fault signals, but we
> > > don't have this today, and it's unlikely that a new kind of fault signal
> > > will crop up any time soon.
> > >
> > > Handlers that handle specific signal types won't care, but debuggers and
> > > generic backtracer code would have to be hand-hacked to add new kinds of
> > > fault signal today -- not a huge priority though, and orthogonal to this
> > > series.
> > >
> > > > As for the first one, it's more tricky. Arguably something like a
> > > > debugger should be able to send arbitrary signals to a debuggee, and
> > > > there's no reason why it shouldn't be able to set si_xflags in
> > > > siginfo, but on the other hand who knows how existing debuggers end up
> > > > setting this field today. Maybe all that we can do is have the kernel
> > > > clear si_xflags if it detects that the signal uses _sigfault, and let
> > > > si_xflags aware debuggers opt out of this behavior, perhaps by
> > > > introducing a PTRACE_SETSIGINFO2 or something.
> > >
> > > Most likely a debugger amends an existing siginfo from a trapped
> > > signal rather than generating a new one from scratch.
> >
> > Right, but it could have copied the fields by hand from a
> > kernel-supplied siginfo before amending it.
>
> It could have, but we can't cover every eventuality.  With a
> PTRACE_SETSIGINFO2, we at least get a promise from userspace that it
> processed the fields correctly.  (If userspace is wrong, or lying,
> that's its own funeral.  The ptracer is as good as root from the point
> of view of the traced process anyway.)

Agreed.

Peter