Re: [PATCH v2] MIPS: Reduce _NSIG from 128 to 127 to avoid BUG_ON

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jun 28, 2013 at 11:03:33PM +0100, James Hogan wrote:
> On 28 June 2013 20:28, Denys Vlasenko <vda.linux@xxxxxxxxxxxxxx> wrote:
> > On Monday 17 June 2013 12:36, James Hogan wrote:
> >> On 14/06/13 17:03, James Hogan wrote:
> >> > MIPS has 128 signals, the highest of which has the number 128 (they
> >> > start from 1). The following command causes get_signal_to_deliver() to
> >> > pass this signal number straight through to do_group_exit() as the exit
> >> > code:
> >> >
> >> >   strace sleep 10 & sleep 1 && kill -128 `pidof sleep`
> >> >
> >> > However do_group_exit() checks for the core dump bit (0x80) in the exit
> >> > code which matches in this particular case and the kernel panics:
> >> >
> >> >   BUG_ON(exit_code & 0x80); /* core dumps don't get here */
> >> >
> >> > Lets avoid this by changing the ABI by reducing the number of signals to
> >> > 127 (so that the maximum signal number is 127). Glibc incorrectly sets
> >> > [__]SIGRTMAX to 127 already. uClibc sets it to 128 so it's conceivable
> >> > that programs built against uClibc which intentionally uses RT signals
> >> > from the top (SIGRTMAX-n, n>=0) would need an updated uClibc (and a
> >> > rebuild if it's crazy enough to use __SIGRTMAX).
> >>
> >> Hmm, although this works around the BUG_ON, this doesn't actually seem
> >> to be sufficient to behave correctly.
> >>
> >> So it appears the exit status is constructed like this:
> >> bits  purpose
> >> 0x007f        signal number (0-127)
> >> 0x0080        core dump
> >> 0xff00        exit status
> >>
> >> but the macros in waitstatus.h and wait.h in libc
> >> (see also "man 2 wait"):
> >> WIFEXITED:   status & 0x7f == 0
> >> WIFSIGNALED: status & 0x7f in [1..126] (i.e. not 0 or 127)
> >> WIFSTOPPED:  status & 0xff == 127
> >>
> >> So termination due to SIG127 looks like it's been stopped instead of
> >> terminated via a signal, unless a core dump occurs in which case none of
> >> the above match.
> >>
> >> (And termination due to SIG128 hits BUG_ON, otherwise would appear to
> >> have exited normally with core dump).
> >>
> >>
> >> Reducing number of signals to 126 to avoid this will change the glibc
> >> ABI too, in which case we may as well reduce to 64 to match other
> >> arches, which is more likely to break something (I'm not really
> >> comfortable making that change).
> >>
> >> Reducing to 127 (this patch) still leaves incorrect exit status codes
> >> for SIG127 ...
> >>
> >> Any further thoughts/opinions?
> >
> > Strictly speaking, exit status of 0x007f isn't ambiguous.
> >
> > Currently userspace uses the following rules
> > (assuming that status is 16-bit (IOW, dropping PTRACE_EVENT bits)):
> >
> > WIFEXITED(status)    = (status & 0x7f) == 0
> > WIFSIGNALED(status)  = (status & 0x7f) != 0 && (status & 0x7f) < 0x7f
> > WIFSTOPPED(status)   = (status & 0xff) == 0x7f
> > WIFCONTINUED(status) = (status == 0xffff)
> >
> > WEXITSTATUS(status)  = status >> 8
> > WSTOPSIG(status)     = status >> 8
> > WCOREDUMP(status)    = status & 0x80
> > WTERMSIG(status)     = status & 0x7f
> >
> > When process dies from signal 127, status is 0x007f and it is not a valid
> > "stopped by signal" indicator, since WSTOPSIG == 0 is an impossibility.
> >
> > Status 0x007f get misinterpreted by the rules above, namely,
> > WIFSTOPPED is true, WIFSIGNALED is false.
> >
> > But an alternative definition exists which works correctly with
> > all previous status codes, treats 0x007f as "killed by signal 127"
> > and isn't more convoluted.
> > In fact, while WIFSTOPPED needs one additional check,
> > WIFSIGNALED gets simpler (loses one AND'ing operation):
> >
> > WIFSTOPPED(status)   = (status & 0xff) == 0x7f && (status >> 8) != 0
> > WIFSIGNALED(status)  = status != 0 && status <= 0xff
> >
> > All other rules need no change.
> >
> > I think it's feasible to ask {g,uc}libc to change their defines
> > (on MIPS as a minimum), and live with 127 signals.
> 
> Thanks for the explanation. This makes a lot of sense and if I
> understand correctly it already describes the current behaviour of the
> kernel up to SIG127 (I hadn't twigged WIFSTOPPED should imply
> WSTOPSIG!=0 for some reason). I like it.

One other note on this issue: SIG128 also aliases CLONE_VM, and it
would be very bad if a program requesting SIG128 as its exit signal
when calling clone instead ended up with the effects of CLONE_VM...

Also, I have some improved macros for WIFSTOPPED and WIFSIGNALED which
avoid multiple evaluation of their arguments:

#define WIFSTOPPED(s) ((short)((((s)&0xffff)*0x10001)>>8) > 0x7f00)
#define WIFSIGNALED(s) (((s)&0xffff)-1 < 0xffu)

These are what we are using in musl libc now.

Rich


[Index of Archives]     [Linux MIPS Home]     [LKML Archive]     [Linux ARM Kernel]     [Linux ARM]     [Linux]     [Git]     [Yosemite News]     [Linux SCSI]     [Linux Hams]

  Powered by Linux