Re: [RFC PATCH v2 3/3] arm64: signal: Ensure si_code is valid for all fault signals

James Morse <james.morse@xxxxxxx> · Tue, 13 Feb 2018 13:58:55 +0000

Hi Dave,

On 30/01/18 18:50, Dave Martin wrote:
> Currently, as reported by Eric, an invalid si_code value 0 is
> passed in many signals delivered to userspace in response to faults
> and other kernel errors.  Typically 0 is passed when the fault is
> insufficiently diagnosable or when there does not appear to be any
> sensible alternative value to choose.
> 
> This appears to violate POSIX, and is intuitively wrong for at
> least two reasons arising from the fact that 0 == SI_USER:
> 
>  1) si_code is a union selector, and SI_USER (and si_code <= 0 in
>     general) implies the existence of a different set of fields
>     (siginfo._kill) from that which exists for a fault signal
>     (siginfo._sigfault).  However, the code raising the signal
>     typically writes only the _sigfault fields, and the _kill
>     fields make no sense in this case.
> 
>     Thus when userspace sees si_code == 0 (SI_USER) it may
>     legitimately inspect fields in the inactive union member _kill
>     and obtain garbage as a result.
> 
>     There appears to be software in the wild relying on this,
>     albeit generally only for printing diagnostic messages.
> 
>  2) Software that wants to be robust against spurious signals may
>     discard signals where si_code == SI_USER (or <= 0), or may
>     filter such signals based on the si_uid and si_pid fields of
>     siginfo._sigkill.  In the case of fault signals, this means
>     that important (and usually fatal) error conditions may be
>     silently ignored.
> 
> In practice, many of the faults for which arm64 passes si_code == 0
> are undiagnosable conditions such as exceptions with syndrome
> values in ESR_ELx to which the architecture does not yet assign any
> meaning, or conditions indicative of a bug or error in the kernel
> or system and thus that are unrecoverable and should never occur in
> normal operation.
> 
> The approach taken in this patch is to translate all such
> undiagnosable or "impossible" synchronous fault conditions to
> SIGKILL, since these are at least probably localisable to a single
> process.  Some of these conditions should really result in a kernel
> panic, but due to the lack of diagnostic information it is
> difficult to be certain: this patch does not add any calls to
> panic(), but this could change later if justified.
> 
> Although si_code will not reach userspace in the case of SIGKILL,
> it is still desirable to pass a nonzero value so that the common
> siginfo handling code can detect incorrect use of si_code == 0
> without false positives.  In this case the si_code dependent
> siginfo fields will not be correctly initialised, but since they
> are not passed to userspace I deem this not to matter.
> 
> A few faults can reasonably occur in realistic userspace scenarios,
> and _should_ raise a regular, handleable (but perhaps not
> ignorable/blockable) signal: for these, this patch attempts to
> choose a suitable standard si_code value for the raised signal in
> each case instead of 0.

> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 9b7f89d..4baa922 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -607,70 +607,70 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
[..]
> +	{ do_sea,		SIGKILL, SI_KERNEL,	"level 0 (translation table walk)"	},
> +	{ do_sea,		SIGKILL, SI_KERNEL,	"level 1 (translation table walk)"	},
> +	{ do_sea,		SIGKILL, SI_KERNEL,	"level 2 (translation table walk)"	},
> +	{ do_sea,		SIGKILL, SI_KERNEL,	"level 3 (translation table walk)"	},
> +	{ do_sea,		SIGBUS,  BUS_OBJERR,	"synchronous parity or ECC error" },	// Reserved when RAS is implemented

I agree the translation-table related external-aborts should end up with
SIGKILL: there is nothing user-space can do.

You use the fault_info table to vary the signal and si_code that should be used,
but do_mem_abort() only uses these if the handler (inf->fn) returns an error.
For do_sea(), SIGBUS will be generated regardless of the values in this table,
because do_sea() raises the signal itself and always returns 0.
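
To illustrate, here is a minimal userspace model of that dispatch. It is a
sketch, not the kernel code: the model_* names, the two-argument handler
signature and the last_sig/last_code recorder are all made up for the example,
and the #ifndef fallbacks assume the Linux values for the si_code constants.

```c
#include <signal.h>

/* Fallbacks (Linux values) in case the libc headers hide these constants. */
#ifndef SI_KERNEL
#define SI_KERNEL 0x80
#endif
#ifndef BUS_OBJERR
#define BUS_OBJERR 3
#endif

/* Hypothetical, simplified stand-in for the arm64 fault_info table entry. */
struct model_fault_info {
	int (*fn)(unsigned long addr, unsigned int esr);
	int sig;
	int code;
	const char *name;
};

/* Records the last signal "sent" so the behaviour can be inspected. */
static int last_sig, last_code;

static void model_send_sig(int sig, int code)
{
	last_sig = sig;
	last_code = code;
}

/* Models do_sea() as patched: raises SIGBUS itself, then returns 0. */
static int model_do_sea(unsigned long addr, unsigned int esr)
{
	(void)addr;
	(void)esr;
	model_send_sig(SIGBUS, BUS_OBJERR);
	return 0;
}

/* Models a handler that cannot deal with the fault and returns an error. */
static int model_do_bad(unsigned long addr, unsigned int esr)
{
	(void)addr;
	(void)esr;
	return 1;
}

/* Models do_mem_abort(): the table's sig/code are only a fallback. */
static void model_do_mem_abort(const struct model_fault_info *inf,
			       unsigned long addr, unsigned int esr)
{
	if (!inf->fn(addr, esr))
		return;	/* handler succeeded: inf->sig and inf->code unused */

	model_send_sig(inf->sig, inf->code);
}
```

With an entry of { model_do_sea, SIGKILL, SI_KERNEL, ... }, the model still ends
up with SIGBUS/BUS_OBJERR; only an entry whose handler returns non-zero (such
as model_do_bad) gets the signal and si_code from the table.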

> @@ -596,7 +596,7 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>
>  	info.si_signo = SIGBUS;
>  	info.si_errno = 0;
> -	info.si_code  = 0;
> +	info.si_code  = BUS_OBJERR;
>  	if (esr & ESR_ELx_FnV)
>  		info.si_addr = NULL;
>  	else

do_sea() has the right fault_info entry to hand, so I think these need to change
to inf->sig and inf->code. (I assume it's not valid to set si_addr for SIGKILL...)
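
Roughly what I have in mind, as a userspace sketch rather than a tested kernel
change. The sketch_* types and SKETCH_ESR_FNV are hypothetical stand-ins for
siginfo, fault_info and ESR_ELx_FnV, and leaving si_addr as NULL for SIGKILL is
only my assumption from above:

```c
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/* Fallbacks (Linux values) in case the libc headers hide these constants. */
#ifndef SI_KERNEL
#define SI_KERNEL 0x80
#endif
#ifndef BUS_OBJERR
#define BUS_OBJERR 3
#endif

/* Hypothetical stand-in for ESR_ELx_FnV ("fault address not valid"). */
#define SKETCH_ESR_FNV (1u << 10)

/* Hypothetical, reduced versions of the kernel structures. */
struct sketch_fault_info {
	int sig;
	int code;
	const char *name;
};

struct sketch_siginfo {
	int si_signo;
	int si_errno;
	int si_code;
	void *si_addr;
};

/* Fill siginfo from the table entry instead of hard-coding SIGBUS. */
static void sketch_fill_sea_siginfo(const struct sketch_fault_info *inf,
				    unsigned int esr, unsigned long addr,
				    struct sketch_siginfo *info)
{
	info->si_signo = inf->sig;
	info->si_errno = 0;
	info->si_code  = inf->code;

	/* Assumption: si_addr is only meaningful for a real fault signal. */
	if (inf->sig == SIGKILL || (esr & SKETCH_ESR_FNV))
		info->si_addr = NULL;
	else
		info->si_addr = (void *)(uintptr_t)addr;
}
```

With this shape the translation table walk entries would produce
SIGKILL/SI_KERNEL and the parity/ECC entry SIGBUS/BUS_OBJERR, following the
table in the hunk above.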

Thanks,

James