Re: problem with signedness of PSEUDO_VALs

Luc Van Oostenryck <luc.vanoostenryck@xxxxxxxxx> · Mon, 23 Jul 2018 01:39:14 +0200

On Sun, Jul 22, 2018 at 11:17:40PM +0100, Ramsay Jones wrote:
> 
> 
> On 22/07/18 22:55, Luc Van Oostenryck wrote:
> > On Sun, Jul 22, 2018 at 09:55:03PM +0100, Ramsay Jones wrote:
> [snip]
> 
> >>> Yep, C11, 6.5.7-3 says:
> >>>
> >>> 3 The integer promotions are performed on each of the operands. The
> >>>   type of the result is that of the promoted left operand. If the
> >>>   value of the right operand is negative or is greater than or equal
> >>>   to the width of the promoted left operand, the behavior is undefined.
> > 
> > Yes, this is the whole problem with C undefined behaviours.
> > What I try to do for UB (in continuation, I think, of sparse's initial goal)
> > is to have:
> > * sensible warnings
> > * sensible behaviour/simplifications
> 
> Indeed.
> 
> >>> ... and a few experiments with gcc seem to indicate that negative
> >>> shifts (left or right) return zero.
> > 
> > That's not what I'm seeing.
> 
> Heh, yes, after reading some of the earlier patches in the
> following series, I changed my test program (I had only used
> -1 as a negative shift) and confirmed the 'modulo' behaviour
> on an x86-64.

Well, it seems that what GCC does for negative or over-sized shift
counts is to not touch them at all and thus, when generating the
target code, not to use the shift-immediate instructions but to put
the shift count in a register and use the register forms of the shift.
This moves the C undefined behaviour to the CPU behaviour for such
shifts (which seems very reasonable).

> [snip] 
> > The whole problem is that, in the IR, all constant integer values
> > are stored in PSEUDO_VALs in a host type wide enough to hold
> > all values, but without the associated target type.
> > This is fine for most uses because the associated type
> > can easily be fetched from the context (usually, insn->type).
> > But for shift counts, there is no associated type in the IR:
> > it can be any integer type not smaller than 'int' (before
> > linearization, there is always an associated type).
> > So, after linearization (with -m64), both functions
> > 	unsigned fn(unsigned x) { return x >> -2; }
> > 	unsigned fl(unsigned x) { return x >> 0x00000000fffffffeUL; }
> > have a shift count stored as 0x00000000fffffffe, without
> > any possibility to know that the first one should be interpreted
> > as a signed 32-bit number and the second one as an unsigned
> > 64-bit number.
> 
> Indeed, I didn't get that at first, but it is clear now.
> 
> > In fact, the real problem occurs not for such functions
> > with a constant shift count but for functions where the shift
> > count becomes constant during the simplification phase, like:
> > 	unsigned fn(unsigned x) { return x >> ((x - x) - 2); }
> 
> Yep, good point - I get it now. ;-)

I had little time today to think about it, but I think there are only
two possible solutions:
1) add the needed typing at the instruction level
2) at some point (probably during linearization itself), decide
   that for now shift counts will be interpreted as some fixed
   type (for example signed or unsigned int, because larger types
   are rarely used for shift counts).

-- Luc