Re: problem with signedness of PSEUDO_VALs

Luc Van Oostenryck <luc.vanoostenryck@xxxxxxxxx> · Sun, 22 Jul 2018 23:55:10 +0200

On Sun, Jul 22, 2018 at 09:55:03PM +0100, Ramsay Jones wrote:
> 
> 
> On 22/07/18 21:46, Ramsay Jones wrote:
> > 
> > 
> > On 22/07/18 21:31, Ramsay Jones wrote:
> >>
> >>
> >> On 22/07/18 21:26, Ramsay Jones wrote:
> >> [snip]
> >>>> +* shift instructions:
> >>>> +	the type of the result must be the same as the type
> >>>> +	of the left operand but the type of the right operand
> >>>> +	is independent.
> >>>
> >>> But for constant shifts, the shift direction can be flipped
> >>> and the shift amount made non-negative, right?

(I'm not sure I understand what you mean.)
A right shift with a negative count doesn't become a left shift
with a positive count and vice versa.

> >> Hmm, except that is not allowed by the C standard? (need to
> >> check).
> > 
> > Yep, C11, 6.5.7-3 says:
> > 
> > 3 The integer promotions are performed on each of the operands. The
> >   type of the result is that of the promoted left operand. If the
> >   value of the right operand is negative or is greater than or equal
> >   to the width of the promoted left operand, the behavior is undefined.

Yes, that's the whole problem with C undefined behaviours.
What I try to do for UB (in continuation, I think, of sparse's initial goal)
is to have:
* sensible warnings
* sensible behaviour/simplifications
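
For shifts, the rule quoted above boils down to a range check on the
count, which is what such a warning would have to test. A minimal
sketch in plain C (the function name is hypothetical, it is not an
existing sparse helper, and the count is assumed to fit in a
'long long'):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when C11 6.5.7p3 leaves the
 * shift defined as far as the count is concerned, i.e. the count
 * is neither negative nor >= the width of the promoted left operand.
 * 'width' is that width in bits (32 for 'unsigned' on common targets).
 */
bool shift_count_is_defined(long long count, unsigned width)
{
	assert(width > 0);
	return count >= 0 && count < (long long)width;
}
```

Note that the check needs the count as a properly signed value, which
is exactly the information discussed further down.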

> > ... and a few experiments with gcc seems to indicate that negative
> > shifts (left or right) return zero.

That's not what I'm seeing.
For example, the following code:
	unsigned fn1(unsigned x) { return x >> -1; }
gives (with GCC 7.3 for ARM64):
	fn1:
		mov     w1, -1
		lsr     w0, w0, w1
		ret
or (with GCC 7.3 for x86-64):
	fn1:
		movl    %edi, %eax
		movl    $-1, %ecx
		shrl    %cl, %eax
		ret
which is fine: the compiler doesn't try to 'optimize' the shift
and leaves the CPU to determine the result at run-time.
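
For what it's worth, both instruction sequences end up using only the
low bits of the count register: for a 32-bit operand, x86 SHR masks
the count to 5 bits and the ARM64 register form of LSR takes it
modulo 32. A rough C model of that run-time behaviour (the function
names are made up for illustration; this models the hardware, not
anything the C standard guarantees):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a 32-bit logical shift right as executed by
 * x86 (shrl %cl) or ARM64 (lsr w0, w0, w1): only the low 5 bits of
 * the count are used. This is well-defined C because the masked
 * count is always in the range 0..31.
 */
uint32_t lsr32_hw_model(uint32_t x, int32_t count)
{
	return x >> ((uint32_t)count & 31);
}

/* Usage example: a count of -1 behaves like a shift by 31. */
void lsr32_hw_model_example(void)
{
	assert(lsr32_hw_model(0x80000000u, -1) == 1);
	assert(lsr32_hw_model(0x7fffffffu, -1) == 0);
}
```

So under this model the result is neither always zero nor a left shift
by 1.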
Likewise, I want sparse/test-linearize to return:
	fn1:
		lsr.32	%r1, %arg1, $-1
		ret.32	%r1
which is what is returned if the code is written (with -m64) as:
	unsigned fn1l(unsigned x) { return x >> -1L; }

The whole problem is that, in the IR, all constant integer values
are stored in PSEUDO_VALs in a host type wide enough to hold
all values but without the associated target type.
This is fine for most uses because the associated type
can easily be fetched from the context (usually, insn->type).
But for shift counts, there is no associated type in the IR:
it can be any integer type not smaller than 'int' (before
linearization, there is always an associated type).
So, after linearization (with -m64), both functions
	unsigned fn(unsigned x) { return x >> -2; }
	unsigned fl(unsigned x) { return x >> 0x00000000fffffffeUL; }
have a shift count stored as 0x00000000fffffffe, with no
way to know that the first one should be interpreted
as a signed 32-bit number and the second one as an unsigned
64-bit number.
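
To make the ambiguity concrete, here is a minimal sketch in plain C.
It is not sparse code: the function names are hypothetical and
uint64_t merely stands in for the host type holding the value. The
same stored bits give two different counts depending on the type one
assumes:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: read the low 'bits' bits of a stored host
 * value as a signed integer of that width (sign extension).
 * Assumes the usual two's complement conversion to int64_t.
 */
int64_t as_signed(uint64_t stored, unsigned bits)
{
	assert(bits >= 1 && bits <= 64);
	if (bits == 64)
		return (int64_t)stored;
	uint64_t mask = (UINT64_C(1) << bits) - 1;
	uint64_t sign = UINT64_C(1) << (bits - 1);
	uint64_t low = stored & mask;
	/* (low ^ sign) - sign propagates the sign bit upward */
	return (int64_t)((low ^ sign) - sign);
}

/* Hypothetical helper: read the low 'bits' bits as unsigned. */
uint64_t as_unsigned(uint64_t stored, unsigned bits)
{
	assert(bits >= 1 && bits <= 64);
	return bits == 64 ? stored : stored & ((UINT64_C(1) << bits) - 1);
}
```

For the stored value 0x00000000fffffffe, as_signed(v, 32) is -2 while
as_unsigned(v, 64) is 4294967294; nothing in the value itself says
which reading is the right one.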

In fact, the real problem occurs not for such functions
with a constant shift count but only for functions where the shift
count becomes constant during the simplification phase, like:
	unsigned fn(unsigned x) { return x >> ((x - x) - 2); }
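
As an illustration (ordinary C with made-up function names, assuming
nothing about how sparse's simplifier is implemented), the count in
this last function does not depend on x at all. Once x - x is known
to be 0, what is left is a constant with the bit pattern discussed
above:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the shift count in fn(): with unsigned
 * 32-bit arithmetic, (x - x) - 2 wraps around to 0xfffffffe for
 * every x.
 */
uint32_t folded_count(uint32_t x)
{
	return (x - x) - 2;
}

/* Usage example: the result is the same whatever the input. */
void folded_count_example(void)
{
	assert(folded_count(0) == UINT32_C(0xfffffffe));
	assert(folded_count(12345) == UINT32_C(0xfffffffe));
}
```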

-- Luc