Re: [RFC] rationale for systematic elimination of OP_SYMADDR instructions

Christopher Li <sparse@xxxxxxxxxxx> · Wed, 26 Apr 2017 07:33:46 -0400

On Tue, Apr 25, 2017 at 10:49 PM, Luc Van Oostenryck
<luc.vanoostenryck@xxxxxxxxx> wrote:
> Roughly, once you begin to play with code generation, something like
> OP_SYMADDR is an operation that you will really do.
> Depending on the relocation, it can even be something relatively costly:
> I'm thinking static code on a 64bit machine where you can only generate
> 16bit constants, others cases may be not at all cheaper.
> So it's something that soon or later will need to be exposed and
> doing CSE on the address is good.

I see your point. Let me rephrase your claim. Sparse assume getting
the symbol address
is trivial. But on some architecture that is non trivial to generate
symbol address.
Typical case would be, the machine instruction size is smaller than
the address size.
It will take a few machine instruction to generate a symbol address.

I agree with this part.

Now the second part of the claim is that, it would be better to left
the OP_SYMADDR
untouched and let CSE to remove it. That part I disagree.

The reason is that, currently CSE operate on the same basic block. It only
eliminate instruction but it does not relocate instructions.

A very common case is that, the symbol address was referenced in different
basic blocks.

extern int a, d;

if (...)
     a = d;
else if (...)
     a = d + 2;

CSE would not be able to simply remove the OP_SYMADDR for "a",
because they are not in the same basic block. The best result should be,
for all the usage of that symbol in a function, find the closest
common parent basic
block and put the OP_SYMADDR there.

Because getting symbol address is used very often. Let CSE go through a hash
to re-discover that all symbol address can be combined is costly and
not necessary.
And CSE does not cover all the case commit 962279e8 covers.

> If we would have kept the OP_SYMADRR and doing CSE on it.
> But for now we have:
>         foo:
>         .L0:
>                 <entry-point>
>                 load.32     %r2 <- 0[a]
>                 add.32      %r3 <- %r2, $1
>                 store.32    %r3 -> 0[a]
>                 ret
>
> whose translation would be:
>                 movw    r3, #:lower16:a
>                 movt    r3, #:upper16:a
>                 ldr     r2, [r3]
>                 add     r2, r2, #1
> !               movw    r3, #:lower16:a
> !               movt    r3, #:upper16:a

That is because you assume every time "a" was referenced
you need to redo the generate of 32 bit address "a". That is
not necessary true. In sparse IR, "a" already translate into
pseudo, which has the user chain.  It is relative simple for
the back end to find the best place to insert OP_SYMADDR.
The end of the day, it is the back end knows the machine
register allocation and reorder of the instruction if necessary.

So the proper  solution shouldn't be let CSE to discover OP_SYMADDR
yet another time. It should be find the best place to insert OP_SYMADDR.

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html