Re: [RFC] rationale for systematic elimination of OP_SYMADDR instructions

Luc Van Oostenryck <luc.vanoostenryck@xxxxxxxxx> · Fri, 11 Aug 2017 14:25:51 +0200

On Thu, Aug 10, 2017 at 09:17:44PM -0400, Christopher Li wrote:
> First of all, I want to clarify that I ask this question is
> to satisfy my curiously why it is done this way,
> either it is the best optimal way to do it.
> 
> It is not reason to reject your patch. I trust you have good
> reason to do it, I will accept the patch even if I don't fully
> understand it. I still want to understand it after applying
> it.

OK, I'll try to explain it the best I can.

> Do CSE do you mean just CSE OP_SYMADDR or CSE the complicated
> instruction expand from OP_SYMADDR.

Just the OP_SYMADDR, there is no expansion for them at this level.

> Let say you have code like this, the symbol is assign in different
> scope block.
> 
> Before CSE there is 3 OP_SYMADDR "a".  After CSE where is those 3
> OP_SYMADDR "a" is likely to be combine into one? Where would the
> CSE place the OP_SYMADDR if it is not one of the beginning or dominator
> etc?

Exactly like any other expression that is CSEed.

Yes, I elude your question, on purpose, because the question
about *where* those instructions need to be is unrelated to the
question "must these instructions be present or not".

> In that case, you might just place it before load or store. You don't seem
> to need CSE to do it. Sorry I feel that I am still missing some big picture.
> I am desperately trying to fill it.

Thing is that the current linearizer already *create* them, all of them,
just before the load or store that need them.
There are also a few other situations where they are needed: when doing
address calculation for symbols. For example:
	extern int a[];
	int *ptr;
	...
	ptr = &a[i];

But in all cases, the OP_SYMADDRs are correctly *created*.

The problme is that (in most cases) these OP_SYMADDR are *directly* 
optimized away or more exactly back-folded with their symbol.
For example:
	extern int a;
	int v;
	...
	v = a;

The linearized code is something like:
	symaddr	%r1, a
	load	%r2, %r1[0]
And this is *exactly* what is needed. Among others things
the load take as operand an *address* and not a symbol
which is something that belong more to the source language.

But after the very first pass of simplify_instruction() the
symaddr is 'simplified away' and we get:
	load	%r2, a[0]

It's just this simplification that should be removed.

> I also think that you make the OP_SYMADDR a lower level IR than
> what I original have in mind writing linearizer.

I didn't create them :)

> I expect there is a separate
> phase where you translate the sparse IR into machine level code, that
> is where you can insert the equivalent of OP_SYMADDR.

Of course.
But again, look at the title of the thread: the question has never been
about the creation/insertion of these OP_SYMADDRs but about their
non-elimination.

> OK. For the sparse checker question.

I won't answer now to this question because I think, if it is still
pertinant, it needs to be reformulated taking in account the fact
that *all* needed OP_SYMADDR are already created during linearization.

-- Luc
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html