On 2021/2/25 at 5:40 AM, Jim Wilson wrote:
On Wed, Feb 24, 2021 at 6:18 AM Jiaxun Yang <jiaxun.yang@xxxxxxxxxxx> wrote:
I found it's very difficult for GCC to generate this kind of pcrel_lo
expression;
an RTX label_ref can't be lowered into such a LO_SUM expression.
Yes, it is difficult. You need to generate a label, and put the label
number in an unspec in the auipc pattern, and then create a label_ref to
put in the addi. The fact that we have an unspec and a label_ref means
a number of optimizations get disabled, like basic block duplication and
loop unrolling, because they can't make a copy of an instruction that
uses a label as data, as they have no way to know how to duplicate the
label itself. Or at least RISC-V needs to create one label. You
probably need to create two labels.
There is a far easier way to do this, which is to just emit an assembler
macro, and let the assembler generate the labels and relocs. This is
what the RISC-V GCC port does by default. This prevents some
optimizations like scheduling the two instructions, but enables some
other optimizations like loop unrolling. So it is a tossup. Sometimes
we get better code with the assembler macro, and sometimes we get better
code by emitting the auipc and addi separately.
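For readers less familiar with the auipc/addi pair, here is a minimal
Python sketch of the arithmetic behind the %pcrel_hi/%pcrel_lo split that
the assembler macro hides. The function names and the sample addresses are
made up for illustration; this is not code from GCC or binutils, and it
ignores the 32-bit range limit on the high part.

```python
def split_pcrel(sym, pc):
    """Split the pc-relative offset sym - pc into (hi20, lo12) for auipc + addi."""
    offset = sym - pc
    # Round to the nearest 4 KiB so the leftover fits a signed 12-bit immediate.
    hi20 = (offset + 0x800) >> 12
    lo12 = offset - (hi20 << 12)
    return hi20, lo12


def auipc_addi(pc, hi20, lo12):
    """Model `auipc rd, hi20` followed by `addi rd, rd, lo12`."""
    return pc + (hi20 << 12) + lo12


pc, sym = 0x10074, 0x12FF8  # hypothetical code and data addresses
hi20, lo12 = split_pcrel(sym, pc)
assert -2048 <= lo12 <= 2047
assert auipc_addi(pc, hi20, lo12) == sym
```

Note that lo12 depends on the address of the auipc itself, which is why the
addi has to refer back to a label on the auipc rather than to the symbol.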
The RISC-V gcc port can emit the auipc/addi with
-mexplicit-relocs -mcmodel=medany, but this is known to sometimes
fail. The problem is that if you have an 8-byte variable with 8-byte
alignment, and try to load it with 2 4-byte loads, gcc knows that
offset+4 must be safe from overflow because the data is 8-byte aligned.
However, when you use a pc-relative offset, that is, data address minus
code address, the offset is only as aligned as the code is. RISC-V has
2-byte instruction alignment with the C extension. So if you have
offset+4 and offset is only 2-byte aligned, it is possible that offset+4
may overflow the add immediate field. The same thing can happen with
16-byte data that is 16-byte aligned, accessed with two 8-byte loads.
There is no easy software solution. We just emit a linker error in that
case as we can't do anything else. I think this would work better if
auipc cleared some low bits of the result, in which case the pc-relative
offset would have enough alignment to prevent overflow when adding small
offsets, but it is far too late to change how the RISC-V auipc works.
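The overflow described above can be reproduced with a few lines of
arithmetic. This is only a simplified model, assuming the low part is the
signed low 12 bits of (data address - code address); the addresses and
helper names are hypothetical.

```python
def lo12(offset):
    """Signed low 12 bits of a pc-relative offset (the addi/load immediate)."""
    return ((offset + 0x800) & 0xFFF) - 0x800


def second_word_fits(sym, pc):
    """True if lo12 + 4 still fits the signed 12-bit immediate field."""
    return lo12(sym - pc) + 4 <= 2047


sym = 0x20000  # hypothetical 8-byte variable with 8-byte alignment

# With 8-byte aligned code the offset is 8-byte aligned, so lo12 <= 2040.
aligned_ok = all(second_word_fits(sym, pc) for pc in range(0x10000, 0x12000, 8))

# With 2-byte code alignment (C extension) lo12 can reach 2044 or 2046.
bad_pcs = [pc for pc in range(0x10000, 0x12000, 2) if not second_word_fits(sym, pc)]
```

In this model only a handful of code addresses per 4 KiB trigger the
problem, which matches the "known to sometimes fail" behaviour: most
placements of the auipc are fine, a few are not.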
Hi all,
After spending days poking at AUIPC, I found that MIPS R6 does in fact
have an ALUIPC instruction, which clears the low 16 bits of the AUIPC
result.
So the whole thing now looks easier: we can add R_MIPS_PC_PAGE and
R_MIPS_PC_OFST relocations and avoid all the mess we ran into with RISC-V.
Loading a pc-relative address could be as simple as:
    aluipc a0, %pcrel_page(sym)
    addiu a0, %pcrel_ofst(sym)
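To check that this really avoids the alignment problem, here is a rough
Python sketch. The aluipc() model follows the R6 definition (add the
immediate shifted left by 16 to the PC, then clear the low 16 bits). How
%pcrel_page and %pcrel_ofst would be computed is only my assumption, since
these operators and relocations do not exist yet; the addresses are made
up.

```python
def aluipc(pc, imm16):
    """Model MIPS R6 ALUIPC: (pc + (imm16 << 16)) with the low 16 bits cleared."""
    return (pc + (imm16 << 16)) & ~0xFFFF


def pcrel_page_ofst(sym, pc):
    """Hypothetical split for the proposed %pcrel_page / %pcrel_ofst operators."""
    base = (sym + 0x8000) & ~0xFFFF       # 64 KiB aligned base nearest to sym
    page = (base - (pc & ~0xFFFF)) >> 16  # immediate for aluipc
    ofst = sym - base                     # signed 16-bit immediate for addiu
    return page, ofst


sym = 0x12347FF8  # hypothetical 8-byte aligned symbol
for pc in (0x00400000, 0x00400004, 0x0040FFFC):  # only the first is 8-byte aligned
    page, ofst = pcrel_page_ofst(sym, pc)
    assert aluipc(pc, page) + ofst == sym
    # ofst keeps the alignment of sym, so ofst + 4 cannot overflow.
    assert ofst % 8 == 0 and ofst + 4 <= 32767
```

Because the ALUIPC result is 64 KiB aligned, the offset part depends only on
the symbol address and not on where the code sits.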
Thanks.
- Jiaxun