gcc not generating dwarf prologue_end and epilogue_begin .loc directives (was: Re: RFC: AVR interrupt handling issue)

Ian Molton <gcc_help-ian@xxxxxxxxxxxxxx> · Mon, 20 Dec 2021 21:06:24 +0000

Hi all, Henri,

As mentioned before, I'm trying to keep changes to the compiler to a
minimum; however, I think that for AVR the information simply isn't
available after assembling. The question becomes "how do I get the
compiler to emit the information I need?"

I've been reading about DWARF (gcc's dwarf2 output), whose line tables
can mark prologue ends and epilogue beginnings (flags added in DWARF 3,
set via options on the .loc directive), but the AVR GCC backend does not
(afaict) generate prologue_end (or epilogue_begin) .loc directives.

Reading through the gcc sources, I've found some backends call:

emit_note (NOTE_INSN_PROLOGUE_END);

but AVR does not.

I'm unsure whether this is what I need, though.

I tried adding this to the end of avr_end_prologue(), but the compiler
then appeared to go into an infinite loop. Judging by the (endless)
output, I think emit_note (NOTE_INSN_PROLOGUE_END) causes gcc to call
avr_end_prologue() recursively.

I've also tried adding an avr_output_mi_thunk() function, similar to
those in other architectures (arm, riscv) that call emit_note() in this
manner, but it made no difference.

Placing the emit_note (NOTE_INSN_PROLOGUE_END) at the end of
avr_expand_prologue() results in avr_asm_function_end_prologue() being
called twice, and still no .loc directive containing prologue_end is
emitted.

If I can get the prologue end and epilogue beginning locations into the
dwarf2 output, I can write some code to patch the object files and
replace the pro/epilogues with the ones I want, directly in the object
files, before they get linked.
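If the markers did appear, locating the code to replace would mostly be bookkeeping. The GNU as manual documents the directive as .loc fileno lineno [column] [options], with prologue_end and epilogue_begin among the options, so in -S output they would show up as bare keywords on .loc lines. Below is a minimal sketch under that assumption. It works on the assembler text of one function rather than on the object file's DWARF line table, and split_function, loc_has_flag and struct fn_regions are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Indices of the marker lines within one function's assembler text. */
struct fn_regions {
    size_t prologue_end;   /* index of the .loc line carrying prologue_end */
    size_t epilogue_begin; /* index of the .loc line carrying epilogue_begin */
    int found;             /* 1 if both markers were found, in that order */
};

/* Return 1 if LINE is a ".loc" directive with FLAG among its
   whitespace-separated operands (text after ';' is ignored). */
static int loc_has_flag(const char *line, const char *flag)
{
    const char *p = line + strspn(line, " \t");
    size_t len = strlen(flag);

    if (strncmp(p, ".loc", 4) != 0 || (p[4] != ' ' && p[4] != '\t'))
        return 0;
    for (p += 4; *p != '\0' && *p != ';';) {
        p += strspn(p, " \t\r\n");            /* skip separators */
        size_t n = strcspn(p, " \t\r\n;");    /* length of one operand */
        if (n == len && strncmp(p, flag, len) == 0)
            return 1;
        p += n;
    }
    return 0;
}

/* Lines [0, prologue_end) are the prologue, [prologue_end,
   epilogue_begin) the body and [epilogue_begin, n) the epilogue.
   Uses the first prologue_end marker and the last epilogue_begin. */
static struct fn_regions split_function(const char *const *lines, size_t n)
{
    struct fn_regions r = { 0, 0, 0 };
    int have_pro = 0, have_epi = 0;

    assert(lines != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!have_pro && loc_has_flag(lines[i], "prologue_end")) {
            r.prologue_end = i;
            have_pro = 1;
        }
        if (loc_has_flag(lines[i], "epilogue_begin")) {
            r.epilogue_begin = i;
            have_epi = 1;
        }
    }
    r.found = have_pro && have_epi && r.prologue_end < r.epilogue_begin;
    return r;
}
```

Matching whole operands means "is_stmt 0" or a comment that mentions prologue_end is not mistaken for a marker. Patching object files directly, as proposed above, would mean decoding the line-number program instead, which this sketch does not attempt.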

So the million-dollar question is: why, even with -gdwarf, do I not see
any .loc directives marking the end of prologues or the beginning of
epilogues?

-Ian

On 18/12/2021 16:31, Henri Cloetens wrote:
> Dear Sir,
> 
> To customize the stacking/unstacking, you need to modify the code of your
> backend port.
> I mean, the code that emits the stacking/unstacking for the normal case
> (no interrupt, but a function call) is in the backend part, NOT in the
> 'main' files, which are not intended to be modified.
> I would recommend you look there, and also in other backends, to find
> out how to do this.
> 
> Best Regards,
> 
> Henri.
> 
> 
> On 12/18/21 5:00 PM, Ian Molton wrote:
>> Hi all,
>>
>> I posted about this yesterday on the binutils list, but in the light of
>> day, I find myself re-thinking it;
>>
>> Right now, gcc will emit ISR prologues and epilogues such as this:
>>
>> __vector_35:
>>          push r1          ;
>>          push r0          ;
>>          in r0,__SREG__   ; ,
>>          push r0          ;
>>          clr __zero_reg__                 ;
>>          in r0,__RAMPZ__  ; ,
>>          push r0          ;
>>          push r18                 ;
>>          push r19                 ;
>>          push r20                 ;
>>          push r21                 ;
>>          push r22                 ;
>>          push r23                 ;
>>          push r24                 ;
>>          push r25                 ;
>>          push r26                 ;
>>          push r27                 ;
>>          push r30                 ;
>>          push r31                 ;
>>          push r28                 ;
>>          push r29                 ;
>>
>>          ; External function call to provoke pro/epilogue
>>          ; generation for this example...
>>          call foo_func
>>          ...
>>
>>          pop r29          ;
>>          pop r28          ;
>>          pop r31          ;
>>          pop r30          ;
>>          pop r27          ;
>>          pop r26          ;
>>          pop r25          ;
>>          pop r24          ;
>>          pop r23          ;
>>          pop r22          ;
>>          pop r21          ;
>>          pop r20          ;
>>          pop r19          ;
>>          pop r18          ;
>>          pop r0           ;
>>          out __RAMPZ__,r0         ; ,
>>          pop r0           ;
>>          out __SREG__,r0  ; ,
>>          pop r0           ;
>>          pop r1           ;
>>          reti
>>
>>
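A tool that swaps these pro/epilogues would probably want to sanity-check its input first. As a hypothetical sketch (collect_regs and pops_mirror_pushes are illustrative names), this checks that the epilogue pops exactly the registers the prologue pushed, in reverse order. It only looks at explicit push/pop operands, so SREG and RAMPZ appear as their r0 carrier and the in/out pairs are not verified:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_SAVES 64

/* Collect, in order, the register number of every "<mnemonic> rN"
   line (mnemonic is "push" or "pop"). Returns how many were stored. */
static size_t collect_regs(const char *const *lines, size_t n,
                           const char *mnemonic, int out[MAX_SAVES])
{
    size_t count = 0;

    assert(mnemonic != NULL && out != NULL);
    for (size_t i = 0; i < n && count < MAX_SAVES; i++) {
        char op[8];
        int reg;
        /* "\tpush r18  ;" -> op = "push", reg = 18 */
        if (sscanf(lines[i], " %7s r%d", op, &reg) == 2
            && strcmp(op, mnemonic) == 0)
            out[count++] = reg;
    }
    return count;
}

/* 1 if the pops restore exactly the pushed registers in reverse order. */
static int pops_mirror_pushes(const char *const *lines, size_t n)
{
    int pushed[MAX_SAVES], popped[MAX_SAVES];
    size_t np = collect_regs(lines, n, "push", pushed);
    size_t nq = collect_regs(lines, n, "pop", popped);

    if (np != nq)
        return 0;
    for (size_t i = 0; i < np; i++)
        if (pushed[i] != popped[np - 1 - i])
            return 0;
    return 1;
}
```

Fed the listing above, it finds 18 pushes (r1, r0 three times for r0/SREG/RAMPZ, then 14 general registers) and an epilogue that mirrors them.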
>> The problem I have is that I want to switch to a separate IRQ stack when
>> I get an interrupt, which I cannot do in C, since the entire prologue
>> runs before the function body.
>>
>> I *can* do it with an assembler stub that switches stacks and pushes
>> the address of a custom epilogue onto the stack before calling my ISR,
>> but this obviously wastes a lot of cycles pushing the epilogue address
>> and, by necessity, some of the registers that the existing ISR prologue
>> will then redundantly push again.
>>
>> Likewise, being able to insert my own epilogue sequence would let me
>> avoid an additional branch in the return path from the ISR.
>>
>> I can see two solutions to this problem:
>>
>> 1) Allow the compiler to omit certain registers from being saved in the
>> ISR prologue
>>
>> 2) Allow the user to specify custom pro/epilogue functions.
>> (-finstrument-functions is similar, but not close enough)
>>
>>
>> 1) would work, but would require careful futzing about with the linker
>> to arrange the code so that my prologue and epilogue are located
>> immediately around the ISR code. An implementation could look like
>> -mno-save-isr-prologue="r0,r1,SREG,r26,r27" (or whatever registers the
>> custom prologue saves before the ISR prologue runs)
>>
>> 2) would be ideal. The compiler would know which registers the custom
>> prologue functions use, and would therefore be able to omit saving them
>> from the ISR prologue (and conversely from the epilogue).
>>
>>
>> Something like
>>
>> __attribute__ ((naked))
>> void my_isr_prologue (void)
>> {
>>     asm volatile("... whatever" : : : <clobbered regs>);
>> }
>>
>> __attribute__ ((naked))
>> void my_isr_epilogue (void)
>> {
>>     asm volatile("... un-whatever" : : : <clobbered regs>);
>>     asm volatile("reti");
>> }
>>
>> __attribute__ ((__isr_prologue__(my_isr_prologue, my_isr_epilogue)))
>> __attribute__ ((signal))
>> void __vector_35(void)
>> {
>>
>>     /* ... do interrupt-y things ... */
>>
>> }
>>
>> Thoughts? I can see that I could implement option 1) like this:
>>
>> I specify -mno-gas-isr-prologues to force gcc to emit full
>> pro/epi-logues in the assembler output.
>>
>> I can modify gcc/config/avr/avr.c at this point:
>>
>> (~line 1893)
>>
>>    avr_regs_to_save (&set);
>>
>>    if (no-save-isr-prologue)
>>    {
>>      // FIXME Remove registers my custom prologue saves from the set
>>      ...
>>    }
>>
>>    if (cfun->machine->is_interrupt || cfun->machine->is_signal)
>>      {
>>      ...
>>
>> which just leaves me with a (trivial) script to write that can insert
>> my prologue/epilogue into the assembler output and re-assemble it into
>> an object file.
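The register bookkeeping behind option 1) is plain set subtraction. Here is a host-side sketch of it, assuming the proposed -mno-save-isr-prologue= option (which does not exist in avr-gcc) and a hypothetical bit-mask encoding with extra bits for SREG and RAMPZ; parse_no_save_list and isr_regs_to_save are illustrative names. Inside avr.c the subtraction would presumably act on the HARD_REG_SET filled in by avr_regs_to_save (GCC's CLEAR_HARD_REG_BIT clears one register in such a set), with SREG and RAMPZ presumably needing separate handling since they are not general registers:

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoding: bits 0-31 stand for r0-r31, and two extra
   bits stand for I/O registers an AVR ISR prologue may also save. */
#define SAVE_SREG  (UINT64_C(1) << 32)
#define SAVE_RAMPZ (UINT64_C(1) << 33)

/* Parse the value of the *proposed* -mno-save-isr-prologue= option
   (it does not exist in avr-gcc) into a mask.
   Returns 0 on success, -1 on an unrecognised or out-of-range name. */
static int parse_no_save_list(const char *list, uint64_t *mask)
{
    assert(list != NULL && mask != NULL);
    *mask = 0;
    while (*list != '\0') {
        size_t len = strcspn(list, ",");

        if (len == 4 && strncmp(list, "SREG", 4) == 0) {
            *mask |= SAVE_SREG;
        } else if (len == 5 && strncmp(list, "RAMPZ", 5) == 0) {
            *mask |= SAVE_RAMPZ;
        } else if ((len == 2 || len == 3) && list[0] == 'r'
                   && isdigit((unsigned char)list[1])
                   && (len == 2 || isdigit((unsigned char)list[2]))) {
            int reg = atoi(list + 1);   /* atoi stops at the ',' */
            if (reg > 31)
                return -1;
            *mask |= UINT64_C(1) << reg;
        } else {
            return -1;                  /* empty or unknown entry */
        }
        list += len;
        if (*list == ',')
            list++;
    }
    return 0;
}

/* What the compiler-generated prologue would still have to save:
   everything the ISR needs, minus what the custom prologue saved. */
static uint64_t isr_regs_to_save(uint64_t needed, uint64_t saved_by_custom)
{
    return needed & ~saved_by_custom;
}
```

For example, with -mno-save-isr-prologue=r0,r1,SREG,r26,r27 and an ISR that needs SREG, RAMPZ, r0, r18 and r26, only RAMPZ and r18 would be left for the generated prologue.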
>>
>> The downside is that very simple ISRs which don't need many registers
>> will be less efficient. But we're calling C here, and I would write such
>> simple ISRs in assembler anyway.
>>
>>
>> Option 2) would require more knowledge of gcc than I currently have.
>>
>>
>> Other (doomed? cursed?) options:
>>
>> 3)
>>
>> It almost seems like this could be solved if a function with
>> __attribute__ ((signal)) could (inline-) call another function with the
>> same attribute. The first function would not need to save any call-used
>> registers other than the ones it uses itself, and the called function
>> would be able to avoid saving any call-used registers that were saved by
>> its calling function.
>>
>> I suspect, however, that this approach is doomed to failure, as
>> inlining the second function (to avoid the overhead of calling it) would
>> presumably also move its register-save instructions right back into
>> the first function's prologue, where they aren't wanted, as described above.
>>
>> 4)
>>
>> Use a naked function for the ISR and a script to process the assembler
>> in order to generate the prologue and epilogue
>>
>> The GCC docs state that only basic asm statements are supported in a
>> naked function; C code is not. Presumably, as long as you add enough
>> prologue to provide a C environment, this isn't a problem, but it's
>> explicitly unsupported, and it would require me to write a script that
>> parses the assembler output to generate the entire prologue/epilogue
>> sequences. In a sense, this is the "purest" option, but I suspect that
>> properly determining the registers used this way would be a challenge.
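To illustrate why determining the registers used from the assembler output is harder than it sounds, here is a deliberately naive sketch (regs_named_in and regs_named_in_function are hypothetical names) that collects only registers spelled out as rN operands. The comment lists some of the implicit uses it misses; each would need per-instruction knowledge to handle, and every call would have to be treated as clobbering all call-used registers:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive scan: bit N of the result is set if "rN" (N = 0..31) appears
   as an explicit operand of the instruction on LINE. It does NOT see
   implicit register uses, for example:
     - mul/fmul results in r0:r1,
     - the second register of a pair in movw/adiw/sbiw,
     - X/Y/Z pointer operands (r26-r31), lpm/elpm reading Z,
     - __tmp_reg__/__zero_reg__ spelled by name,
     - everything a "call" may clobber. */
static uint32_t regs_named_in(const char *line)
{
    uint32_t mask = 0;
    const char *p;

    assert(line != NULL);
    p = line + strspn(line, " \t");
    p += strcspn(p, " \t\r\n;");            /* skip the mnemonic or label */
    while (*p != '\0' && *p != ';') {
        p += strspn(p, " \t\r\n,");         /* skip operand separators */
        size_t n = strcspn(p, " \t\r\n,;"); /* length of one operand */
        if ((n == 2 || n == 3) && p[0] == 'r'
            && isdigit((unsigned char)p[1])
            && (n == 2 || isdigit((unsigned char)p[2]))) {
            int reg = p[1] - '0';
            if (n == 3)
                reg = reg * 10 + (p[2] - '0');
            if (reg < 32)
                mask |= UINT32_C(1) << reg;
        }
        p += n;
    }
    return mask;
}

/* OR the per-line masks over every line of one function. */
static uint32_t regs_named_in_function(const char *const *lines, size_t n)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < n; i++)
        mask |= regs_named_in(lines[i]);
    return mask;
}
```

For instance, it reports only r22 and r23 for mul r22,r23, although the product lands in r0:r1; that gap is exactly the kind of information a real implementation would need to get right.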
>>
>> Presumably this is why the __gcc_isr method of generating pro/epilogues
>> is a task split between GCC and binutils?
>>
>> I am at a bit of a loss as to what information binutils is supposed to
>> have in this case, that gcc does not - why *is* it done that way?
>>
>> I've poked at inline asm with gcc, and find that I can use clobbers to
>> force a save of RAMPZ (etc.) from inline assembler within an ISR, so I
>> don't really "get" what the __gcc_isr approach is buying... the ability
>> to write inline asm without being explicit about clobbers? What for?
>>
>> Thoughts?
> 
>