Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail

Segher Boessenkool <segher@xxxxxxxxxxxxxxxxxxx> · Thu, 9 Dec 2021 17:36:35 -0600

On Thu, Dec 09, 2021 at 08:30:56AM -0500, Marc Feeley wrote:
> [Marc Feeley here, the author of the Gambit Scheme compiler that generates the code which would benefit from tail-calls]

Hi :-)

> What is needed (for Gambit) is a way for the programmer to express in the source code that the tail-call is known to be optimizable, i.e. it does not violate any of the tail-calling constraints, such as all local variables are dead (including those whose address was taken with “&var”) at the moment of the call.

How can you express that in the source code?  The compiler might not
agree, especially on architectures the code writer did not consider.

And why do you need it?  Is it just to not have unlimited potential
stack use, like, from threaded code?

I think the best the compiler can do is say "sorry, no can do" when it
does not know how to elide a return after some specific call, on
whatever cpu+os combo you are targeting.
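
To make that concrete: Clang's musttail statement attribute behaves this
way, it either emits the call as a tail call or rejects the program.  A
minimal sketch, assuming a compiler that provides the attribute.  The
MUSTTAIL macro and count_down are made-up names for illustration, and
when the attribute is missing the macro expands to nothing, so nothing
is guaranteed in that case:

```c
#include <assert.h>

/* Hypothetical helper macro: use the musttail statement attribute when
   the compiler advertises it, otherwise expand to nothing (in which
   case the call below is an ordinary call with no guarantee). */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Counts n down to zero, accumulating the number of steps in acc.
   Caller and callee have the same signature, which is one of the
   conditions Clang documents for musttail. */
long count_down(long n, long acc)
{
    assert(n >= 0);
    if (n == 0)
        return acc;
    /* With the attribute, the compiler must turn this into a jump or
       fail the compilation; it may not silently emit call + return. */
    MUSTTAIL return count_down(n - 1, acc + 1);
}
```

Calling count_down(1000, 0) gives 1000 either way; the difference is
only whether the stack depth stays constant.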

> The code generated by the Gambit compiler is sufficiently complex that it would be impossible (in the halting problem sense) to check that the tail-calling constraints are not violated. However, I know the constraints are not violated because the code was designed that way. And if I’m mistaken in this belief, then I am the one that will bear the blame of the ensuing problems.

On most architectures you can simply check if there are any "return"
instructions generated, no?  Or do you generate any actual calls (to
user code) as well?

> So having a way to check during or after the compilation that tail-calls were optimized is not sufficient. A Scheme compiler that sometimes fails to compile is not very useful.  What is needed is a guarantee that specific C calls are optimized.

Then you need to write code that works as you want on all configurations
you support, state what those configs are, and test that it compiles on
all such configs, before every release.

> FYI the code generated by Gambit is put in “host” functions that look like this:
> 
>   struct ___processor_state_struct { … };
>   typedef struct ___processor_state_struct *___processor_state;
>   typedef void (*___host)(___processor_state ___ps);
> 
>   static void host1(___processor_state ___ps) {
>     …
>     ___host h = …;
>     h(___ps);
>   }
> 
> So the last statement in the host function is a call to the next host function.

Classical threaded code.  Good :-)
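
For comparison, the usual portable fallback when no tail call can be
guaranteed is a trampoline: each host returns the next host instead of
calling it, and a small driver loop does the dispatch, so stack use
stays bounded on any compiler.  This is only a sketch under my own
assumptions, not what Gambit generates; processor_state, host_ref,
host_loop and run_hosts are hypothetical names, and the state has just
two fields for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much simplified processor state. */
typedef struct processor_state {
    long counter;   /* work remaining */
    long steps;     /* hosts executed so far */
} processor_state;

/* C cannot declare a function type that returns a pointer to itself,
   so the function pointer is wrapped in a struct. */
typedef struct host_ref {
    struct host_ref (*fn)(processor_state *);
} host_ref;

/* One host: does a unit of work, then names its successor.
   A NULL fn means "stop". */
host_ref host_loop(processor_state *ps)
{
    host_ref next;
    assert(ps != NULL);
    if (ps->counter == 0) {
        next.fn = NULL;
        return next;
    }
    ps->counter--;
    ps->steps++;
    next.fn = host_loop;
    return next;
}

/* Driver (the trampoline): every host returns here before the next
   one runs, so the stack never grows with the number of hosts. */
long run_hosts(host_ref start, processor_state *ps)
{
    host_ref h = start;
    while (h.fn != NULL)
        h = h.fn(ps);
    return ps->steps;
}
```

The cost is one extra return and indirect call per hop, which is exactly
what a guaranteed tail call would avoid.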

Segher