Re: arm-none-eabi, nested function trampolines and caching

On 2023-11-29 12:52, David Brown wrote:
On 29/11/2023 08:50, Matthias Pfaller wrote:
On 2023-11-28 19:00, David Brown wrote:
 > Can I ask (either or both of you) why you are using nested functions like
 > this?  This is possibly the first time I have heard of anyone using them, certainly
 > the first time in embedded development. Even when I programmed in Pascal, where
 > nested functions are part of the language, I did not use them more than a couple of
 > times.
 >
 > What benefit do you see in nested functions in C, compared to having separate
 > functions?  Have you considered moving to C++ and using lambdas, which are more
 > flexible, standard, and can be very much more efficient?
 >
 > This is, of course, straying from the topicality of this mailing list. But I am
 > curious, and I doubt if I am the only one who is.

- I'm maintaining our token-threaded Forth interpreter. In the inner loop there is an absurdly big switch for the primitives. I'm loading rp, sp and tos into local variables. Pushing, popping and memory access are done by nested functions (checking for stack overflows and underflows, managing tos, access violations, ...). Of course that could be done by macros. But when I'm calling C functions from within the switch, I sometimes pass pointers to the local functions (e.g. for catch/throw exception handling).

- When calling list iterators, I sometimes pass pointers to nested functions.

- When locking is necessary and the function has multiple return points I'm doing something like:

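void somefunction(void)
{
   /* GNU C nested function holding the body with its multiple return points */
   void f(void)
   {
      ...
   }
   lock();
   f();        /* any return inside f() still reaches unlock() below */
   unlock();
}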

I know, in a lot of cases I could just define some outer static function or use gotos. But to my eye it just looks nicer that way. In most cases there will be no trampoline necessary anyway. It's not used that often and we could probably get rid of it in most cases by using macros and ({ ... }).
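
The outer static function alternative mentioned above can be sketched in standard C roughly as follows. The names `lock`, `unlock`, `somefunction_locked` and the `lock_depth` counter are illustrative stand-ins, not code from the project. The only point is that every return path of the helper passes through the single unlock in the wrapper, with no nested function and therefore no possibility of a trampoline.

```c
/* Hypothetical stand-ins for the real locking primitives. */
static int lock_depth = 0;
static void lock(void)   { lock_depth++; }
static void unlock(void) { lock_depth--; }

/* Body with several return points, written as an ordinary file-scope
 * function instead of a nested one. It may return early anywhere. */
static int somefunction_locked(int arg)
{
    if (arg < 0)
        return -1;      /* early return: the wrapper still unlocks */
    if (arg == 0)
        return 0;
    return arg * 2;
}

/* Wrapper: the only place that takes and releases the lock. */
int somefunction(int arg)
{
    int result;

    lock();
    result = somefunction_locked(arg);
    unlock();
    return result;
}
```

The cost compared with the nested version is that any local state the body needs has to be passed in as arguments (here just `arg`), which is the trade-off discussed in the rest of this thread.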


Thanks for that.

I can appreciate that local functions can look nicer than macros or goto spaghetti. In simple cases (which is probably the majority for your usage), the local functions will be inlined and will give pretty much exactly the same code as you'd get for macros, outer static functions, or other methods.  But I'd be very unhappy to see trampolines here, which you will need in more complicated cases.  The overheads are not something you'd want to see in the inner loop of an interpreter.

AFAIUI, the reason the compiler has to generate trampolines here is to make a function that has access to some of the local variables, while being shoe-horned into the appearance of a function with parameters that don't include any extra values or references.  If you were, as an alternative, to switch to C++ and use lambdas instead of nested functions, that all disappears, precisely because lambdas do not have to be forced to match the function signature - the generated lambda can take extra hidden parameters (and even extra hidden state) as needed.
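
To illustrate the point about signatures, here is a minimal sketch in standard C of what the hidden parameter looks like when it is written out by hand: the callback receives an explicit environment pointer, so no trampoline is involved. All names (`visit_fn`, `for_each`, `struct sum_env`, `add_to_sum`, `sum_array`) are hypothetical and chosen only for illustration. A GNU C nested function passed to an iterator would instead have to fit a plain `void (*)(int)` without the `env` argument, which is why the compiler has to build a trampoline for it.

```c
#include <stddef.h>

/* Callback type with an explicit environment pointer. */
typedef void (*visit_fn)(int value, void *env);

/* Generic iterator: calls fn for every element, forwarding env. */
static void for_each(const int *items, size_t count, visit_fn fn, void *env)
{
    size_t i;

    for (i = 0; i < count; i++)
        fn(items[i], env);
}

/* State that a nested function would reach through the enclosing frame. */
struct sum_env {
    long total;
    size_t visited;
};

/* Ordinary file-scope callback; all its state arrives through env. */
static void add_to_sum(int value, void *env)
{
    struct sum_env *e = env;

    e->total += value;
    e->visited++;
}

/* Caller: owns the state and hands its address to the iterator. */
long sum_array(const int *items, size_t count)
{
    struct sum_env env = { 0, 0 };

    for_each(items, count, add_to_sum, &env);
    return env.total;
}
```

The price is that every iterator and every callback has to carry the extra `void *` argument, which is the bookkeeping that nested functions (or C++ lambdas) hide.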

Of course it's never easy to change these kinds of things in existing code.  And it is particularly difficult to get solutions that work efficiently on a wide range of compilers or versions.

David

We are using (at the moment) two microcontrollers with cache. The at91sam4e is a cortex-m4 device with two kilobytes of unified i/d-cache. Because the cache is unified, it only needs to be considered when using DMA.

The atsame7x/atsamv7x series is a cortex-m7 device with 16k i-cache and 16k d-cache. Here you have to worry about i-cache invalidates. Evicting a single i-cache line (and the trampoline code is small) doesn't hurt too much, especially if it happens very rarely (it's not like every function passes pointers to nested functions...).

Besides that @300MHz (or @120MHz for the cortex-m4) the core is more than fast enough for our applications. The 384k of RAM on the at91sam[ev]7x and the 128k of RAM on the at91sam4e are a lot more of a hindrance...

I'm aware of the reason for the trampoline code, and because of this I know that (as you wrote) in the majority of cases trampoline code is not needed (because either no outer arguments or variables are referenced, or no pointer to the nested function is passed to other functions).

In the cases where the trampoline code is needed I'm willing to take the performance hit in exchange for the gain I get.

For example, in the interpreter inner loop I would need to pass along all kinds of state every time I call an external function that needs access to local interpreter state. If I pass just a pointer to a callback function (that will then access local state), there is much less opportunity for errors... In most cases passing the callback is not necessary anyway.

Matthias



