On 2023-11-29 12:52, David Brown wrote:
On 29/11/2023 08:50, Matthias Pfaller wrote:
On 2023-11-28 19:00, David Brown wrote:
> Can I ask (either or both of you) why you are using nested functions like
> this? This is possibly the first time I have heard of anyone using them, certainly
> the first time in embedded development. Even when I programmed in Pascal, where
> nested functions are part of the language, I did not use them more than a couple of
> times.
>
> What benefit do you see in nested functions in C, compared to having separate
> functions? Have you considered moving to C++ and using lambdas, which are more
> flexible, standard, and can be very much more efficient?
>
> This is, of course, straying from the topicality of this mailing list. But I am
> curious, and I doubt if I am the only one who is.
- I'm maintaining our token-threaded Forth interpreter. In the inner loop there is
an absurdly big switch for the primitives. I'm loading rp, sp and tos into local
variables. Pushing, popping and memory access are done by nested functions
(checking for stack overflows and underflows, managing tos, access violations, ...).
Of course, that could be done with macros. But when I'm calling C functions from
within the switch, I sometimes pass pointers to the local functions (e.g. for
catch/throw exception handling).
- When calling list iterators, I sometimes pass references to nested functions.
- When locking is necessary and the function has multiple return points, I do
something like:
void somefunction(void)
{
    void f(void)
    {
        /* ... body with several return points ... */
    }

    lock();
    f();
    unlock();
}
I know that in a lot of cases I could just define an outer static function or use
gotos. But to my eye it just looks nicer that way. In most cases no trampoline will
be necessary anyway. It's not used that often, and we could probably get rid of it
in most cases by using macros and statement expressions ({ ... }).
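For comparison, the "outer static function" alternative mentioned above might look like this in standard C. This is a minimal sketch, not code from the thread: `lock()`, `unlock()`, `do_work()`, the `lock_depth` counter and the `int` parameter on `somefunction()` are all hypothetical, added only so the example runs on a host. The body with several return points becomes an ordinary static function, so a single lock/unlock pair in the caller still brackets every exit path.

```c
#include <assert.h>

/* Hypothetical lock primitives: a nesting counter stands in for a real
   mutex or interrupt disable so the sketch runs on a host. */
static int lock_depth = 0;

static void lock(void)
{
    lock_depth++;
}

static void unlock(void)
{
    assert(lock_depth > 0);   /* catch unbalanced unlock() calls */
    lock_depth--;
}

/* The body with several return points, hoisted out of the caller as an
   ordinary static function. Any state it needs is passed explicitly. */
static int do_work(int x)
{
    if (x < 0)
        return -1;            /* early exit; the caller still unlocks */
    if (x == 0)
        return 0;
    return x * 2;
}

/* One lock()/unlock() pair brackets every return path of do_work(). */
int somefunction(int x)
{
    int result;

    lock();
    result = do_work(x);
    unlock();
    return result;
}
```

The cost of this form is that everything `do_work()` needs from the caller must be passed as parameters, which is exactly the convenience the nested-function version provides.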
Thanks for that.
I can appreciate that local functions can look nicer than macros or goto spaghetti.
In simple cases (which is probably the majority for your usage), the local functions
will be inlined and will give pretty much exactly the same code as you'd get for
macros, outer static functions, or other methods. But I'd be very unhappy to see
trampolines here, which you will need in more complicated cases. The overheads are not
something you'd want to see in the inner loop of an interpreter.
AFAIUI, the reason the compiler has to generate trampolines here is to make a
function that has access to some of the local variables, while being shoe-horned into
the appearance of a function with parameters that don't include any extra values or
references. If you were, as an alternative, to switch to C++ and use lambdas instead
of nested functions, all that disappears, precisely because lambdas do not have to be
forced to match the function signature - the generated lambda can take extra hidden
parameters (and even extra hidden state) as needed.
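Staying in C, the usual way to get the same effect without trampolines is to make that extra parameter explicit: pass a context pointer alongside the function pointer. The sketch below assumes a hypothetical list type and iterator (`struct node`, `list_foreach()`, `sum_above()` are illustrative names, not from any real codebase); the `void *ctx` argument carries what a nested function would reach through its static chain, or what a lambda would capture.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list of ints. */
struct node {
    int value;
    struct node *next;
};

/* Callback type. The explicit void *ctx parameter carries what a GCC
   nested function would reach through its static chain (or what a C++
   lambda would capture); as an ordinary parameter it needs no
   trampoline. */
typedef void (*visit_fn)(int value, void *ctx);

/* Hypothetical iterator: calls fn on every element, passing ctx through
   unchanged. */
void list_foreach(const struct node *head, visit_fn fn, void *ctx)
{
    assert(fn != NULL);
    for (const struct node *n = head; n != NULL; n = n->next)
        fn(n->value, ctx);
}

/* The caller's "local variables" that the callback needs, bundled into
   a struct that lives in the caller's stack frame. */
struct sum_ctx {
    int threshold;
    int total;
};

static void add_if_above(int value, void *ctx)
{
    struct sum_ctx *s = ctx;

    if (value > s->threshold)
        s->total += value;
}

/* Sums all values greater than threshold, with no nested function. */
int sum_above(const struct node *head, int threshold)
{
    struct sum_ctx s = { threshold, 0 };

    list_foreach(head, add_if_above, &s);
    return s.total;
}
```

The price is that the "captured" variables have to be gathered into a struct by hand; in return the callback is an ordinary function and the code builds with any C compiler.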
Of course it's never easy to change these kinds of things in existing code. And it
is particularly difficult to get solutions that work efficiently on a wide range of
compilers or versions.
David
We are (at the moment) using two microcontrollers with a cache. The at91sam4e is a
Cortex-M4 device with two kilobytes of unified I/D cache. Because the cache is
unified, it only has to be considered when using DMA.
The atsame7x/atsamv7x series are Cortex-M7 devices with 16k I-cache and 16k D-cache.
Here you do have to worry about I-cache invalidation. Invalidating a single I-cache
line (the trampoline code is small) doesn't hurt too much, especially if it happens
very seldom (it's not like every function passes pointers to nested functions...).
Besides that, at 300 MHz (or 120 MHz for the Cortex-M4) the core is more than fast
enough for our applications. The 384 KB of RAM on the at91sam[ev]7x and the 128 KB of
RAM on the at91sam4e are much more of a hindrance...
I'm aware of the reason for the trampoline code, and because of this I know that (as
you wrote) in the majority of cases no trampoline code is needed, because the nested
function either references no arguments or variables of the outer function, or is
never passed as a function pointer to other functions.
In the cases where the trampoline code is needed I'm willing to take the performance
hit in exchange for the gain I get.
E.g., in the interpreter inner loop example, I would need to pass along all kinds of
state every time I call an external function that needs access to local interpreter
state. If I pass just a pointer to a callback function (which will then access the
local state), there is much less opportunity for errors... In most cases, passing
the callback is not necessary anyway.
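For comparison, a common standard-C way to avoid passing "all kinds of state" is to keep the interpreter state in one struct and hand external functions a single pointer to it. The following is a minimal sketch under that assumption, not the interpreter from this thread: the token set, `struct vm`, `push()`, `pop()`, `external_square()` and `vm_run()` are hypothetical, the return stack and cached tos are omitted, and setjmp/longjmp stands in for catch/throw.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical error codes; all nonzero except VM_OK so they can be
   passed through longjmp(). */
enum { VM_OK = 0, VM_UNDERFLOW, VM_OVERFLOW, VM_BAD_TOKEN };

/* Hypothetical token set for a toy token-threaded interpreter. */
enum { T_LIT, T_ADD, T_MUL, T_DUP, T_CALL_EXT, T_HALT };

#define STACK_DEPTH 16

/* State that would otherwise be locals of the inner loop.
   Simplified: no return stack and no cached tos. */
struct vm {
    int stack[STACK_DEPTH];
    int sp;                     /* index of the next free slot */
    jmp_buf catch_point;        /* where throw_error() unwinds to */
};

/* "throw": unwind back to the setjmp() in vm_run(). */
static _Noreturn void throw_error(struct vm *vm, int code)
{
    longjmp(vm->catch_point, code);
}

/* push/pop with overflow and underflow checks. */
static inline void push(struct vm *vm, int x)
{
    if (vm->sp >= STACK_DEPTH)
        throw_error(vm, VM_OVERFLOW);
    vm->stack[vm->sp++] = x;
}

static inline int pop(struct vm *vm)
{
    if (vm->sp <= 0)
        throw_error(vm, VM_UNDERFLOW);
    return vm->stack[--vm->sp];
}

/* An external C function that needs interpreter state: it receives one
   state pointer instead of a pointer to a nested function. */
static void external_square(struct vm *vm)
{
    int x = pop(vm);
    push(vm, x * x);
}

/* Runs a token program. Returns VM_OK and stores the final top of stack
   in *result, or returns the error code thrown by a primitive. */
int vm_run(const int *code, int *result)
{
    struct vm vm;
    const int *ip = code;

    vm.sp = 0;
    switch (setjmp(vm.catch_point)) {
    case VM_OK:
        break;                  /* direct return from setjmp(): run */
    case VM_UNDERFLOW:
        return VM_UNDERFLOW;
    case VM_OVERFLOW:
        return VM_OVERFLOW;
    default:
        return VM_BAD_TOKEN;
    }

    for (;;) {
        int a, b;

        switch (*ip++) {
        case T_LIT:      push(&vm, *ip++); break;
        case T_ADD:      b = pop(&vm); a = pop(&vm); push(&vm, a + b); break;
        case T_MUL:      b = pop(&vm); a = pop(&vm); push(&vm, a * b); break;
        case T_DUP:      a = pop(&vm); push(&vm, a); push(&vm, a); break;
        case T_CALL_EXT: external_square(&vm); break;
        case T_HALT:     *result = pop(&vm); return VM_OK;
        default:         throw_error(&vm, VM_BAD_TOKEN);
        }
    }
}
```

A program such as `{ T_LIT, 3, T_LIT, 4, T_ADD, T_CALL_EXT, T_HALT }` leaves 49 in `*result`. One possible downside of this layout is that, once the address of the state escapes to other functions, the compiler may find it harder to keep sp in a register inside the inner loop.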
Matthias