Dynamically allocating each one is possible but not very scalable.
The question is whether there is some way we can do this with an
on-stack or a single on-heap rcu_head, or some equivalent that achieves
the same effect.
If the hctx structures are guaranteed to stay put, you could count
them and then do a single allocation of an array of rcu_head structures
(or some larger structure containing an rcu_head structure, if needed).
You could then sequence through this array, consuming one rcu_head per
hctx as you processed it. Once all the callbacks had been invoked,
it would be safe to free the array.
Sounds too simple, though. So what am I missing?
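To make the suggestion above concrete, here is a rough userspace sketch,
not kernel code. It assumes a stand-in struct rcu_head and a mock
call_rcu()/grace-period pair so that it can run outside the kernel. The
names hctx_rcu_batch, hctx_rcu_entry, hctx_rcu_cb and queue_all_hctxs
are hypothetical and only illustrate "one allocation, one rcu_head per
hctx, free after the last callback".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's struct rcu_head (assumption). */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* Mock of call_rcu(): just remember the callback for later. */
static struct rcu_head *mock_queue;

static void mock_call_rcu(struct rcu_head *head,
			  void (*func)(struct rcu_head *head))
{
	head->func = func;
	head->next = mock_queue;
	mock_queue = head;
}

/* Pretend a grace period elapsed: invoke every queued callback. */
static int mock_grace_period(void)
{
	struct rcu_head *head = mock_queue;
	int invoked = 0;

	mock_queue = NULL;
	while (head) {
		/* Read ->next first: the last callback frees the array. */
		struct rcu_head *next = head->next;

		head->func(head);
		head = next;
		invoked++;
	}
	return invoked;
}

struct hctx_rcu_batch;

/* Hypothetical larger structure containing an rcu_head. */
struct hctx_rcu_entry {
	struct rcu_head head;
	struct hctx_rcu_batch *batch;
	size_t hctx_index;
};

/* One allocation holding the countdown plus one entry per hctx. */
struct hctx_rcu_batch {
	size_t pending;		/* callbacks not yet invoked */
	struct hctx_rcu_entry entries[];
};

static int batches_freed;	/* only here so the sketch is observable */

static void hctx_rcu_cb(struct rcu_head *head)
{
	struct hctx_rcu_entry *e = (struct hctx_rcu_entry *)
		((char *)head - offsetof(struct hctx_rcu_entry, head));
	struct hctx_rcu_batch *batch = e->batch;

	/* A kernel version would need an atomic decrement here. */
	if (--batch->pending == 0) {
		free(batch);
		batches_freed++;
	}
}

/* Count the hctxs, allocate once, consume one rcu_head per hctx. */
static struct hctx_rcu_batch *queue_all_hctxs(size_t nr_hctx)
{
	struct hctx_rcu_batch *batch;
	size_t i;

	if (nr_hctx == 0)
		return NULL;
	batch = malloc(sizeof(*batch) + nr_hctx * sizeof(batch->entries[0]));
	if (!batch)
		return NULL;
	batch->pending = nr_hctx;
	for (i = 0; i < nr_hctx; i++) {
		batch->entries[i].batch = batch;
		batch->entries[i].hctx_index = i;
		mock_call_rcu(&batch->entries[i].head, hctx_rcu_cb);
	}
	return batch;
}
```

The array is freed from whichever callback runs last, so the caller
never has to wait. The price is that the array grows with the number of
hctxs, which is exactly where the allocation-order concern comes in.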
We don't want higher-order allocations...
OK, I will bite... Do multiple lower-order allocations (page size is
still lower-order, correct?) and link them together.
Sorry, couldn't resist...
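For what it is worth, the linked variant might look roughly like the
userspace sketch below. It assumes a 4096-byte page (CHUNK_BYTES) and a
stand-in struct rcu_head. The names rcu_chunk, rcu_chunks_alloc,
rcu_chunks_get and rcu_chunks_free are hypothetical, and malloc() stands
in for whatever page-sized allocator the kernel code would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_BYTES 4096	/* assumed page size */

/* Userspace stand-in for the kernel's struct rcu_head (assumption). */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* One page-sized chunk: a link, a count, then as many heads as fit. */
struct rcu_chunk {
	struct rcu_chunk *next;
	size_t nr_heads;
	struct rcu_head heads[];
};

#define HEADS_PER_CHUNK \
	((CHUNK_BYTES - sizeof(struct rcu_chunk)) / sizeof(struct rcu_head))

/* Free every chunk on the list; returns how many were freed. */
static size_t rcu_chunks_free(struct rcu_chunk *list)
{
	size_t freed = 0;

	while (list) {
		struct rcu_chunk *next = list->next;

		free(list);
		list = next;
		freed++;
	}
	return freed;
}

/* Allocate enough page-sized chunks to hold nr_heads rcu_heads. */
static struct rcu_chunk *rcu_chunks_alloc(size_t nr_heads)
{
	struct rcu_chunk *list = NULL;

	while (nr_heads > 0) {
		size_t n = nr_heads < HEADS_PER_CHUNK ?
			   nr_heads : HEADS_PER_CHUNK;
		struct rcu_chunk *c = malloc(CHUNK_BYTES);

		if (!c) {
			rcu_chunks_free(list);	/* undo partial work */
			return NULL;
		}
		c->nr_heads = n;
		c->next = list;
		list = c;
		nr_heads -= n;
	}
	return list;
}

/* Return the idx-th rcu_head across all chunks, or NULL if out of range. */
static struct rcu_head *rcu_chunks_get(struct rcu_chunk *list, size_t idx)
{
	for (; list; list = list->next) {
		if (idx < list->nr_heads)
			return &list->heads[idx];
		idx -= list->nr_heads;
	}
	return NULL;
}
```

No single allocation exceeds one page here, at the cost of a list walk
per lookup and the extra bookkeeping. As before, the chunks could only
be freed once every callback using them had been invoked.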
Possible, but I didn't want us to resort to all this complexity and
thought we could find a better, simpler solution.