Hey Paul,
Indeed you cannot. And if you build with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, it will yell at you when you try.

You -can- pass on-stack rcu_head structures to call_srcu(), though, if that helps. Of course, you must have some way of waiting for the callback to be invoked before exiting that function. This should be easy for me to package into an API, maybe using one of the existing reference-counting APIs.

So, do you have a separate stack frame for each of the desired call_srcu() invocations? If not, do you know at build time how many rcu_head structures you need? If the answer to both of these is "no", then there likely needs to be an rcu_head in each of the relevant data structures, as was noted earlier in this thread.

Yeah, I should go read the code. But I would need to know where it is, and it is still early in the morning over here! ;-) I probably should also have read the rest of the thread before replying. But what is the fun in that?
The use case is quiescing submissions to queues. This flow is where we tear things down, and we can potentially have thousands of queues, each of which needs to be quiesced. Each queue (hctx) uses either RCU or SRCU, depending on whether it may sleep during submission. The goal is for the overall quiesce to be fast, so we want to wait for the grace periods of all these queues to elapse roughly once, in parallel, instead of synchronizing each one serially as is done today. The guys here are resisting adding an rcu_synchronize to each and every hctx, because it costs roughly 32 bytes in each of thousands of hctxs. Dynamically allocating each one is possible but not very scalable. The question is whether there is some way to do this with an on-stack or single on-heap rcu_head, or with something equivalent that achieves the same effect.