Hi Paul,

We were discussing some ideas on Facebook, so I wanted to post them here as well. This is in the context of the RCU section of the RT MC:
https://www.youtube.com/watch?v=bpyFQJV5gCI

Detecting high kfree_rcu() load
----------
You mentioned this. As I understand it, we did the kfree_rcu() batching so that the system does nothing RCU-related until a batch has filled up enough or a timeout has occurred. This makes the GP thread and the system do less work. The problem you raised in our Facebook thread is that during heavy load the "batch" can be large and eventually gets dumped into call_rcu() all at once. Wouldn't this be better handled generically within call_rcu() itself, for the benefit of other non-kfree_rcu() workloads? That is, if a large number of callbacks is dumped, then try to end the GP more quickly. This likely doesn't need a signal from kfree_rcu(), since call_rcu() knows that it is being hammered.

Detecting recursive call_rcu() within call_rcu()
---------
We could use a per-CPU variable to detect a scenario like this, though I am not sure whether preemption during call_rcu() itself would cause false positives.

All rcuogp and rcuop threads tied to a housekeeping CPU
---
At LPC you mentioned the problem of OOM if all the rcuo* threads, including the GP one, are not able to keep up with heavy load. On Facebook I had proposed something like this: what about making the affinity setting a "soft affinity", that is, respect it always except in the uncommon case? In the uncommon case of heavy load, let the threads run wherever they can to prevent OOM. Sure, that might make the system a little more disruptive, but if we are approaching OOM we have bigger problems, right?

Peter mentioned that rcuogp0 should have slightly higher prio than rcuop0
---------
You mentioned this is something to look into, but I am not sure if we have looked into it yet.
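
To make the recursive call_rcu() detection idea above a bit more concrete, here is a rough user-space sketch, not kernel code. Everything named sketch_* is made up for illustration. A thread-local counter stands in for the per-CPU variable, and a hook stands in for whatever the enqueue path might end up calling that re-enters it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel per-CPU counter. User space has no
 * per-CPU variables, so a thread-local models "this CPU's" nesting depth. */
static _Thread_local int sketch_call_rcu_depth;

/* Hypothetical counter: how many nested entries were detected. */
static int sketch_nested_hits;

/* Hypothetical hook standing in for anything the enqueue path might call. */
typedef void (*sketch_hook_t)(void);

/* Toy model of an enqueue path; it queues nothing, it only tracks nesting. */
static void sketch_call_rcu(sketch_hook_t hook)
{
	/* A nonzero depth on entry means we are already inside this function. */
	if (sketch_call_rcu_depth++ > 0)
		sketch_nested_hits++;

	if (hook)
		hook();	/* may re-enter sketch_call_rcu() */

	sketch_call_rcu_depth--;
	assert(sketch_call_rcu_depth >= 0);
}

/* Hook that re-enters the enqueue path once, without a further hook. */
static void sketch_reentering_hook(void)
{
	sketch_call_rcu(NULL);
}

/* Runs one plain call and one re-entrant call; returns detected nestings. */
static int sketch_demo(void)
{
	sketch_nested_hits = 0;
	sketch_call_rcu(NULL);                   /* no nesting */
	sketch_call_rcu(sketch_reentering_hook); /* one nested entry */
	return sketch_nested_hits;
}
```

The false-positive worry shows up here as well: with a real per-CPU counter, if a task were preempted between the increment and the decrement and another task entered the same path on that CPU, the second task would see a nonzero depth without any real recursion. A per-task counter might be one way around that, but I have not thought it through.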

A "heavy" call_rcu() caller using synchronize_rcu() if too many callbacks are dumped
---------
How about doing this kind of call_rcu() to synchronize_rcu() transition automatically if the context allows it? That is, detect the context and, if sleeping is allowed, wait for the grace period synchronously in call_rcu(). I am not sure about deadlocks and the like from this kind of waiting, and I have to think about it more.

Is the square root of N the right number of rcuogp threads?
---------
The question raised was whether we can make do with fewer threads, or even just one. You mentioned that the square root might not be the right choice. How do we test how well the system is doing? Are you running rcutorture with a certain tree configuration and monitoring memory footprint and performance?

BTW, I have 2 interns working on RCU (Amol and Madupharna, also on CC). They were selected from among several candidates as part of the Linux Foundation mentorship program, and they are familiar with RCU. I have asked them to look at some RCU-list work and RCU sparse work. However, I can also have them look into a few other things as time permits, depending on what interests them.

Thanks, Merry Christmas!

- Joel