Re: RCU ideas discussed at LPC

On Wed, Dec 25, 2019 at 05:05:32PM -0800, Paul E. McKenney wrote:
> On Wed, Dec 25, 2019 at 05:41:04PM -0500, Joel Fernandes wrote:
> > Hi Paul,
> > We were discussing some ideas on facebook so I wanted to just post
> > them here as well. This is in the context of the RCU section of RT MC
> > https://www.youtube.com/watch?v=bpyFQJV5gCI
> > 
> > Detecting high kfree_rcu() load
> > ----------
> > You mentioned this. As I understand it, we did the kfree_rcu()
> > batching to let the system avoid doing any RCU-related work until a
> > batch has filled up enough or a timeout has occurred. This makes the
> > GP thread and the system do less work.
> > The problem you are raising in our Facebook thread is that during
> > heavy load the "batch" can be large and eventually be dumped into
> > call_rcu(). Wouldn't this be better handled generically within
> > call_rcu() itself, for the benefit of other non-kfree_rcu()
> > workloads? That is, if a large number of callbacks is dumped, then
> > try to end the GP more quickly. This likely doesn't need a signal
> > from kfree_rcu(), since call_rcu() knows that it is being hammered.
> 
> Except that call_rcu() currently has no idea how many parcels of memory
> a given request from kfree_rcu() represents.

True. At the moment, neither does kfree_rcu(), since we store only the
pointer. We could consult the low-level allocator if it has this
information. If you could let me know how to make RCU more aggressive in
this case (once we know there's a problem), I could work on something like
this. I did have OOM issues in earlier versions of the kfree_rcu() patch,
and even now I can boot a system with less memory and OOM it with the
tests.

> > Detecting recursive call_rcu() within call_rcu()
> > ---------
> > We could use a per-cpu variable to detect a scenario like this, though
> > I am not sure if preemption during call_rcu() itself would cause false
> > positives.
> 
> A call_rcu() from within an RCU callback function is legal and is
> sometimes done.  Or are you thinking of a call_rcu() from an interrupt
> handler interrupting another call_rcu()?

Oh, I did not know that. I thought this was the point heavily discussed in
the LPC talk, but I must have misunderstood when you said you hoped no one
was doing precisely this.

> > All rcuog and rcuop threads tied to a housekeeping CPU
> > ---
> > At LPC you mentioned the problem of OOM if all rcuo* threads,
> > including the GP one, are not able to keep up with heavy load. On
> > Facebook I had proposed something like this: what about making the
> > affinity setting a "soft affinity", that is, respect it always
> > except in the uncommon case? In the uncommon case of heavy load, let
> > the threads run wherever they need to, to prevent OOM. Sure, that
> > might make the system a little more disruptive, but if we are
> > approaching OOM we have bigger problems, right?
> 
> The problem is that there are a rather large number of ways to force
> a given kthread to execute only on a given CPU, and reverse-engineering
> all that within call_rcu() isn't reasonable.  An alternative is to
> disable offloading, wait for the offloaded callbacks to drain, then
> start up the usual softirq approach (or per-CPU kthread, as the case
> may be).  This self-throttles because whatever is generating callbacks
> gets preempted by softirq invocation.

Ok, agreed. Did you already implement the "disable offloading" code?

> > ---------
> > How about doing this kind of call_rcu() to synchronize_rcu()
> > transition automatically if the context allows it? I.e. Detect the
> > context and if sleeping is allowed, then wait for the grace period
> > synchronously in call_rcu(). Not sure about deadlocks and the like
> > from this kind of waiting and have to think more.
> 
> This gets rather strange in a production PREEMPT=n build, so not a
> fan, actually.  And in real-time systems, I pretty much have to splat
> anyway if I slow down call_rcu() by that much.
> 
> So the preference is instead detecting such misconfiguration and issuing
> appropriate diagnostics.  And making RCU more able to keep up when not
> grossly misconfigured, hence the kfree_rcu() memory footprint being
> fed into core RCU.

Ok. Is it not OK to simply assume that a large number of queued callbacks,
combined with high memory pressure, means RCU should be more aggressive
anyway, since whatever memory can be freed by invoking callbacks should
help? Or were you thinking that making RCU aggressive under heavy memory
pressure is not worth it without knowing that RCU is the cause of it?

> > Is square root of N rcuog threads the right optimization?
> 
> If there were enough CPUs, it would be necessary to have three levels
> of hierarchy and to go to the cube root, but that would be more CPUs
> than I have seen used.
> 
> > ---------
> > The question raised was can we do with fewer threads, or even just
> > one? You mentioned the square root might not be the right choice. How
> > do we test how well the system is doing. Are you running rcutorture
> > with a certain tree configuration and monitor memory footprint /
> > performance?
> 
> The issue prompting the hierarchy was wakeup overhead on the grace-period
> kthread.  Going to a hierarchy reduced the load on that single thread
> (which could otherwise become a bottleneck on large systems), and also
> reduced the absolute number of wakeups by up to almost a factor of two.
> Deepening the hierarchy would further reduce the wakeup load on the
> grace-period kthread, but would increase the total number of wakeups.
> 
> So this is not a matter of tweaks and optimizations.  I would need to
> see some horrible problem with the current setup to even consider
> making a change.

Ok, I only raised this because in the LPC talk you mentioned that you were
not sure this is the right optimization. But I understand the rationale for
choosing a hierarchy in light of the wakeup performance improvements (I
already knew that this is why you have a hierarchy).

> > BTW, I have 2 interns working on RCU (Amol and Madupharna also on CC).
> > They were selected among several others as a part of the
> > LinuxFoundation mentorship program. They are familiar with RCU. I have
> > asked them to look at some RCU-list work and RCU sparse work. However,
> > I can also have them look into a few other things as time permits and
> > depending on what interests them.
> 
> Dog paddling before cliff diving, please!  ;-)

Sure. They are working on relatively simple things for their internship,
but I just put these ideas out there with them on CC so they can pick
something else as well if they have time and interest ;-)

> > Thanks, Merry Christmas!
> 
> And to you and yours as well!

Hope you had a good holiday season!

thanks,

 - Joel
