Re: [PATCH RFC] rcu/tree: Use GFP_MEMALLOC for alloc memory to free memory pattern

Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> · Tue, 31 Mar 2020 14:30:00 -0400

On Tue, Mar 31, 2020 at 06:01:19PM +0200, Uladzislau Rezki wrote:
> > 
> > Yes, I mean __GFP_MEMALLOC. Sorry, the patch was just to show the idea and
> > marked as RFC.
> > 
> > Good point on the atomic aspect of this path, you are right we cannot sleep.
> > I believe the GFP_NOWAIT I mentioned in my last reply will take care of that?
> > 
> I think GFP_ATOMIC should be used, because it has a better chance of
> returning memory than GFP_NOWAIT. I see that Michal has the same view on it.

I don't think so, because GFP_ATOMIC implies GFP_NOWAIT. I am OK with keeping
GFP_ATOMIC as it is, btw. Paul mentioned he prefers this, and I agree with
that as well.
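
To spell out what "implies" means here, a rough userspace sketch, not kernel
code. The bit values and the SK_ names are made up for illustration. The way
the two masks are composed follows my recollection of include/linux/gfp.h
around this kernel version (GFP_ATOMIC as __GFP_HIGH | __GFP_ATOMIC |
__GFP_KSWAPD_RECLAIM, GFP_NOWAIT as just __GFP_KSWAPD_RECLAIM), so please
double check against your tree:

```c
#include <stdbool.h>

/* Placeholder bit values, for illustration only. The real definitions
 * live in include/linux/gfp.h and differ between kernel versions. */
#define SK_BIT_HIGH            0x1u  /* may dip into emergency reserves */
#define SK_BIT_ATOMIC          0x2u  /* caller cannot reclaim or sleep */
#define SK_BIT_KSWAPD_RECLAIM  0x4u  /* may wake kswapd, does not block */
#define SK_BIT_DIRECT_RECLAIM  0x8u  /* may enter direct reclaim, can sleep */

/* Assumed composition of the two masks (see the note above). */
#define SK_GFP_NOWAIT  (SK_BIT_KSWAPD_RECLAIM)
#define SK_GFP_ATOMIC  (SK_BIT_HIGH | SK_BIT_ATOMIC | SK_BIT_KSWAPD_RECLAIM)

/* True if every bit set in b is also set in a. */
static bool sk_implies(unsigned int a, unsigned int b)
{
	return (a & b) == b;
}

/* True if the mask permits direct reclaim, the only sleeping bit here. */
static bool sk_may_sleep(unsigned int flags)
{
	return (flags & SK_BIT_DIRECT_RECLAIM) != 0;
}
```

Under those assumptions sk_implies(SK_GFP_ATOMIC, SK_GFP_NOWAIT) holds but not
the reverse, and neither mask may sleep. The only difference is the extra
access to reserves that GFP_ATOMIC gets.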

> > > As for removing __GFP_NOWARN: an allocation failure is actually
> > > expected here, and if it happens we fall back to the last-resort
> > > emergency path. You can see the trace, but what would you do with
> > > that information?
> > 
> > Yes, the benefit of the trace/warning is that the user can switch to a
> > non-headless API and avoid the synchronize_rcu(), which would give them
> > faster kfree_rcu() performance instead of silent slowdowns.
> > 
> Agree. What about just adding WARN_ON_ONCE()? I am just wondering whether
> it could be harmful or not.

You mean WARN_ON_ONCE() before the synchronize_rcu(), right? We could do that.
Paul mentioned to me that he would prefer this new warning to be switchable
off with a boot parameter, since some future user may prefer no warning. I
also agree.

If we add this then we can keep your __GFP_NOWARN flag with no additional GFP
flag changes.
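
Roughly what I have in mind, as a userspace sketch rather than a patch. All
sk_ names are hypothetical, the bool stands in for a boot parameter that does
not exist yet, and the stubs only count calls instead of doing a real
synchronize_rcu():

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical knob standing in for the proposed boot parameter. */
static bool sk_warn_on_fallback = true;

static bool sk_warned;     /* models the "once" part of WARN_ON_ONCE() */
static int sk_warn_count;  /* how many warnings were actually emitted */
static int sk_sync_count;  /* how many times the slow path was taken */

/* Stand-in for synchronize_rcu(): only counts the calls. */
static void sk_synchronize(void)
{
	sk_sync_count++;
}

/* Stand-in allocator that can be forced to fail. */
static void *sk_alloc_block(bool fail)
{
	return fail ? NULL : malloc(64);
}

/* Free a headless object. Returns true if the fast path was used. */
static bool sk_kvfree_headless(void *obj, bool force_alloc_failure)
{
	void *block = sk_alloc_block(force_alloc_failure);

	if (block) {
		/* Fast path. Real code would queue obj in the block and
		 * free it after a grace period; the sketch frees it now. */
		free(block);
		free(obj);
		return true;
	}

	/* Slow path: warn at most once, and only if the knob is on. */
	if (sk_warn_on_fallback && !sk_warned) {
		sk_warned = true;
		sk_warn_count++;
		fprintf(stderr, "headless free fell back to slow path\n");
	}
	sk_synchronize();
	free(obj);
	return false;
}
```

With the knob cleared, the slow path is still taken but stays silent, which is
the behavior you get today with __GFP_NOWARN.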

> > It also tells us whether the headless API is worth it in the long run. I
> > think it is worth it because we will likely never hit the synchronize_rcu()
> > failsafe. But if we hit it a lot, at least it won't happen silently.
> > 
> Agree.
> 
> > Paul was concerned about the following scenario with hitting synchronize_rcu():
> > 1. Consider a system under memory pressure.
> > 2. Consider some other subsystem X depending on another system Y which uses
> >    kfree_rcu(). If Y doesn't complete the operation in time, X accumulates
> >    more memory.
> > 3. Since kfree_rcu() on Y hits synchronize_rcu() a lot, Y slows down.
> >    This causes X to allocate even more memory, setting off a chain
> >    reaction.
> > Paul, please correct me if I'm wrong.
> > 
> I see your point and agree that in theory it can happen. So, we should
> tighten the rcu_head attachment logic.

Right. Paul and I discussed this, and we think it is better to pre-allocate
N array blocks per CPU and use them for the cache, with N defaulting to 1
and tunable with a boot parameter. I agree with this.

In the current code, we have 1 cache page per CPU, but it is allocated only
on the first kvfree_rcu() request. So we could change this behavior as well
and make it pre-allocated.
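
As a rough illustration of the pre-allocation idea, here is a userspace
sketch. It is not the kernel data structure: the sk_ names, the fixed CPU
count, and the block size are all placeholders, and malloc() stands in for
the page allocator:

```c
#include <stddef.h>
#include <stdlib.h>

#define SK_NR_CPUS     4     /* placeholder CPU count */
#define SK_MAX_BLOCKS  8     /* upper bound on the tunable N */
#define SK_BLOCK_SIZE  4096  /* placeholder for one page */

struct sk_cpu_cache {
	void *blocks[SK_MAX_BLOCKS];
	int nr;     /* blocks currently cached */
	int limit;  /* N, the most blocks this cache may hold */
};

static struct sk_cpu_cache sk_caches[SK_NR_CPUS];

/* Fill every CPU's cache with n blocks up front. Returns 0 on success,
 * -1 on a bad n or allocation failure (partial state is left in place). */
static int sk_cache_init(int n)
{
	if (n < 1 || n > SK_MAX_BLOCKS)
		return -1;

	for (int cpu = 0; cpu < SK_NR_CPUS; cpu++) {
		struct sk_cpu_cache *c = &sk_caches[cpu];

		c->nr = 0;
		c->limit = n;
		for (int i = 0; i < n; i++) {
			void *b = malloc(SK_BLOCK_SIZE);

			if (!b)
				return -1;
			c->blocks[c->nr++] = b;
		}
	}
	return 0;
}

/* Take a block without allocating. Returns NULL if the cache is empty. */
static void *sk_cache_get(int cpu)
{
	struct sk_cpu_cache *c = &sk_caches[cpu];

	return c->nr > 0 ? c->blocks[--c->nr] : NULL;
}

/* Hand a block back. Returns 1 if it was cached, 0 if the cache was
 * already at its limit and the block was released instead. */
static int sk_cache_put(int cpu, void *block)
{
	struct sk_cpu_cache *c = &sk_caches[cpu];

	if (c->nr < c->limit) {
		c->blocks[c->nr++] = block;
		return 1;
	}
	free(block);
	return 0;
}
```

The point is that sk_cache_get() never allocates, so the first request on a
CPU already finds a block waiting instead of going to the allocator under
memory pressure.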

Does this all sound good to you?

thanks,

 - Joel