Re: [PATCH 1/3] rcu: Use static initializer for krc.lock

On Tue, Apr 21, 2020 at 07:05:56PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-04-21 08:41:01 [-0700], Paul E. McKenney wrote:
> > > Invoking queue_work() is not a problem locking wise. It does however
> > > extend the length of the critical section (which is not yet needed).
> > 
> > I am guessing that by "which is not yet needed" you were meaning that
> > it is not absolutely necessary to extend the length of the critical
> > section.  Please correct me if my guess is wrong.
> 
> By changing the lock type you extend the atomic section from "only"
> queue_work() (if invoked) to everything kfree_rcu() does plus
> queue_work().

Got it, thank you!

> > In the meantime, plunging ahead...
> > 
> > One approach might be to move queue_work() and friends out of the critical
> > section, but only enable interrupts between the two critical sections
> > under CONFIG_PREEMPT_RT_FULL.  Would that be a reasonable approach?
> 
> Yes but why do we do this raw_spinlock_t here? It is not yet needed on
> v5.6-RT as I *did* check. It also complicates the code for !RT but
> nobody responded to that part but…

I did respond by pointing out that the essentially similar call_rcu()
function ends up being invoked pretty much everywhere, including early
boot before rcu_init() has been invoked.  It is therefore only reasonable
to assume that there will be a need for kfree_rcu() to tolerate a similar
range of calling contexts.
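
For concreteness, moving queue_work() out of the critical section as
suggested above might look roughly like the following sketch.  This is
an illustration, not an actual patch: the function name is invented,
the enqueue step is elided, and field names such as monitor_todo and
monitor_work merely follow the existing kfree_rcu() batching code.

```c
/* Sketch only: take krcp->lock raw, do the minimal bookkeeping under
 * it, then drop the lock (restoring interrupts) before queueing the
 * work, so that queue_delayed_work() and its internal locking run
 * outside the raw critical section. */
static void kfree_call_rcu_sketch(struct kfree_rcu_cpu *krcp,
				  struct rcu_head *head, rcu_callback_t func)
{
	unsigned long flags;
	bool queue = false;

	raw_spin_lock_irqsave(&krcp->lock, flags);
	/* ... enqueue head onto the per-CPU batch (elided) ... */
	if (!krcp->monitor_todo) {
		krcp->monitor_todo = true;
		queue = true;
	}
	raw_spin_unlock_irqrestore(&krcp->lock, flags);

	/* Interrupts are enabled again here, so the work-queueing no
	 * longer extends the atomic section. */
	if (queue)
		queue_delayed_work(system_wq, &krcp->monitor_work, HZ);
}
```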

> That said: the current memory allocation is the problem here. The
> remaining part is fine. The part under the lock is small enough so it
> should not cause the trouble if it invokes queue_work() which will
> "only" enqueue the timer. 

To your point, the small memory allocation will be going away.  Instead,
the memory allocations will pull in 4K pages of pointers.

On the timer, are you thinking of the queue_work() calls or instead
the queue_delayed_work() calls?

> Side question: Is there any real-life workloads that benefits from this?
> I'm asking because rcuperf allocates the pointer passed to kfree_rcu()
> right away. The chances are high that the pointers are fed from the same
> page.
> SLUB's build_detached_freelist() scans the page of RCU's pointers to
> ensure that they are in the same page and then slab_free() them in one
> go. There is lookahead = 3, so once it finds three different pages it
> stops further scanning and does slab_free() with what it found so far.
> 
> Which means if your kfree_rcu() collects random pointer from the system,
> they may belong to different pages (especially if they are part of
> different "types").

As it currently is, it gets significantly better performance due to the
reduced cache-miss rate when scanning pointers in a page, as opposed to
pointer-chasing through a series of rcu_head pointers.
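
The layout difference can be sketched in plain C.  This toy model is my
illustration, not kernel code: walking a contiguous array of pointers
touches one cache line per several entries, while following embedded
rcu_head-style links risks a cache miss per object.

```c
#include <stddef.h>

struct obj {
	struct obj *next;	/* rcu_head-style link for the chained case */
	long val;
};

/* Batched style: objects referenced from a contiguous pointer array,
 * as with a 4K page of pointers. */
long sum_array(struct obj **objs, size_t n)
{
	long s = 0;

	for (size_t i = 0; i < n; i++)
		s += objs[i]->val;
	return s;
}

/* Pointer-chasing style: follow the embedded links one by one. */
long sum_chain(struct obj *head)
{
	long s = 0;

	for (; head; head = head->next)
		s += head->val;
	return s;
}
```

Both walks visit the same objects; only the memory-access pattern, and
hence the cache behavior, differs.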

Yes, it might be even better if kfree_rcu() further sorted the freed
objects per slab or maybe even per page within a slab, but one step at
a time!  For one thing, it is not hard to imagine situations where this
further sorting actually slowed things down, especially if the system
was under any sort of memory pressure or if the kfree_rcu() calls were
scattered across so many slabs and pages that it essentially reverted
back to pointer chasing.
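
The lookahead scan described above can be modeled in userspace C.  This
is a toy model of the behavior Sebastian describes, not the real
build_detached_freelist() in mm/slub.c, which operates on struct page
rather than integer page ids; the function name and signature here are
my own.

```c
#include <stddef.h>

/* Collect pointers belonging to the same page as the first entry in
 * the array of page ids.  Stop scanning early once `lookahead`
 * pointers from other pages have been seen, and free (here: return)
 * what was found so far. */
size_t scan_detached_freelist(const int *page_of, size_t n, int lookahead)
{
	if (n == 0)
		return 0;

	int first_page = page_of[0];
	size_t collected = 1;
	int misses = 0;

	for (size_t i = 1; i < n; i++) {
		if (page_of[i] == first_page)
			collected++;
		else if (++misses >= lookahead)
			break;	/* too many foreign pages, stop scanning */
	}
	return collected;
}
```

With lookahead = 3, a batch of same-page pointers is collected in one
pass, whereas randomly interleaved pages cut the scan short quickly.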

Make sense, or am I missing your point?

							Thanx, Paul
