Re: [PATCH V2 7/7] x86,rcu: use percpu rcu_preempt_depth

Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> · Mon, 4 Nov 2019 13:09:39 +0100

On 2019-11-04 19:41:20 [+0800], Lai Jiangshan wrote:
> > Is there a benchmark saying how much we gain from this?
> 
> Hello
> 
> Maybe I can write a tight loop for testing, but I don't
> think anyone will be interesting in it.
> 
> I'm also trying to find some good real tests. I need
> some suggestions here.

There is rcutorture but I don't know how much of performance test this
is, Paul would know.

A micro benchmark is one thing. Any visible changes in userland to
workloads like building a kernel or hackbench? 

I don't argue that incrementing a per-CPU variable is more efficient
than reading a per-CPU variable, adding an offset and then incrementing
it. I was just curious to see if there are any numbers on it.

> > > No function call when using rcu_read_[un]lock().
> > > Single instruction for rcu_read_lock().
> > > 2 instructions for fast path of rcu_read_unlock().
> > 
> > I think these were not inlined due to the header requirements.
> 
> objdump -D -S kernel/workqueue.o shows (selected fractions):

That was not what I meant. To inline current rcu_read_lock() would mean
to include definition for struct task_struct (and everything down the
road) in the rcu headers which isn't working.

> Best regards
> Lai

Sebastian