On 2019-11-04 19:41:20 [+0800], Lai Jiangshan wrote:
> > Is there a benchmark saying how much we gain from this?
>
> Hello
>
> Maybe I can write a tight loop for testing, but I don't
> think anyone will be interesting in it.
>
> I'm also trying to find some good real tests. I need
> some suggestions here.

There is rcutorture, but I don't know how much of a performance test
it is; Paul would know.

A micro benchmark is one thing. Are there any visible changes in
userland for workloads like building a kernel or hackbench?

I don't argue that incrementing a per-CPU variable is more efficient
than reading a per-CPU variable, adding an offset and then incrementing
it. I was just curious to see if there are any numbers on it.

> > > No function call when using rcu_read_[un]lock().
> > > Single instruction for rcu_read_lock().
> > > 2 instructions for fast path of rcu_read_unlock().
> >
> > I think these were not inlined due to the header requirements.
>
> objdump -D -S kernel/workqueue.o shows (selected fractions):

That was not what I meant. To inline the current rcu_read_lock() would
mean including the definition of struct task_struct (and everything down
the road) in the rcu headers, which doesn't work.

> Best regards
> Lai

Sebastian
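
For readers following the thread, the two ways of tracking the nesting count compared above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct task_sketch`, `current_sketch`, `percpu_nesting_sketch` and the helper names are hypothetical, thread-local storage stands in for per-CPU data, and none of it is the kernel's actual code.

```c
/*
 * Userspace sketch only. Every name here is hypothetical, and
 * _Thread_local merely stands in for per-CPU storage.
 */

/* Variant A: the nesting counter is a field of the task structure. */
struct task_sketch {
	long other_fields[16];		/* stands in for everything else */
	int rcu_read_lock_nesting;
};

static struct task_sketch init_task_sketch;

/* Stand-in for "current": a per-thread pointer to the running task. */
static _Thread_local struct task_sketch *current_sketch = &init_task_sketch;

/*
 * Inlining these requires the full definition of struct task_sketch to
 * be visible, which mirrors the header dependency mentioned above.
 */
static inline void lock_via_task(void)
{
	/* read the pointer, add the field offset, then increment */
	current_sketch->rcu_read_lock_nesting++;
}

static inline void unlock_via_task(void)
{
	current_sketch->rcu_read_lock_nesting--;
}

static inline int nesting_via_task(void)
{
	return current_sketch->rcu_read_lock_nesting;
}

/* Variant B: the counter itself is the per-thread (per-CPU) variable. */
static _Thread_local int percpu_nesting_sketch;

/* Only the declaration of one integer is needed to inline these. */
static inline void lock_via_percpu(void)
{
	/* increment the variable directly, no pointer load first */
	percpu_nesting_sketch++;
}

static inline void unlock_via_percpu(void)
{
	percpu_nesting_sketch--;
}

static inline int nesting_via_percpu(void)
{
	return percpu_nesting_sketch;
}
```

The sketch shows only the difference in access pattern and header dependency. It says nothing about how large the gain is, which is exactly what a benchmark would have to answer.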