On 2019/11/4 5:25 PM, Sebastian Andrzej Siewior wrote:
> On 2019-11-02 12:45:59 [+0000], Lai Jiangshan wrote:
> > Convert x86 to use a per-cpu rcu_preempt_depth. The reason for doing so
> > is that accessing per-cpu variables is a lot cheaper than accessing
> > task_struct or thread_info variables.
> Is there a benchmark saying how much we gain from this?
Hello,

Maybe I can write a tight loop for testing, but I don't
think anyone would be interested in it.

I'm also trying to find some good real-world tests; I'd
welcome suggestions here.
> > We need to save/restore the actual rcu_preempt_depth on context switch.
> > We also place the per-cpu rcu_preempt_depth close to the __preempt_count
> > and current_task variables.
> > Using the idea of the per-cpu __preempt_count:
> > No function call when using rcu_read_[un]lock().
> > Single instruction for rcu_read_lock().
> > 2 instructions for the fast path of rcu_read_unlock().
> I think these were not inlined due to the header requirements.
objdump -D -S kernel/workqueue.o shows (selected fractions):

        raw_cpu_add_4(__rcu_preempt_depth, 1);
 d8f:   65 ff 05 00 00 00 00    incl   %gs:0x0(%rip)   # d96 <work_busy+0x16>

......

        return GEN_UNARY_RMWcc("decl", __rcu_preempt_depth, e,
                               __percpu_arg([var]));
 dd8:   65 ff 0d 00 00 00 00    decl   %gs:0x0(%rip)   # ddf <work_busy+0x5f>
        if (unlikely(rcu_preempt_depth_dec_and_test()))
 ddf:   74 26                   je     e07 <work_busy+0x87>

......

        rcu_read_unlock_special();
 e07:   e8 00 00 00 00          callq  e0c <work_busy+0x8c>
Boris pointed out one thing: there is also DEFINE_PERCPU_RCU_PREEMP_DEPTH.
Thanks for pointing that out.
Best regards
Lai