On Tue, Feb 28, 2023 at 9:39 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote: > > On Tue, Feb 28, 2023 at 04:51:21PM +0100, Uros Bizjak wrote: > > Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in > > check_cpu_stall. x86 CMPXCHG instruction returns success in ZF flag, so > > this change saves a compare after cmpxchg (and related move instruction in > > front of cmpxchg). > > In my codegen, I am not seeing mov instruction before the cmp removed, how > can that be? The rax has to be populated with a mov before cmpxchg right? > > So try_cmpxchg gives: mov, cmpxchg, cmp, jne > Where as cmpxchg gives: mov, cmpxchg, mov, jne (I assume the above is reversed) You have quite a new compiler (e.g. gcc-12+) which is able to reorder basic blocks, reschedule instructions and create fast paths through the code based on likely/unlikely annotations in the definition of try_cmpxchg. [On a related note, some code growth is allowed to the compiler in the hot path since the kernel is compiled with -O2, not -Os.] gcc-10.3.1 improves the code from tree.c from: a1c5: 0f 84 53 03 00 00 je a51e <rcu_sched_clock_irq+0x70e> a1cb: 48 89 c8 mov %rcx,%rax a1ce: f0 48 0f b1 35 00 00 lock cmpxchg %rsi,0x0(%rip) # a1d7 <rcu_sched_clock_irq+0x3c7> a1d5: 00 00 a1d3: R_X86_64_PC32 .data+0xf9c a1d7: 48 39 c1 cmp %rax,%rcx a1da: 0f 85 3e 03 00 00 jne a51e <rcu_sched_clock_irq+0x70e> to: a1d0: 0f 84 49 03 00 00 je a51f <rcu_sched_clock_irq+0x70f> a1d6: f0 48 0f b1 35 00 00 lock cmpxchg %rsi,0x0(%rip) # a1df <rcu_sched_clock_irq+0x3cf> a1dd: 00 00 a1db: R_X86_64_PC32 .data+0xf9c a1df: 0f 85 3a 03 00 00 jne a51f <rcu_sched_clock_irq+0x70f> The change brings the following code size improvement: text data bss dec hex filename 57712 9945 86 67743 1089f tree-new.o 57760 9945 86 67791 108cf tree-old.o Small change, but the result of effectively an almost mechanical one-line substitutions. Uros. > So yeah you got rid of compare, but I am not seeing reduction in moves. > Either way, I think it is an improvement due to dropping cmp so: > > Acked-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx> > > thanks, > > - Joel > > > > > > No functional change intended. > > > > Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx> > > Cc: Frederic Weisbecker <frederic@xxxxxxxxxx> > > Cc: Neeraj Upadhyay <quic_neeraju@xxxxxxxxxxx> > > Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx> > > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx> > > Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> > > Cc: Lai Jiangshan <jiangshanlai@xxxxxxxxx> > > Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> > > Signed-off-by: Uros Bizjak <ubizjak@xxxxxxxxx> > > --- > > kernel/rcu/tree_stall.h | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h > > index b10b8349bb2a..d81c88e66b42 100644 > > --- a/kernel/rcu/tree_stall.h > > +++ b/kernel/rcu/tree_stall.h > > @@ -760,7 +760,7 @@ static void check_cpu_stall(struct rcu_data *rdp) > > jn = jiffies + ULONG_MAX / 2; > > if (rcu_gp_in_progress() && > > (READ_ONCE(rnp->qsmask) & rdp->grpmask) && > > - cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) { > > + try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) { > > > > /* > > * If a virtual machine is stopped by the host it can look to > > @@ -778,7 +778,7 @@ static void check_cpu_stall(struct rcu_data *rdp) > > > > } else if (rcu_gp_in_progress() && > > ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) && > > - cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) { > > + try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) { > > > > /* > > * If a virtual machine is stopped by the host it can look to > > -- > > 2.39.2 > >