On Wed, Mar 1, 2023 at 1:08 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> On Tue, 28 Feb 2023 18:30:14 -0500
> Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> >
> > > But looking at this use case, I'd actually NAK it, as it is misleading.
> >
> > I'm trying to parse this. You are saying it is misleading, because it
> > updates js when it doesn't need to?
>
> Correct.

I'm a bit late to the discussion (well, I have to sleep from time to
time, too), but in the hope that everybody interested in this issue
will find the reply, I'll try to clarify the "updates" claim:

The try_cmpxchg is written in such a way that it benefits loops as well
as linear code; in the latter case it depends on the compiler to
eliminate the dead assignment.

When changing linear code from cmpxchg to try_cmpxchg, one has to take
care that the variable, passed by reference, is unused after cmpxchg,
so it can be considered as a temporary variable (as said elsewhere, the
alternative is to copy the value to a local temporary variable and pass
the pointer to this variable to try_cmpxchg - the compiler will
eliminate the assignment if the original variable is unused).

Even in linear code, the conversion from cmpxchg to try_cmpxchg is able
to eliminate assignment and compare, as can be seen when the code is
compiled with gcc-10.3.1:

    a1c5:    0f 84 53 03 00 00       je     a51e <rcu_sched_clock_irq+0x70e>
    a1cb:    48 89 c8                mov    %rcx,%rax
    a1ce:    f0 48 0f b1 35 00 00    lock cmpxchg %rsi,0x0(%rip)        # a1d7 <rcu_sched_clock_irq+0x3c7>
    a1d5:    00 00
            a1d3: R_X86_64_PC32    .data+0xf9c
    a1d7:    48 39 c1                cmp    %rax,%rcx
    a1da:    0f 85 3e 03 00 00       jne    a51e <rcu_sched_clock_irq+0x70e>

to:

    a1d0:    0f 84 49 03 00 00       je     a51f <rcu_sched_clock_irq+0x70f>
    a1d6:    f0 48 0f b1 35 00 00    lock cmpxchg %rsi,0x0(%rip)        # a1df <rcu_sched_clock_irq+0x3cf>
    a1dd:    00 00
            a1db: R_X86_64_PC32    .data+0xf9c
    a1df:    0f 85 3a 03 00 00       jne    a51f <rcu_sched_clock_irq+0x70f>

Newer compilers (e.g.
gcc-12+) are able to use likely/unlikely annotations to reorder the
code, so the change is less visible. But due to reordering, even
targets that don't define try_cmpxchg natively benefit from the change;
please see the thread at [1].

These benefits are the reason the change to try_cmpxchg was also
accepted in linear code elsewhere in the Linux kernel, e.g. [2,3] to
name a few commits, with a thumbs-up and a claim that the new code is
actually *clearer* at the merge commit [4].

I really think that the above demonstrates various improvements, and it
would be unfortunate not to consider them.

[1] https://lore.kernel.org/lkml/871qwgmqws.fsf@xxxxxxxxxxxxxxxxxx/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4e1da8fe031303599e78f88e0dad9f44272e4f99
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8baceabca656d5ef4494cdeb3b9b9fbb844ac613
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=91bc559d8d3aed488b4b50e9eba1d7ebb1da7bbf

Uros.

> >
> > > As try_cmpxchg() is used to get rid of the updating of the old value. As in
> > > the ring buffer code we had:
> > >
> > > void ring_buffer_record_off(struct trace_buffer *buffer)
> > > {
> > > 	unsigned int rd;
> > > 	unsigned int new_rd;
> > >
> > > 	do {
> > > 		rd = atomic_read(&buffer->record_disabled);
> > > 		new_rd = rd | RB_BUFFER_OFF;
> > > 	} while (!atomic_cmpxchg(&buffer->record_disabled, &rd, new_rd) != rd);
> >
> > Hear you actually meant "rd" as the second parameter without the & ?
>
> Yes, I cut and pasted the updated code and incorrectly try to revert it in
> this example :-p
>
> >
> > > }
> > >
> > > and the try_cmpxchg() converted it to:
> > >
> > > void ring_buffer_record_off(struct trace_buffer *buffer)
> > > {
> > > 	unsigned int rd;
> > > 	unsigned int new_rd;
> > >
> > > 	rd = atomic_read(&buffer->record_disabled);
> > > 	do {
> > > 		new_rd = rd | RB_BUFFER_OFF;
> > > 	} while (!atomic_try_cmpxchg(&buffer->record_disabled, &rd, new_rd));
> > > }
> > >
> > > Which got rid of the need to constantly update the rd variable (cmpxchg
> > > will load rax with the value read, so it removes the need for an extra
> > > move).
> >
> > So that's a good thing?
>
> Yes. For looping, try_cmpxchg() is the proper function to use. But in the
> RCU case (and other cases in the ring-buffer patch) there is no loop, and
> no need to modify the "old" variable.
>
> >
> > >
> > > But in your case, we don't need to update js, in which case the
> > > try_cmpxchg() does.
> >
> > Right, it has lesser value here but I'm curious why you feel it also
> > doesn't belong in that ring buffer loop you shared (or did you mean,
> > it does belong there but not in other ftrace code modified by Uros?).
>
> The ring buffer patch had more than one change, where half the updates were
> fine, and half were not.
>
> -- Steve
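
For readers who want to experiment with the two patterns discussed
above outside the kernel, here is a minimal userspace sketch. It
assumes that C11 atomic_compare_exchange_strong() from <stdatomic.h> is
a reasonable stand-in for the kernel's try_cmpxchg(): it returns true
on success and, on failure, writes the value it found back through the
"expected" pointer. The names BUFFER_OFF, record_off() and claim_once()
are hypothetical and are not kernel APIs; this is an illustration of
the semantics, not the code from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit, standing in for RB_BUFFER_OFF. */
#define BUFFER_OFF (1u << 20)

/*
 * Loop form: a failed compare-exchange refreshes rd with the current
 * value, so the loop body does not have to re-read the variable.
 */
void record_off(atomic_uint *record_disabled)
{
	unsigned int rd = atomic_load(record_disabled);
	unsigned int new_rd;

	do {
		new_rd = rd | BUFFER_OFF;
	} while (!atomic_compare_exchange_strong(record_disabled, &rd, new_rd));
}

/*
 * Linear form: the expected value is copied to a temporary, so the
 * caller's value (js here) is never modified even when the exchange
 * fails. A compiler may drop the dead write-back to tmp.
 */
bool claim_once(atomic_ulong *stamp, unsigned long js, unsigned long next)
{
	unsigned long tmp = js;	/* temporary, unused after the exchange */

	return atomic_compare_exchange_strong(stamp, &tmp, next);
}
```

Calling claim_once() twice with the same js shows the linear-code
behaviour: the first call succeeds and stores next, the second call
fails and leaves the stored value alone, and js is unchanged in both
cases.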