On Thu, Nov 07, 2024 at 02:01:17PM +0000, Zilin Guan wrote:
> On Wed, Nov 06, 2024 at 12:18:25PM -0800, Paul E. McKenney wrote:
> > Good eyes!!!
> >
> > But did you find this with KCSAN, or by visual inspection?
> >
> > The reason that I ask is that the __note_gp_changes() should be
> > invoked with the leaf rnp->lock held, which should exclude writes to
> > the rdp->gpwrap fields for all CPUs corresponding to that leaf rcu_node
> > structure.
> >
> > Note the raw_lockdep_assert_held_rcu_node(rnp) call at the beginning of
> > this function.
> >
> > So I believe that the proper fix is to *remove* READ_ONCE() from accesses
> > to rdp->gpwrap in this function.
> >
> > Or am I missing something here?
> >
> > 							Thanx, Paul
>
> I found this by visual inspection.

Good eyes!  ;-)

> When reviewing the function __note_gp_changes(), I noticed that other
> accesses to rdp->gpwrap are protected with either READ_ONCE() or
> WRITE_ONCE(), which led me to suspect a potential data race at line 1305.
>
> However, I am not certain whether holding rnp->lock protects access to
> rdp->gpwrap in this case. If it indeed ensures that no concurrent writes
> can occur, then I agree that the correct approach would be to remove
> READ_ONCE() from those accesses.

One way to check this is via inspection of all the updates to the ->gpwrap
field.  Another approach is to run KCSAN, for example, from the top-level
directory of the Linux-kernel source tree on a system with qemu/KVM
enabled:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 30m --configs "4*TREE03" --kconfigs "CONFIG_NR_CPUS=4" --kcsan --trust-make

This particular command is set up for my 16-CPU laptop.  You can of
course adjust the "4*" and the "=4" to match your hardware.
For example, on a 64-CPU system you might instead do this:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 30m --configs "8*TREE03" --kconfigs "CONFIG_NR_CPUS=8" --kcsan --trust-make

Please see Documentation/dev-tools/kcsan.rst for information on how to
interpret KCSAN reports.  This will find false positives in the non-RCU
portions of the kernel, so you should look for reports involving
__note_gp_changes() and/or its callers (inlining and all that).

So why not try it?  ;-)

							Thanx, Paul
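
Both commands above split the host evenly: the number of concurrent
TREE03 guests times CONFIG_NR_CPUS equals the CPU count (4*4=16 and
8*8=64).  For other machines, here is a minimal sketch that derives both
numbers from a CPU count.  It assumes the square-root split implied by
those two examples, which is only a guess at a rule.  The names
tree03_split and kcsan_cmd are hypothetical helpers, not part of the
rcutorture scripts, and the script only prints the command rather than
running it.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the rcutorture scripts.

# Print the largest n such that n*n <= the given CPU count.
# Assumption: guests and CPUs per guest are both set to n, as in the
# 16-CPU (4*4) and 64-CPU (8*8) examples.
tree03_split() {
    n=1
    while [ $(( (n + 1) * (n + 1) )) -le "$1" ]; do
        n=$((n + 1))
    done
    echo "$n"
}

# Print (do not run) the kvm.sh command line for the given CPU count.
kcsan_cmd() {
    n=$(tree03_split "$1")
    printf 'tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 30m --configs "%s*TREE03" --kconfigs "CONFIG_NR_CPUS=%s" --kcsan --trust-make\n' "$n" "$n"
}

# Use the number of online CPUs, falling back to 4 if nproc is unavailable.
kcsan_cmd "$(nproc 2>/dev/null || echo 4)"
```

On a machine whose CPU count is not a perfect square this rounds down,
so a 20-CPU system would get the same 4*4 split as a 16-CPU one and
leave some CPUs idle.  To preview the command for a different machine,
replace the final line with, for example, kcsan_cmd 64.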