On Tue, Sep 22, 2020 at 08:11:04AM +1000, Herbert Xu wrote:
> On Mon, Sep 21, 2020 at 08:27:14AM -0700, Paul E. McKenney wrote:
> > On Mon, Sep 21, 2020 at 06:19:39PM +1000, Herbert Xu wrote:
> > > On Thu, Sep 17, 2020 at 09:58:02AM -0700, Eric Biggers wrote:
> > > >
> > > > smp_load_acquire() is obviously correct, whereas READ_ONCE() is an optimization
> > > > that is difficult to tell whether it's correct or not.  For trivial data
> > > > structures it's "easy" to tell.  But whenever there is a->b where b is an
> > > > internal implementation detail of another kernel subsystem, the use of which
> > > > could involve accesses to global or static data (for example, spin_lock()
> > > > accessing lockdep stuff), a control dependency can slip in.
> > >
> > > If we're going to follow this line of reasoning, surely you should
> > > be converting the RCU derference first and foremost, no?
> >
> > ...
> >
> > And to Eric's point, it is also true that when you have pointers to
> > static data, and when the compiler can guess this, you do need something
> > like smp_load_acquire().  But this is a problem only when you are (1)
> > using feedback-driven compiler optimization or (2) when you compare the
> > pointer to the address of the static data.
>
> Let me restate what I think Eric is saying.  He is concerned about
> the case where a->b and b is some opaque object that may in turn
> dereference a global data structure unconnected to a.  The case
> in question here is crng_node_pool in drivers/char/random.c which
> in turn contains a spin lock.

As long as the compiler generates code that reaches that global via
pointer a, everything will work fine.  Which it will, unless the guy
writing the code makes the mistake of introducing a comparison between
the pointer to be dereferenced and the address of the global data
structure.
So this is OK:

	p = rcu_dereference(a);
	do_something(p->b);

This is not OK:

	p = rcu_dereference(a);
	if (p == &some_global_variable)
		we_really_should_not_have_done_that_comparison();
	do_something(p->b);

The reason this last one is not OK is that the compiler can transform
it as follows:

	p = rcu_dereference(a);
	if (p == &some_global_variable) {
		we_really_should_not_have_done_that_comparison();
		do_something(some_global_variable.b);
	} else {
		do_something(p->b);
	}

The compiler is not allowed to make up that sort of comparison, based
on my February 2020 discussion with the standards committee.

> But this reasoning could apply to any data structure that contains
> a spin lock, in particular ones that are dereferenced through RCU.

I lost you on this one.  What is special about a spin lock?

Here is what I think you mean:

	struct foo {
		spinlock_t lock;
		int a;
		char b;
		long c;
	};

	struct foo *a;

	...

	p = rcu_dereference(a);
	BUG_ON(!p);
	if (is_this_the_one(p)) {
		spin_lock(&p->lock);
		do_something_else(p);
		spin_unlock(&p->lock);
	}

This should be fine.  Or were you thinking of some other example?

> So my question if this reasoning is valid, then why aren't we first
> converting rcu_dereference to use smp_load_acquire?

For LTO on ARM, rumor has it that Will is doing so.  Which was what
motivated the BoF on this topic at Linux Plumbers Conference.

							Thanx, Paul