On Fri, 23 Aug 2024, Linus Torvalds wrote: > On Fri, 23 Aug 2024 at 08:14, NeilBrown <neilb@xxxxxxx> wrote: > > > > Maybe inode_wake_up_bit() should not have a barrier and the various > > callers should add whichever barrier is appropriate. That is the model > > that Linus prefers for wake_up_bit() and for consistency it should apply > > to inode_wake_up_bit() too. > > I think that for the wake_up_bit() cases in particular, it's fairly > easy to add whatever "clear_and_wakeup()" helpers. After all, there's > only so much you can do to one bit, and I think merging the "act" with > the "wakeup" into one routine is not only going to help fix the memory > ordering problem, I think it's going to result in more readable code > even aside from any memory ordering issues. That patterns that don't easily fit a helper include: - skipping the wake_up when there is no hurry. nfs_page_clear_headlock() does this is PG_CONTENDED1 isn't set. intel_recv_event() does something very similar if STATE_FIRMWARE_LOADED isn't set. I imagine changing these to if (need a wakeup) clear_bit_and_wakeup(...) else clear_bit() rpc_make_runnable() has three case and maybe each would need a separate clear_bit in order to have a clear_bit adjacent to the wake_up(). That might actually be a good thing, I haven't really tried yet. - key_reject_and_link() separate the test_and_clear from the wake_up_bit() for reasons that aren't completely obvious to me. __key_instantiate_and_link() does the same. Maybe they just move the wake_up outside the mutex.... > > The wake_up_var() infrastructure that the inode code uses is a bit > more involved. Not only can the variable be anything at all (so the > operations you can do on it are obviously largely unbounded), but the > inode hack in particular then uses one thing for the actual variable, > and another thing for the address that is used to match up waits and > wakeups. > > So I suspect the inode code will have to do its thing explcitly with > the low-level helpers and deal with the memory ordering issues on its > own, adding the smp_mb() manually. The problem here is that "wake_up_var()" doesn't *look* like a low-level helper. Maybe we could replace the few remaining instances with __wake_up_var() once the easy cases are fixed?? Thanks, NeilBrown