On 16 Jul 2024, at 12:32, Trond Myklebust wrote:

> So let's just replace rpc_clear_queued() with a call to
> clear_bit_unlocked(). That still ends up being less expensive than the
> full memory barrier, doesn't it?

Yes. But unless you really want to go this direction, I'll keep testing your
first suggestion to use smp_mb__after_atomic(). That should do the same thing
but also allow us to skip the memory barrier for async tasks.

Ben
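
For anyone following along, here is a rough user-space sketch of the two options
being compared. It is not the kernel code and not a proposed patch: it uses C11
atomics as an approximate stand-in for the kernel bitops (a release-ordered
clear for clear_bit_unlocked(), a seq_cst fence for smp_mb__after_atomic()), and
the struct, the flag values, and the two counters that stand in for queueing
work and waking a waiter are all made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical run-state bits, loosely modelled on an RPC task's state word. */
#define TASK_RUNNING (1u << 0)
#define TASK_QUEUED  (1u << 1)

/* Hypothetical task; the counters only record which path was taken. */
struct task {
    atomic_uint runstate;
    bool is_async;
    int work_queued;  /* stand-in for queueing async work */
    int wakeups;      /* stand-in for waking a sleeping sync waiter */
};

/*
 * Option 1: clear QUEUED with release ordering on every call.
 * Assumed analogue of swapping the plain clear for clear_bit_unlocked().
 */
static void clear_queued_release(struct task *t)
{
    assert(t);
    atomic_fetch_and_explicit(&t->runstate, ~TASK_QUEUED,
                              memory_order_release);
}

/*
 * Option 2: keep the clear relaxed and issue a full fence only on the
 * sync path, just before the wakeup. Assumed analogue of adding
 * smp_mb__after_atomic() there; async tasks return before the fence.
 */
static void make_runnable(struct task *t)
{
    assert(t);
    /* Fully ordered test-and-set of RUNNING (seq_cst by default). */
    unsigned int old = atomic_fetch_or(&t->runstate, TASK_RUNNING);
    bool need_wakeup = !(old & TASK_RUNNING);

    atomic_fetch_and_explicit(&t->runstate, ~TASK_QUEUED,
                              memory_order_relaxed);
    if (!need_wakeup)
        return;
    if (t->is_async) {
        t->work_queued++;   /* no extra barrier paid on this path */
        return;
    }
    atomic_thread_fence(memory_order_seq_cst);  /* sync path only */
    t->wakeups++;
}
```

In this sketch option 1 pays its ordering cost on every clear, while option 2
only pays for the fence when there is a sync waiter to wake. The C11 memory
model does not map one-to-one onto the kernel's barriers, so treat this as a
picture of the control flow rather than a proof of ordering.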