On Thu, Sep 19, 2019 at 6:59 AM David Howells <dhowells@xxxxxxxxxx> wrote:
>
> But I don't agree with this.  You're missing half the barriers.  There should
> be *four* barriers.  The document mandates only 3 barriers, and uses
> READ_ONCE() where the fourth should be, i.e.:
>
>       thread #1               thread #2
>
>       smp_load_acquire(head)
>       ... read data from queue ..
>       smp_store_release(tail)
>
>       READ_ONCE(tail)
>       ... add data to queue ..
>       smp_store_release(head)

The document is right, but you shouldn't do this.

The reason that READ_ONCE() is possible - instead of a smp_load_acquire() -
is that there's now an address dependency chain from the READ_ONCE to the
subsequent writes of the data. And while there isn't any barrier, a data
or control dependency to a _write_ does end up ordering things (even on
alpha - it's only the read->read dependencies that might be unordered on
alpha).

But again, don't do this.

Also, you ignored the part where I told you to not do this because we
already have locking. I'm not going to discuss this further. Locking
works. Spinlocks are cheap. Lockless algorithms that need atomics aren't
even cheaper than spinlocks: they can in fact scale *worse*, because they
don't have the nice queuing optimization that our spinlocks have.

Lockless algorithms are great if they can avoid the contention on the lock
and instead only work on distributed data and avoid contention entirely.
But in this case the lock would be right next to the data anyway, so even
that case doesn't hold.

              Linus
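For reference, here is a minimal userspace sketch of the single-producer/single-consumer ring under discussion, written in the conservative "four barrier" form (acquire on both index loads, release on both index stores). It assumes C11 <stdatomic.h> as a stand-in for the kernel's smp_load_acquire()/smp_store_release(); the names struct ring, ring_init(), ring_push(), ring_pop() and RING_SIZE are hypothetical and not taken from any kernel code. The three-barrier variant in the document replaces the producer's acquire load of tail with READ_ONCE() and relies on the dependency from that load to the data write, as described above. As far as I know the C11 model does not honour such dependencies, so a relaxed load there would formally be a data race, which is why the sketch keeps the acquire.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ring size; must be a power of two so index masking works. */
#define RING_SIZE 8
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

struct ring {
	_Atomic size_t head;	/* advanced only by the producer */
	_Atomic size_t tail;	/* advanced only by the consumer */
	int data[RING_SIZE];
};

static void ring_init(struct ring *r)
{
	atomic_init(&r->head, 0);
	atomic_init(&r->tail, 0);
}

/* Producer side: returns false if the ring is full. */
static bool ring_push(struct ring *r, int v)
{
	/* Our own index: only this thread writes it, so relaxed is enough. */
	size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	/* Barrier 1: pairs with the consumer's release of tail, so the slot
	 * we are about to overwrite has really been read (this is the load
	 * the document turns into READ_ONCE()). */
	size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail >= RING_SIZE)
		return false;
	r->data[head & (RING_SIZE - 1)] = v;
	/* Barrier 2: make the data visible before the new head. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: returns false if the ring is empty. */
static bool ring_pop(struct ring *r, int *out)
{
	size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* Barrier 3: pairs with the producer's release of head. */
	size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;
	*out = r->data[tail & (RING_SIZE - 1)];
	/* Barrier 4: finish reading the slot before handing it back. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

Per the point about locking, the same two functions wrapped in a spinlock (with plain size_t indices) would need none of these annotations, since the lock's acquire/release already provides all four orderings.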