On Wed, Sep 18, 2019 at 8:43 AM David Howells <dhowells@xxxxxxxxxx> wrote: > > It mandates using smp_store_release() to update buffer->head in the producer > and buffer->tail in the consumer - but these need pairing with memory barriers > used when reading buffer->head and buffer->tail on the other side. No, the rule with smp_store_release() should be that it's paired with "smp_load_acquire()". No other barriers needed. If you do that thread #1 thread #2 ... add data to queue .. smp_store_release(x) smp_load_acquire(x) ... read data from queue .. then you should need no other barriers. But yes, store_release(x) should always pair with the load_acquire(x), and the guarantee is that if the load_acquire reads the value that the store_release stored, then all subsequent reads in thread #2 will see all preceding writes in thread #1. That's the optimal locking for a simple queue with a reader and a writer and no other locking needed between the two. HOWEVER. I think this is all entirely pointless wrt the pipe buffer use. You don't have a simple queue. You have multiple writers, and you have multiple readers. As a result, you need real locking. So don't do the barriers. If you do the barriers, you're almost certainly doing something buggy. You don't have the simple "consumer -> producer" thing. Or rather, you don't _only_ have that thing. A reader "produces" a new tail as far as the writer is concerned (so the reader that has consumed an entry does a smp_store_release(tail) -> smp_load_acquire(tail) on the writer side when the writer looks for a new entry it can fill). BUT! A writer also needs to make sure it's the *only* writer for that new entry, and similarly a reader that is about to consume an entry needs to make sure it's the only reader of that entry. So it is *not* that kind of simple hand-off situation at all. End result: use a lock. Seriously. Anything else is going to be buggy, or going to be incredibly subtle. Don't do those barriers. They are wrong. Barriers simply do not protect against two concurrent readers, or two concurrent writers. Barriers are useless. Now, it's possible that barriers together with very clever atomics could do a lockless model, but then all the cleverness is going to be in the lockless part of the code, not the barriers. So even if we eventually do a lockless model, it's completely wrong to do the barriers first. And no, lockless isn't as simple as "readers can do a atomic_add_return_relaxed() to consume an entry, and writers can do a atomic_cmpxchg() to produce one". I think the only possible lockless model is that readers have some exclusion against other readers (so now you have only one reader at a time), and then you fundamentally know that a reader can update the tail pointer without worrying about any races (because a reader will never race against another reader, and a writer will only ever _read_ the tail pointer). So then - once you have reader-reader exclusion, the _tail_ update is simple: smp_store_release(old_tail+1, &tail); because while there are multiple writers, the reader doesn't care (the above, btw, assumes that head/tail themselves aren't doing the "& bufsize" thing, and that that is done at use time - that also makes it easy to see the difference between empty and full). But this still requires that lock around all readers. And we have that: the pipe mutex. But writers have the same issue, and we don't want to use the pipe mutex for writer exclusion, since we want to allow atomic writers. So writers do have to have a lock, or they need to do something clever with a reservation system that uses cmpxchg. Honestly, just do the lock. The way this should be done: 1) do the head/tail conversion using the EXISTING locking. We have the pipe_mutex, and it serializes everybody right now, and it makes any barriers or atomics pointless. Just covert to a sane and simple head/tail model. NO BARRIERS! *after* you've done this, do the next phase: 2) convert the writers to use an irq-safe spinlock, and either make the readers look at it too, or do the above smp_store_release() for the final tail update (and the writers need to use a smp_load_acquire() to look at the tail, despite their lock). NO BARRIERS (well, outside the tail thing above). and *after* you've done #2, you can now do writes from atomic context. At that point you are done. Consider youself happy, but also allow for the possibility that some day you'll see lock contention with lots of writers, and that *maybe* some day (unlikely) you'd want to look at doing 3) see if you can make the writers do some clever lockless reservation model. But that #3 really should be considered a "we are pretty sure it's possible in theory, but it's questinably useful, and hopefully we'll never even need to consider it". Linus