Hi David, [+Peter] I have a few drive-by comments on the ordering side of things. See below. On Fri, Sep 13, 2019 at 02:00:39PM +0100, David Howells wrote: > Convert pipes to use head and tail pointers for the buffer ring rather than > pointer and length as the latter requires two atomic ops to update (or a > combined op) whereas the former only requires one. > > (1) The head pointer is the point at which production occurs and points to > the slot in which the next buffer will be placed. This is equivalent > to pipe->curbuf + pipe->nrbufs. > > The head pointer belongs to the write-side. > > (2) The tail pointer is the point at which consumption occurs. It points > to the next slot to be consumed. This is equivalent to pipe->curbuf. > > The tail pointer belongs to the read-side. > > (3) head and tail are allowed to run to UINT_MAX and wrap naturally. They > are only masked off when the array is being accessed, e.g.: > > pipe->bufs[head & mask] > > This means that it is not necessary to have a dead slot in the ring as > head == tail isn't ambiguous. > > (4) The ring is empty if head == tail. > > (5) The occupancy of the ring is head - tail. > > (6) The amount of space in the ring is (tail + pipe->buffers) - head. > > (7) The ring is full if head == (tail + pipe->buffers) or > head - tail == pipe->buffers. > > Barriers are also inserted, wrapped in inline functions: > > (1) unsigned int tail = pipe_get_tail_for_write(pipe); > > Read the tail pointer from the write-side. This acts as a barrier to > order the tail read before the data in the ring is overwritten. It > also prevents the compiler from re-reading the pointer. > > (2) unsigned int head = pipe_get_head_for_read(pipe); > > Read the head pointer from the read-side. This acts as a barrier to > order the head read before the data read. It also prevents the > compiler from re-reading the pointer. > > (3) pipe_post_read_barrier(pipe, unsigned int tail); > > Update the tail pointer from the read-side. This acts as a barrier to > order the pointer update after the data read. The consumed data slot > must not be touched after this function. > > (4) pipe_post_write_barrier(pipe, unsigned int head); > > Update the head pointer from the write-side. This acts as a barrier > to order the pointer update after the data write. The produced data > slot should not normally be touched after this function[*]. > > [*] Unless pipe->mutex is held. > --- > fs/fuse/dev.c | 23 ++- > fs/pipe.c | 154 ++++++++++++++++---------- > fs/splice.c | 161 +++++++++++++++++---------- > include/linux/pipe_fs_i.h | 76 ++++++++++++- > include/linux/uio.h | 4 > lib/iov_iter.c | 268 ++++++++++++++++++++++++++-------------------- > 6 files changed, 438 insertions(+), 248 deletions(-) [...] > diff --git a/fs/pipe.c b/fs/pipe.c > index 8a2ab2f974bd..aa410ee0f423 100644 > --- a/fs/pipe.c > +++ b/fs/pipe.c > @@ -43,10 +43,10 @@ unsigned long pipe_user_pages_hard; > unsigned long pipe_user_pages_soft = PIPE_DEF_BUFFERS * INR_OPEN_CUR; > > /* > - * We use a start+len construction, which provides full use of the > + * We use a start+len construction, which provides full use of the > * allocated memory. > * -- Florian Coosmann (FGC) > - * > + * > * Reads with count = 0 should always return 0. > * -- Julian Bradfield 1999-06-07. > * > @@ -285,10 +285,15 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) > ret = 0; > __pipe_lock(pipe); > for (;;) { > - int bufs = pipe->nrbufs; > - if (bufs) { > - int curbuf = pipe->curbuf; > - struct pipe_buffer *buf = pipe->bufs + curbuf; > + /* Barrier: head belongs to the write side, so order reading > + * the data after reading the head pointer. > + */ > + unsigned int head = READ_ONCE(pipe->head); Hmm, I don't understand this. Since READ_ONCE() doesn't imply a barrier, how are you enforcing the read-read ordering in the CPU? > @@ -104,6 +104,76 @@ struct pipe_buf_operations { > bool (*get)(struct pipe_inode_info *, struct pipe_buffer *); > }; > > +/** > + * pipe_get_tail_for_write - Get pipe buffer tail pointer for write-side use > + * @pipe: The pipe in question > + * > + * Get the tail pointer for use in the write-side code. This may need to > + * insert a barrier against the reader to order reading the tail pointer > + * against the reader reading the buffer. What is the purpose of saying "This may need to insert a barrier"? Can this function be overridden or something? > + */ > +static inline unsigned int pipe_get_tail_for_write(struct pipe_inode_info *pipe) > +{ > + return READ_ONCE(pipe->tail); > +} > + > +/** > + * pipe_post_read_barrier - Set pipe buffer tail pointer in the read-side > + * @pipe: The pipe in question > + * @tail: The new tail pointer > + * > + * Update the tail pointer in the read-side code. This inserts a barrier > + * against the writer such that the data write is ordered before the tail > + * pointer update. > + */ > +static inline void pipe_post_read_barrier(struct pipe_inode_info *pipe, > + unsigned int tail) > +{ > + smp_store_release(&pipe->tail, tail); > +} > + > +/** > + * pipe_get_head_for_read - Get pipe buffer head pointer for read-side use > + * @pipe: The pipe in question > + * > + * Get the head pointer for use in the read-side code. This inserts a barrier > + * against the reader such that the head pointer is read before the data it > + * points to. > + */ > +static inline unsigned int pipe_get_head_for_read(struct pipe_inode_info *pipe) > +{ > + return READ_ONCE(pipe->head); > +} Saying that "This inserts a barrier" feels misleading, because READ_ONCE() doesn't do that. Will