Re: [RFC][PATCH] pipe: Convert ring to head/tail

Will Deacon <will@xxxxxxxxxx> · Sun, 15 Sep 2019 15:59:06 +0100

Hi David, [+Peter]

I have a few drive-by comments on the ordering side of things. See below.

On Fri, Sep 13, 2019 at 02:00:39PM +0100, David Howells wrote:
> Convert pipes to use head and tail pointers for the buffer ring rather than
> pointer and length as the latter requires two atomic ops to update (or a
> combined op) whereas the former only requires one.
> 
>  (1) The head pointer is the point at which production occurs and points to
>      the slot in which the next buffer will be placed.  This is equivalent
>      to pipe->curbuf + pipe->nrbufs.
> 
>      The head pointer belongs to the write-side.
> 
>  (2) The tail pointer is the point at which consumption occurs.  It points
>      to the next slot to be consumed.  This is equivalent to pipe->curbuf.
> 
>      The tail pointer belongs to the read-side.
> 
>  (3) head and tail are allowed to run to UINT_MAX and wrap naturally.  They
>      are only masked off when the array is being accessed, e.g.:
> 
>         pipe->bufs[head & mask]
> 
>      This means that it is not necessary to have a dead slot in the ring as
>      head == tail isn't ambiguous.
> 
>  (4) The ring is empty if head == tail.
> 
>  (5) The occupancy of the ring is head - tail.
> 
>  (6) The amount of space in the ring is (tail + pipe->buffers) - head.
> 
>  (7) The ring is full if head == (tail + pipe->buffers) or
>      head - tail == pipe->buffers.
> 
> Barriers are also inserted, wrapped in inline functions:
> 
>  (1) unsigned int tail = pipe_get_tail_for_write(pipe);
> 
>      Read the tail pointer from the write-side.  This acts as a barrier to
>      order the tail read before the data in the ring is overwritten.  It
>      also prevents the compiler from re-reading the pointer.
> 
>  (2) unsigned int head = pipe_get_head_for_read(pipe);
> 
>      Read the head pointer from the read-side.  This acts as a barrier to
>      order the head read before the data read.  It also prevents the
>      compiler from re-reading the pointer.
> 
>  (3) pipe_post_read_barrier(pipe, unsigned int tail);
> 
>      Update the tail pointer from the read-side.  This acts as a barrier to
>      order the pointer update after the data read.  The consumed data slot
>      must not be touched after this function.
> 
>  (4) pipe_post_write_barrier(pipe, unsigned int head);
> 
>      Update the head pointer from the write-side.  This acts as a barrier
>      to order the pointer update after the data write.  The produced data
>      slot should not normally be touched after this function[*].
> 
>      [*] Unless pipe->mutex is held.
> ---
>  fs/fuse/dev.c             |   23 ++-
>  fs/pipe.c                 |  154 ++++++++++++++++----------
>  fs/splice.c               |  161 +++++++++++++++++----------
>  include/linux/pipe_fs_i.h |   76 ++++++++++++-
>  include/linux/uio.h       |    4 
>  lib/iov_iter.c            |  268 ++++++++++++++++++++++++++--------------------
>  6 files changed, 438 insertions(+), 248 deletions(-)

[...]

> diff --git a/fs/pipe.c b/fs/pipe.c
> index 8a2ab2f974bd..aa410ee0f423 100644
> --- a/fs/pipe.c
> +++ b/fs/pipe.c
> @@ -43,10 +43,10 @@ unsigned long pipe_user_pages_hard;
>  unsigned long pipe_user_pages_soft = PIPE_DEF_BUFFERS * INR_OPEN_CUR;
>  
>  /*
> - * We use a start+len construction, which provides full use of the 
> + * We use a start+len construction, which provides full use of the
>   * allocated memory.
>   * -- Florian Coosmann (FGC)
> - * 
> + *
>   * Reads with count = 0 should always return 0.
>   * -- Julian Bradfield 1999-06-07.
>   *
> @@ -285,10 +285,15 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
>  	ret = 0;
>  	__pipe_lock(pipe);
>  	for (;;) {
> -		int bufs = pipe->nrbufs;
> -		if (bufs) {
> -			int curbuf = pipe->curbuf;
> -			struct pipe_buffer *buf = pipe->bufs + curbuf;
> +		/* Barrier: head belongs to the write side, so order reading
> +		 * the data after reading the head pointer.
> +		 */
> +		unsigned int head = READ_ONCE(pipe->head);

Hmm, I don't understand this. Since READ_ONCE() doesn't imply a barrier,
how are you enforcing the read-read ordering in the CPU?

> @@ -104,6 +104,76 @@ struct pipe_buf_operations {
>  	bool (*get)(struct pipe_inode_info *, struct pipe_buffer *);
>  };
>  
> +/**
> + * pipe_get_tail_for_write - Get pipe buffer tail pointer for write-side use
> + * @pipe: The pipe in question
> + *
> + * Get the tail pointer for use in the write-side code.  This may need to
> + * insert a barrier against the reader to order reading the tail pointer
> + * against the reader reading the buffer.

What is the purpose of saying "This may need to insert a barrier"? Can this
function be overridden or something?

> + */
> +static inline unsigned int pipe_get_tail_for_write(struct pipe_inode_info *pipe)
> +{
> +	return READ_ONCE(pipe->tail);
> +}
> +
> +/**
> + * pipe_post_read_barrier - Set pipe buffer tail pointer in the read-side
> + * @pipe: The pipe in question
> + * @tail: The new tail pointer
> + *
> + * Update the tail pointer in the read-side code.  This inserts a barrier
> + * against the writer such that the data write is ordered before the tail
> + * pointer update.
> + */
> +static inline void pipe_post_read_barrier(struct pipe_inode_info *pipe,
> +					  unsigned int tail)
> +{
> +	smp_store_release(&pipe->tail, tail);
> +}
> +
> +/**
> + * pipe_get_head_for_read - Get pipe buffer head pointer for read-side use
> + * @pipe: The pipe in question
> + *
> + * Get the head pointer for use in the read-side code.  This inserts a barrier
> + * against the reader such that the head pointer is read before the data it
> + * points to.
> + */
> +static inline unsigned int pipe_get_head_for_read(struct pipe_inode_info *pipe)
> +{
> +	return READ_ONCE(pipe->head);
> +}

Saying that "This inserts a barrier" feels misleading, because READ_ONCE()
doesn't do that.

Will