Re: [PATCH v4 3/3] libceph: just wait for more data to be available on the socket

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2024-01-19 at 12:35 +0800, Xiubo Li wrote:
> On 1/19/24 02:24, Jeff Layton wrote:
> > On Thu, 2024-01-18 at 18:50 +0800, xiubli@xxxxxxxxxx wrote:
> > > From: Xiubo Li <xiubli@xxxxxxxxxx>
> > > 
> > > The messages from ceph maybe split into multiple socket packages
> > > and we just need to wait for all the data to be availiable on the
> > > sokcet.
> > > 
> > > This will add 'sr_total_resid' to record the total length for all
> > > data items for sparse-read message and 'sr_resid_elen' to record
> > > the current extent total length.
> > > 
> > > URL: https://tracker.ceph.com/issues/63586
> > > Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> > > ---
> > >   include/linux/ceph/messenger.h |  1 +
> > >   net/ceph/messenger_v1.c        | 32 +++++++++++++++++++++-----------
> > >   2 files changed, 22 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
> > > index 2eaaabbe98cb..ca6f82abed62 100644
> > > --- a/include/linux/ceph/messenger.h
> > > +++ b/include/linux/ceph/messenger.h
> > > @@ -231,6 +231,7 @@ struct ceph_msg_data {
> > >   
> > >   struct ceph_msg_data_cursor {
> > >   	size_t			total_resid;	/* across all data items */
> > > +	size_t			sr_total_resid;	/* across all data items for sparse-read */
> > >   
> > >   	struct ceph_msg_data	*data;		/* current data item */
> > >   	size_t			resid;		/* bytes not yet consumed */
> > > diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
> > > index 4cb60bacf5f5..2733da891688 100644
> > > --- a/net/ceph/messenger_v1.c
> > > +++ b/net/ceph/messenger_v1.c
> > > @@ -160,7 +160,9 @@ static size_t sizeof_footer(struct ceph_connection *con)
> > >   static void prepare_message_data(struct ceph_msg *msg, u32 data_len)
> > >   {
> > >   	/* Initialize data cursor if it's not a sparse read */
> > > -	if (!msg->sparse_read)
> > > +	if (msg->sparse_read)
> > > +		msg->cursor.sr_total_resid = data_len;
> > > +	else
> > >   		ceph_msg_data_cursor_init(&msg->cursor, msg, data_len);
> > >   }
> > >   
> > > @@ -1032,35 +1034,43 @@ static int read_partial_sparse_msg_data(struct ceph_connection *con)
> > >   	bool do_datacrc = !ceph_test_opt(from_msgr(con->msgr), NOCRC);
> > >   	u32 crc = 0;
> > >   	int ret = 1;
> > > +	int len;
> > >   
> > >   	if (do_datacrc)
> > >   		crc = con->in_data_crc;
> > >   
> > > -	do {
> > > -		if (con->v1.in_sr_kvec.iov_base)
> > > +	while (cursor->sr_total_resid) {
> > > +		len = 0;
> > > +		if (con->v1.in_sr_kvec.iov_base) {
> > > +			len = con->v1.in_sr_kvec.iov_len;
> > >   			ret = read_partial_message_chunk(con,
> > >   							 &con->v1.in_sr_kvec,
> > >   							 con->v1.in_sr_len,
> > >   							 &crc);
> > > -		else if (cursor->sr_resid > 0)
> > > +			len = con->v1.in_sr_kvec.iov_len - len;
> > > +		} else if (cursor->sr_resid > 0) {
> > > +			len = cursor->sr_resid;
> > >   			ret = read_partial_sparse_msg_extent(con, &crc);
> > > -
> > > -		if (ret <= 0) {
> > > -			if (do_datacrc)
> > > -				con->in_data_crc = crc;
> > > -			return ret;
> > > +			len -= cursor->sr_resid;
> > >   		}
> > > +		cursor->sr_total_resid -= len;
> > > +		if (ret <= 0)
> > > +			break;
> > >   
> > >   		memset(&con->v1.in_sr_kvec, 0, sizeof(con->v1.in_sr_kvec));
> > >   		ret = con->ops->sparse_read(con, cursor,
> > >   				(char **)&con->v1.in_sr_kvec.iov_base);
> > > +		if (ret <= 0) {
> > > +			ret = ret ? : 1; /* must return > 0 to indicate success */
> > > +			break;
> > > +		}
> > >   		con->v1.in_sr_len = ret;
> > > -	} while (ret > 0);
> > > +	}
> > >   
> > >   	if (do_datacrc)
> > >   		con->in_data_crc = crc;
> > >   
> > > -	return ret < 0 ? ret : 1;  /* must return > 0 to indicate success */
> > > +	return ret;
> > >   }
> > >   
> > >   static int read_partial_msg_data(struct ceph_connection *con)
> > Looking back over this code...
> > 
> > The way it works today, once we determine it's a sparse read, we call
> > read_sparse_msg_data. At that point we call either
> > read_partial_message_chunk (to read into the kvec) or
> > read_sparse_msg_extent if sr_resid is already set (indicating that we're
> > receiving an extent).
> > 
> > read_sparse_msg_extent calls ceph_tcp_recvpage in a loop until
> > cursor->sr_resid have been received. The exception there when
> > ceph_tcp_recvpage returns <= 0.
> > 
> > ceph_tcp_recvpage returns 0 if sock_recvmsg returns -EAGAIN (maybe also
> > in other cases). So it sounds like the client just timed out on a read
> > from the socket or caught a signal or something?
> > 
> > If that's correct, then do we know what ceph_tcp_recvpage returned when
> > the problem happened?
> 
> It should just return parital data, and we should continue from here in 
> the next loop when the reset data comes.
> 

Tracking this extra length seems like the wrong fix. We're already
looping in read_sparse_msg_extent until the sr_resid goes to 0. ISTM
that it's just that read_sparse_msg_extent is returning inappropriately
in the face of timeouts.

IOW, it does this:

                ret = ceph_tcp_recvpage(con->sock, rpage, (int)off, len);
                if (ret <= 0)
                        return ret;

...should it just not be returning there when ret == 0? Maybe it should
be retrying the recvpage instead?
-- 
Jeff Layton <jlayton@xxxxxxxxxx>





[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux