Re: [PATCH v4 3/3] libceph: just wait for more data to be available on the socket

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 1/19/24 02:24, Jeff Layton wrote:
On Thu, 2024-01-18 at 18:50 +0800, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>

The messages from ceph maybe split into multiple socket packages
and we just need to wait for all the data to be availiable on the
sokcet.

This will add 'sr_total_resid' to record the total length for all
data items for sparse-read message and 'sr_resid_elen' to record
the current extent total length.

URL: https://tracker.ceph.com/issues/63586
Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
  include/linux/ceph/messenger.h |  1 +
  net/ceph/messenger_v1.c        | 32 +++++++++++++++++++++-----------
  2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
index 2eaaabbe98cb..ca6f82abed62 100644
--- a/include/linux/ceph/messenger.h
+++ b/include/linux/ceph/messenger.h
@@ -231,6 +231,7 @@ struct ceph_msg_data {
struct ceph_msg_data_cursor {
  	size_t			total_resid;	/* across all data items */
+	size_t			sr_total_resid;	/* across all data items for sparse-read */
struct ceph_msg_data *data; /* current data item */
  	size_t			resid;		/* bytes not yet consumed */
diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
index 4cb60bacf5f5..2733da891688 100644
--- a/net/ceph/messenger_v1.c
+++ b/net/ceph/messenger_v1.c
@@ -160,7 +160,9 @@ static size_t sizeof_footer(struct ceph_connection *con)
  static void prepare_message_data(struct ceph_msg *msg, u32 data_len)
  {
  	/* Initialize data cursor if it's not a sparse read */
-	if (!msg->sparse_read)
+	if (msg->sparse_read)
+		msg->cursor.sr_total_resid = data_len;
+	else
  		ceph_msg_data_cursor_init(&msg->cursor, msg, data_len);
  }
@@ -1032,35 +1034,43 @@ static int read_partial_sparse_msg_data(struct ceph_connection *con)
  	bool do_datacrc = !ceph_test_opt(from_msgr(con->msgr), NOCRC);
  	u32 crc = 0;
  	int ret = 1;
+	int len;
if (do_datacrc)
  		crc = con->in_data_crc;
- do {
-		if (con->v1.in_sr_kvec.iov_base)
+	while (cursor->sr_total_resid) {
+		len = 0;
+		if (con->v1.in_sr_kvec.iov_base) {
+			len = con->v1.in_sr_kvec.iov_len;
  			ret = read_partial_message_chunk(con,
  							 &con->v1.in_sr_kvec,
  							 con->v1.in_sr_len,
  							 &crc);
-		else if (cursor->sr_resid > 0)
+			len = con->v1.in_sr_kvec.iov_len - len;
+		} else if (cursor->sr_resid > 0) {
+			len = cursor->sr_resid;
  			ret = read_partial_sparse_msg_extent(con, &crc);
-
-		if (ret <= 0) {
-			if (do_datacrc)
-				con->in_data_crc = crc;
-			return ret;
+			len -= cursor->sr_resid;
  		}
+		cursor->sr_total_resid -= len;
+		if (ret <= 0)
+			break;
memset(&con->v1.in_sr_kvec, 0, sizeof(con->v1.in_sr_kvec));
  		ret = con->ops->sparse_read(con, cursor,
  				(char **)&con->v1.in_sr_kvec.iov_base);
+		if (ret <= 0) {
+			ret = ret ? : 1; /* must return > 0 to indicate success */
+			break;
+		}
  		con->v1.in_sr_len = ret;
-	} while (ret > 0);
+	}
if (do_datacrc)
  		con->in_data_crc = crc;
- return ret < 0 ? ret : 1; /* must return > 0 to indicate success */
+	return ret;
  }
static int read_partial_msg_data(struct ceph_connection *con)
Looking back over this code...

The way it works today, once we determine it's a sparse read, we call
read_sparse_msg_data. At that point we call either
read_partial_message_chunk (to read into the kvec) or
read_sparse_msg_extent if sr_resid is already set (indicating that we're
receiving an extent).

read_sparse_msg_extent calls ceph_tcp_recvpage in a loop until
cursor->sr_resid have been received. The exception there when
ceph_tcp_recvpage returns <= 0.

ceph_tcp_recvpage returns 0 if sock_recvmsg returns -EAGAIN (maybe also
in other cases). So it sounds like the client just timed out on a read
from the socket or caught a signal or something?

If that's correct, then do we know what ceph_tcp_recvpage returned when
the problem happened?

It should just return parital data, and we should continue from here in the next loop when the reset data comes.

Thanks

- Xiubo







[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux