Re: [PATCH v4 3/3] libceph: just wait for more data to be available on the socket

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 1/23/24 03:41, Ilya Dryomov wrote:
On Mon, Jan 22, 2024 at 6:14 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
On Mon, 2024-01-22 at 17:55 +0100, Ilya Dryomov wrote:
On Mon, Jan 22, 2024 at 4:02 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
On Thu, 2024-01-18 at 18:50 +0800, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>

The messages from ceph maybe split into multiple socket packages
and we just need to wait for all the data to be availiable on the
sokcet.

This will add 'sr_total_resid' to record the total length for all
data items for sparse-read message and 'sr_resid_elen' to record
the current extent total length.

URL: https://tracker.ceph.com/issues/63586
Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
  include/linux/ceph/messenger.h |  1 +
  net/ceph/messenger_v1.c        | 32 +++++++++++++++++++++-----------
  2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
index 2eaaabbe98cb..ca6f82abed62 100644
--- a/include/linux/ceph/messenger.h
+++ b/include/linux/ceph/messenger.h
@@ -231,6 +231,7 @@ struct ceph_msg_data {

  struct ceph_msg_data_cursor {
       size_t                  total_resid;    /* across all data items */
+     size_t                  sr_total_resid; /* across all data items for sparse-read */

       struct ceph_msg_data    *data;          /* current data item */
       size_t                  resid;          /* bytes not yet consumed */
diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
index 4cb60bacf5f5..2733da891688 100644
--- a/net/ceph/messenger_v1.c
+++ b/net/ceph/messenger_v1.c
@@ -160,7 +160,9 @@ static size_t sizeof_footer(struct ceph_connection *con)
  static void prepare_message_data(struct ceph_msg *msg, u32 data_len)
  {
       /* Initialize data cursor if it's not a sparse read */
-     if (!msg->sparse_read)
+     if (msg->sparse_read)
+             msg->cursor.sr_total_resid = data_len;
+     else
               ceph_msg_data_cursor_init(&msg->cursor, msg, data_len);
  }


Sorry, disregard my last email.

I went and looked at the patch in the tree, and I better understand what
you're trying to do. We already have a value that gets set to data_len
called total_resid, but that is a nonsense value in a sparse read,
because we can advance farther through the receive buffer than the
amount of data in the socket.
Hi Jeff,

I see that total_resid is set to sparse_data_requested(), which is
effectively the size of the receive buffer, not data_len.  (This is
ignoring the seemingly unnecessary complication of trying to support
normal reads mixed with sparse reads in the same message, which I'm
pretty sure doesn't work anyway.)

Oh right. I missed that bit when I was re-reviewing this. Once we're in
a sparse read we'll override that value. Ok, so maybe we don't need
sr_total_resid.

Oh, I get you now. Yeah we can just reuse the 'total_resid' instead of adding a new one.

With that, total_resid should represent the amount that needs to be
filled in (advanced through) in the receive buffer.  When total_resid
reaches 0, wouldn't that mean that the amount of data in the socket is
also 0?  If not, where would the remaining data in the socket go?

With a properly formed reply, then I think yes, there should be no
remaining data in the socket at the end of the receive.
There would be no actual data, but a message footer which follows the
data section and ends the message would be outstanding.

Yeah, correct.

At this point I think I must just be confused about the actual problem.
I think I need a detailed description of it before I can properly review
this patch.
AFAIU the problem is that a short read may occur while reading the
message footer from the socket.  Later, when the socket is ready for
another read, the messenger invokes all read_partial* handlers,
including read_partial_sparse_msg_data().  The contract between the
messenger and these handlers is that the handler should bail if the
area of the message it's responsible for is already processed.  So,
in this case, it's expected that read_sparse_msg_data() would bail,
allowing the messenger to invoke read_partial() for the footer and
pick up where it left off.

However read_sparse_msg_data() violates that contract and ends up
calling into the state machine in the OSD client.  The state machine
assumes that it's a new op and interprets some piece of the footer (or
even completely random memory?) as the sparse-read header and returns
bogus extent length, etc.

(BTW it's why I suggested the rename from read_sparse_msg_data() to
read_partial_sparse_msg_data() in another patch -- to highlight that
it's one of the "partial" handlers and the respective behavior.)

Yeah, Ilya is correct.

Thanks

- Xiubo

Thanks,

                 Ilya






[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux