On Mon, Jan 22, 2024 at 6:14 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > On Mon, 2024-01-22 at 17:55 +0100, Ilya Dryomov wrote: > > On Mon, Jan 22, 2024 at 4:02 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > > > On Thu, 2024-01-18 at 18:50 +0800, xiubli@xxxxxxxxxx wrote: > > > > From: Xiubo Li <xiubli@xxxxxxxxxx> > > > > > > > > The messages from ceph maybe split into multiple socket packages > > > > and we just need to wait for all the data to be availiable on the > > > > sokcet. > > > > > > > > This will add 'sr_total_resid' to record the total length for all > > > > data items for sparse-read message and 'sr_resid_elen' to record > > > > the current extent total length. > > > > > > > > URL: https://tracker.ceph.com/issues/63586 > > > > Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx> > > > > --- > > > > include/linux/ceph/messenger.h | 1 + > > > > net/ceph/messenger_v1.c | 32 +++++++++++++++++++++----------- > > > > 2 files changed, 22 insertions(+), 11 deletions(-) > > > > > > > > diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h > > > > index 2eaaabbe98cb..ca6f82abed62 100644 > > > > --- a/include/linux/ceph/messenger.h > > > > +++ b/include/linux/ceph/messenger.h > > > > @@ -231,6 +231,7 @@ struct ceph_msg_data { > > > > > > > > struct ceph_msg_data_cursor { > > > > size_t total_resid; /* across all data items */ > > > > + size_t sr_total_resid; /* across all data items for sparse-read */ > > > > > > > > struct ceph_msg_data *data; /* current data item */ > > > > size_t resid; /* bytes not yet consumed */ > > > > diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c > > > > index 4cb60bacf5f5..2733da891688 100644 > > > > --- a/net/ceph/messenger_v1.c > > > > +++ b/net/ceph/messenger_v1.c > > > > @@ -160,7 +160,9 @@ static size_t sizeof_footer(struct ceph_connection *con) > > > > static void prepare_message_data(struct ceph_msg *msg, u32 data_len) > > > > { > > > > /* Initialize data cursor if it's not a sparse read */ > > > > - if (!msg->sparse_read) > > > > + if (msg->sparse_read) > > > > + msg->cursor.sr_total_resid = data_len; > > > > + else > > > > ceph_msg_data_cursor_init(&msg->cursor, msg, data_len); > > > > } > > > > > > > > > > > > > > Sorry, disregard my last email. > > > > > > I went and looked at the patch in the tree, and I better understand what > > > you're trying to do. We already have a value that gets set to data_len > > > called total_resid, but that is a nonsense value in a sparse read, > > > because we can advance farther through the receive buffer than the > > > amount of data in the socket. > > > > Hi Jeff, > > > > I see that total_resid is set to sparse_data_requested(), which is > > effectively the size of the receive buffer, not data_len. (This is > > ignoring the seemingly unnecessary complication of trying to support > > normal reads mixed with sparse reads in the same message, which I'm > > pretty sure doesn't work anyway.) > > > > Oh right. I missed that bit when I was re-reviewing this. Once we're in > a sparse read we'll override that value. Ok, so maybe we don't need > sr_total_resid. > > > With that, total_resid should represent the amount that needs to be > > filled in (advanced through) in the receive buffer. When total_resid > > reaches 0, wouldn't that mean that the amount of data in the socket is > > also 0? If not, where would the remaining data in the socket go? > > > > With a properly formed reply, then I think yes, there should be no > remaining data in the socket at the end of the receive. There would be no actual data, but a message footer which follows the data section and ends the message would be outstanding. > > At this point I think I must just be confused about the actual problem. > I think I need a detailed description of it before I can properly review > this patch. AFAIU the problem is that a short read may occur while reading the message footer from the socket. Later, when the socket is ready for another read, the messenger invokes all read_partial* handlers, including read_partial_sparse_msg_data(). The contract between the messenger and these handlers is that the handler should bail if the area of the message it's responsible for is already processed. So, in this case, it's expected that read_sparse_msg_data() would bail, allowing the messenger to invoke read_partial() for the footer and pick up where it left off. However read_sparse_msg_data() violates that contract and ends up calling into the state machine in the OSD client. The state machine assumes that it's a new op and interprets some piece of the footer (or even completely random memory?) as the sparse-read header and returns bogus extent length, etc. (BTW it's why I suggested the rename from read_sparse_msg_data() to read_partial_sparse_msg_data() in another patch -- to highlight that it's one of the "partial" handlers and the respective behavior.) Thanks, Ilya