On Tue, Jan 16, 2024 at 5:09 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote: > > > On 1/16/24 06:38, Ilya Dryomov wrote: > > On Fri, Dec 15, 2023 at 1:22 AM <xiubli@xxxxxxxxxx> wrote: > > From: Xiubo Li <xiubli@xxxxxxxxxx> > > The messages from ceph maybe split into multiple socket packages > and we just need to wait for all the data to be availiable on the > sokcet. > > This will add 'sr_total_resid' to record the total length for all > data items for sparse-read message and 'sr_resid_elen' to record > the current extent total length. > > URL: https://tracker.ceph.com/issues/63586 > Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx> > --- > include/linux/ceph/messenger.h | 2 ++ > net/ceph/messenger.c | 1 + > net/ceph/messenger_v1.c | 21 ++++++++++++++++----- > net/ceph/osd_client.c | 1 + > 4 files changed, 20 insertions(+), 5 deletions(-) > > diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h > index 2eaaabbe98cb..05e9b39a58f8 100644 > --- a/include/linux/ceph/messenger.h > +++ b/include/linux/ceph/messenger.h > @@ -231,10 +231,12 @@ struct ceph_msg_data { > > struct ceph_msg_data_cursor { > size_t total_resid; /* across all data items */ > + size_t sr_total_resid; /* across all data items for sparse-read */ > > struct ceph_msg_data *data; /* current data item */ > size_t resid; /* bytes not yet consumed */ > int sr_resid; /* residual sparse_read len */ > + int sr_resid_elen; /* total sparse_read elen */ > > Hi Xiubo, > > Is adding yet another sparse-read field to the cursor really needed? > Would making read_partial_sparse_msg_extent() decrement sr_total_resid > as it goes just like sr_resid is decremented work? > > This can be improved by removing it. Let me fix it. > > bool need_crc; /* crc update needed */ > union { > #ifdef CONFIG_BLOCK > diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c > index 3c8b78d9c4d1..eafd592af382 100644 > --- a/net/ceph/messenger.c > +++ b/net/ceph/messenger.c > @@ -1073,6 +1073,7 @@ void ceph_msg_data_cursor_init(struct ceph_msg_data_cursor *cursor, > cursor->total_resid = length; > cursor->data = msg->data; > cursor->sr_resid = 0; > + cursor->sr_resid_elen = 0; > > __ceph_msg_data_cursor_init(cursor); > } > diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c > index 4cb60bacf5f5..7425fa26e4c3 100644 > --- a/net/ceph/messenger_v1.c > +++ b/net/ceph/messenger_v1.c > @@ -160,7 +160,9 @@ static size_t sizeof_footer(struct ceph_connection *con) > static void prepare_message_data(struct ceph_msg *msg, u32 data_len) > { > /* Initialize data cursor if it's not a sparse read */ > - if (!msg->sparse_read) > + if (msg->sparse_read) > + msg->cursor.sr_total_resid = data_len; > + else > ceph_msg_data_cursor_init(&msg->cursor, msg, data_len); > } > > @@ -1032,18 +1034,25 @@ static int read_partial_sparse_msg_data(struct ceph_connection *con) > bool do_datacrc = !ceph_test_opt(from_msgr(con->msgr), NOCRC); > u32 crc = 0; > int ret = 1; > + int len; > > if (do_datacrc) > crc = con->in_data_crc; > > - do { > - if (con->v1.in_sr_kvec.iov_base) > + while (cursor->sr_total_resid && ret > 0) { > + len = 0; > + if (con->v1.in_sr_kvec.iov_base) { > ret = read_partial_message_chunk(con, > &con->v1.in_sr_kvec, > con->v1.in_sr_len, > &crc); > - else if (cursor->sr_resid > 0) > + if (ret == 1) > + len = con->v1.in_sr_len; > + } else if (cursor->sr_resid > 0) { > ret = read_partial_sparse_msg_extent(con, &crc); > + if (ret == 1) > + len = cursor->sr_resid_elen; > + } > > I was waiting for Jeff's review since this is his code, but it's been > a while so I'll just comment. > > To me, it's a sign of suboptimal structure that you needed to add new > fields to the cursor just to be able tell whether this function is done > with the message, because it's something that is already tracked but > gets lost between the loops here. > > Currently this function is structured as: > > do { > if (set up for kvec) > read as much as possible into kvec > else if (set up for pages) > read as much as possible into pages > > if (short read) > bail, will be called again later > > invoke con->ops->sparse_read() for processing the thing what > was read and setting up the next read OR setting up the initial > read > } until (con->ops->sparse_read() returns "done") > > If it was me, I would pursue refactoring this to: > > do { > if (set up for kvec) > read as much as possible into kvec > else if (set up for pages) > read as much as possible into pages > else > bail > > Why bail here ? For the new sparse read both the 'kvec' and 'pages' shouldn't be set, so the following '->sparse_read()' will set up them and continue. > > Or you just want the 'read_partial_sparse_msg_data()' to read data but not the first time to trigger the '->sparse_read()' ? Before I tried a similar approach and it was not as optional as this one as I do. Hi Xiubo, I addressed that in the previous message: ... and invoke con->ops->sparse_read() for the first time elsewhere, likely in prepare_message_data(). The rationale is that the state machine inside con->ops->sparse_read() is conceptually a cursor and prepare_message_data() is where the cursor is initialized for normal reads. The benefit is that no additional state would be needed. > The 'cursor->sr_total_resid', which is similar with 'cursor->total_resid', will just record the total data length for current sparse-read request and will determine whether should we skip parsing the "(page) data" in 'read_partial_message()'. I understand the intent behind cursor->sr_total_resid, but it would be nice to do without it because of its inherent redundancy. Did you try calling con->ops->sparse_read() for the first time exactly where cursor->sr_total_resid is initialized in your patch? Thanks, Ilya