On Wed, Jan 17, 2024 at 2:26 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote: > > > On 1/16/24 18:00, Ilya Dryomov wrote: > > On Tue, Jan 16, 2024 at 5:09 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote: > > [...] > I was waiting for Jeff's review since this is his code, but it's been > a while so I'll just comment. > > To me, it's a sign of suboptimal structure that you needed to add new > fields to the cursor just to be able tell whether this function is done > with the message, because it's something that is already tracked but > gets lost between the loops here. > > Currently this function is structured as: > > do { > if (set up for kvec) > read as much as possible into kvec > else if (set up for pages) > read as much as possible into pages > > if (short read) > bail, will be called again later > > invoke con->ops->sparse_read() for processing the thing what > was read and setting up the next read OR setting up the initial > read > } until (con->ops->sparse_read() returns "done") > > If it was me, I would pursue refactoring this to: > > do { > if (set up for kvec) > read as much as possible into kvec > else if (set up for pages) > read as much as possible into pages > else > bail > > Why bail here ? For the new sparse read both the 'kvec' and 'pages' shouldn't be set, so the following '->sparse_read()' will set up them and continue. > > Or you just want the 'read_partial_sparse_msg_data()' to read data but not the first time to trigger the '->sparse_read()' ? Before I tried a similar approach and it was not as optional as this one as I do. > > Hi Xiubo, > > I addressed that in the previous message: > > ... and invoke con->ops->sparse_read() for the first time elsewhere, > likely in prepare_message_data(). The rationale is that the state > machine inside con->ops->sparse_read() is conceptually a cursor and > prepare_message_data() is where the cursor is initialized for normal > reads. > > The benefit is that no additional state would be needed. > > Hi Ilya, > > I am afraid this won't work if my understanding is correct. > > I think you want to call the first 'con->ops->sparse_read()' earlier somewhere such as in 'prepare_message_data()' to parse and skip the extra spare-read extent array info and then we could reuse the 'cursor->total_resid' as others, right ? > > As we know the 'cursor->total_resid' is for pure data only, while for sparse-read it introduce some extra extent array info. > > For sparse-read we need to call 'con->ops->sparse_read()' several times for each sparse-read OP and each time it will only parse part of the extra info. That means we need to call 'con->ops->sparse_read()' many times in 'prepare_message_data()' in this approach. > > The the most important thing is that we couldn't do this in 'prepare_message_data()', because the 'prepare_message_data()' will be called just before parsing the "front" and "middle" of the msg. Hi Xiubo, I see, I missed that in my suggestion. > > Else if we did this in somewhere place out of 'prepare_message_data()', then we must do it in the following code just before Line#1267: > > 1262 /* (page) data */ 1263 if (data_len) { 1264 if (!m->num_data_items) 1265 return -EIO; 1266 1267 if (m->sparse_read) 1268 ret = read_partial_sparse_msg_data(con); 1269 else if (ceph_test_opt(from_msgr(con->msgr), RXBOUNCE)) 1270 ret = read_partial_msg_data_bounce(con); 1271 else 1272 ret = read_partial_msg_data(con); 1273 if (ret <= 0) 1274 return ret; 1275 } > > But this still couldn't resolve the multiple sparse-read OPs case, once the first OP parsing finishes we couldn't jump back to call 'prepare_message_data()'. > > The sparse-read data in the socket buffer will be: > > OP1: <sparse-read header> <sparse-read extents> <sparse-read real data length> <real data 1>; OP2: <sparse-read header> <sparse-read extents> <sparse-read real data length> <real data 2>; OP3:.... > > Currently the 'cursor->total_resid' will record the total length of <real data 1> + <real data 2> + ... > > And the 'cursor->sr_total_resid' will record all the above. > > The 'cursor->sr_total_resid', which is similar with 'cursor->total_resid', will just record the total data length for current sparse-read request and will determine whether should we skip parsing the "(page) data" in 'read_partial_message()'. > > I understand the intent behind cursor->sr_total_resid, but it would be > nice to do without it because of its inherent redundancy. > > The above is why I added ''cursor->sr_total_resid'. > > Did you try calling con->ops->sparse_read() for the first time exactly > where cursor->sr_total_resid is initialized in your patch? > > Yeah, I did but it didn't work as I mentioned above. If I am wrong, please correct me. I think you are right, and also in that introducing a separate field makes for an easier fix. I continue to claim that a separate field is _inherently_ redundant though -- it's just that due to the structure of the current code, it's not obvious. There is definitely a way to make this (to be precise: allow read_partial_sparse_msg_data() to be invoked by the messenger on short reads of the footer without causing any damage like attempting to parse random data as sparse-read header and/or extents) happen without adding additional fields to the cursor or elsewhere. But it might involve refactoring the entire state machine such that information isn't lost between the messenger and the OSD client, so I'm not asking that you take that on here. I wouldn't object to cursor->sr_total_resid being added, I just don't like it ;) Thanks, Ilya