Re: [PATCH v2 2/2] libceph: just wait for more data to be available on the socket

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2023-12-13 at 14:07 +0100, Ilya Dryomov wrote:
> On Wed, Dec 13, 2023 at 1:05 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > 
> > 
> > On 12/13/23 19:54, Ilya Dryomov wrote:
> > 
> > On Wed, Dec 13, 2023 at 12:03 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > 
> > On 12/13/23 18:31, Ilya Dryomov wrote:
> > 
> > On Wed, Dec 13, 2023 at 2:02 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > 
> > On 12/13/23 00:31, Ilya Dryomov wrote:
> > 
> > On Fri, Dec 8, 2023 at 5:08 PM <xiubli@xxxxxxxxxx> wrote:
> > 
> > From: Xiubo Li <xiubli@xxxxxxxxxx>
> > 
> > The messages from ceph maybe split into multiple socket packages
> > and we just need to wait for all the data to be availiable on the
> > sokcet.
> > 
> > This will add a new _FINISH state for the sparse-read to mark the
> > current sparse-read succeeded. Else it will treat it as a new
> > sparse-read when the socket receives the last piece of the osd
> > request reply message, and the cancel_request() later will help
> > init the sparse-read context.
> > 
> > URL: https://tracker.ceph.com/issues/63586
> > Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> > ---
> >    include/linux/ceph/osd_client.h | 1 +
> >    net/ceph/osd_client.c           | 6 +++---
> >    2 files changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
> > index 493de3496cd3..00d98e13100f 100644
> > --- a/include/linux/ceph/osd_client.h
> > +++ b/include/linux/ceph/osd_client.h
> > @@ -47,6 +47,7 @@ enum ceph_sparse_read_state {
> >           CEPH_SPARSE_READ_DATA_LEN,
> >           CEPH_SPARSE_READ_DATA_PRE,
> >           CEPH_SPARSE_READ_DATA,
> > +       CEPH_SPARSE_READ_FINISH,
> >    };
> > 
> >    /*
> > diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> > index 848ef19055a0..f1705b4f19eb 100644
> > --- a/net/ceph/osd_client.c
> > +++ b/net/ceph/osd_client.c
> > @@ -5802,8 +5802,6 @@ static int prep_next_sparse_read(struct ceph_connection *con,
> >                           advance_cursor(cursor, sr->sr_req_len - end, false);
> >           }
> > 
> > -       ceph_init_sparse_read(sr);
> > -
> >           /* find next op in this request (if any) */
> >           while (++o->o_sparse_op_idx < req->r_num_ops) {
> >                   op = &req->r_ops[o->o_sparse_op_idx];
> > @@ -5919,7 +5917,7 @@ static int osd_sparse_read(struct ceph_connection *con,
> >                                   return -EREMOTEIO;
> >                           }
> > 
> > -                       sr->sr_state = CEPH_SPARSE_READ_HDR;
> > +                       sr->sr_state = CEPH_SPARSE_READ_FINISH;
> >                           goto next_op;
> > 
> > Hi Xiubo,
> > 
> > This code appears to be set up to handle multiple (sparse-read) ops in
> > a single OSD request.  Wouldn't this break that case?
> > 
> > Yeah, it will break it. I just missed it.
> > 
> > I think the bug is in read_sparse_msg_data().  It shouldn't be calling
> > con->ops->sparse_read() after the message has progressed to the footer.
> > osd_sparse_read() is most likely fine as is.
> > 
> > Yes it is. We cannot tell exactly whether has it progressed to the
> > footer IMO, such as when in case 'con->v1.in_base_pos ==
> > 
> > Hi Xiubo,
> > 
> > I don't buy this.  If the messenger is trying to read the footer, it
> > _has_ progressed to the footer.  This is definitive and irreversible,
> > not a "maybe".
> > 
> > sizeof(con->v1.in_hdr)' the socket buffer may break just after finishing
> > sparse-read and before reading footer or some where in sparse-read. For
> > the later case then we should continue in the sparse reads.
> > 
> > The size of the data section of the message is always known.  The
> > contract is that read_partial_msg_data*() returns 1 and does nothing
> > else after the data section is consumed.  This is how the messenger is
> > told to move on to the footer.
> > 
> > read_partial_sparse_msg_data() doesn't adhere to this contract and
> > should be fixed.
> > 
> > Notice how read_partial_msg_data() and read_partial_msg_data_bounce()
> > behave: if called at that point (i.e. after the data section has been
> > processed, meaning that cursor->total_resid == 0), they do nothing.
> > read_sparse_msg_data() should have a similar condition and basically
> > no-op itself.
> > 
> > Correct, this what I want to do in the sparse-read code.
> > 
> > No, this needs to be done on the messenger side.  sparse-read code
> > should not be invoked after the messenger has moved on to the footer.
> > 
> >  From my reading, even the messenger has moved on to the 'footer', it
> > will always try to invoke to parse the 'header':
> > 
> > read_partial(con, end, size, &con->v1.in_hdr);
> > 
> > But it will do nothing and just returns.
> > 
> > And the same for 'front', 'middle' and '(page) data', then the last for
> > 'footer'.
> > 
> > This is correct.
> > 
> > Did I miss something ?
> > 
> > No, it's how the messenger is set up to work.  The problem is that
> > read_sparse_msg_data() doesn't fit this model, so it should be fixed
> > and renamed to read_partial_sparse_msg_data().
> > 
> > Okay, let me try.
> > 
> > Did you see my new patch in last mail ? Will that work ?
> > 
> > If not I will try to fix it in read_partial_sparse_msg_data().
> 
> It might work around the problem, but it's not the right fix.  Think
> about it: what business does code in the OSD client have being called
> when the messenger is 14 bytes into reading the footer (number taken
> from the log in the cover letter)?  All data is read at that point and
> the last op in a multi-op OSD request may not even be sparse-read...
> 

The assumption in ceph_osdc_new_request is that if you have a multi-op
request, that they are either all reads or all writes. The assumption
about writes has been there a long time. I simply added the ability to
do the same for reads:

    4e8c4c235578 libceph: allow ceph_osdc_new_request to accept a multi-op read

Note that we do this in ceph_sync_write in the rmw case, so that does
need to keep working.

Technically we could have a normal read in the same request as a sparse
read but I figured that would be a little silly. That's why it tries to
prep a second sparse read once the first is done. If there isn't one
then that should fall through to the return 0 just before the "found:"
label in prep_next_sparse_read.

So, I tend to agree with Ilya that the problem isn't down in the OSD
state machine, but more likely at the receive handling layer. It's
certainly plausible that I didn't get the handling of short receives
right (particularly in the messenger_v1 part).
-- 
Jeff Layton <jlayton@xxxxxxxxxx>





[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux