Re: [PATCH v3 3/3] libceph: just wait for more data to be available on the socket

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jan 16, 2024 at 5:09 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
>
>
> On 1/16/24 06:38, Ilya Dryomov wrote:
>
> On Fri, Dec 15, 2023 at 1:22 AM <xiubli@xxxxxxxxxx> wrote:
>
> From: Xiubo Li <xiubli@xxxxxxxxxx>
>
> The messages from ceph maybe split into multiple socket packages
> and we just need to wait for all the data to be availiable on the
> sokcet.
>
> This will add 'sr_total_resid' to record the total length for all
> data items for sparse-read message and 'sr_resid_elen' to record
> the current extent total length.
>
> URL: https://tracker.ceph.com/issues/63586
> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> ---
>  include/linux/ceph/messenger.h |  2 ++
>  net/ceph/messenger.c           |  1 +
>  net/ceph/messenger_v1.c        | 21 ++++++++++++++++-----
>  net/ceph/osd_client.c          |  1 +
>  4 files changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
> index 2eaaabbe98cb..05e9b39a58f8 100644
> --- a/include/linux/ceph/messenger.h
> +++ b/include/linux/ceph/messenger.h
> @@ -231,10 +231,12 @@ struct ceph_msg_data {
>
>  struct ceph_msg_data_cursor {
>         size_t                  total_resid;    /* across all data items */
> +       size_t                  sr_total_resid; /* across all data items for sparse-read */
>
>         struct ceph_msg_data    *data;          /* current data item */
>         size_t                  resid;          /* bytes not yet consumed */
>         int                     sr_resid;       /* residual sparse_read len */
> +       int                     sr_resid_elen;  /* total sparse_read elen */
>
> Hi Xiubo,
>
> Is adding yet another sparse-read field to the cursor really needed?
> Would making read_partial_sparse_msg_extent() decrement sr_total_resid
> as it goes just like sr_resid is decremented work?
>
> This can be improved by removing it. Let me fix it.
>
>         bool                    need_crc;       /* crc update needed */
>         union {
>  #ifdef CONFIG_BLOCK
> diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
> index 3c8b78d9c4d1..eafd592af382 100644
> --- a/net/ceph/messenger.c
> +++ b/net/ceph/messenger.c
> @@ -1073,6 +1073,7 @@ void ceph_msg_data_cursor_init(struct ceph_msg_data_cursor *cursor,
>         cursor->total_resid = length;
>         cursor->data = msg->data;
>         cursor->sr_resid = 0;
> +       cursor->sr_resid_elen = 0;
>
>         __ceph_msg_data_cursor_init(cursor);
>  }
> diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
> index 4cb60bacf5f5..7425fa26e4c3 100644
> --- a/net/ceph/messenger_v1.c
> +++ b/net/ceph/messenger_v1.c
> @@ -160,7 +160,9 @@ static size_t sizeof_footer(struct ceph_connection *con)
>  static void prepare_message_data(struct ceph_msg *msg, u32 data_len)
>  {
>         /* Initialize data cursor if it's not a sparse read */
> -       if (!msg->sparse_read)
> +       if (msg->sparse_read)
> +               msg->cursor.sr_total_resid = data_len;
> +       else
>                 ceph_msg_data_cursor_init(&msg->cursor, msg, data_len);
>  }
>
> @@ -1032,18 +1034,25 @@ static int read_partial_sparse_msg_data(struct ceph_connection *con)
>         bool do_datacrc = !ceph_test_opt(from_msgr(con->msgr), NOCRC);
>         u32 crc = 0;
>         int ret = 1;
> +       int len;
>
>         if (do_datacrc)
>                 crc = con->in_data_crc;
>
> -       do {
> -               if (con->v1.in_sr_kvec.iov_base)
> +       while (cursor->sr_total_resid && ret > 0) {
> +               len = 0;
> +               if (con->v1.in_sr_kvec.iov_base) {
>                         ret = read_partial_message_chunk(con,
>                                                          &con->v1.in_sr_kvec,
>                                                          con->v1.in_sr_len,
>                                                          &crc);
> -               else if (cursor->sr_resid > 0)
> +                       if (ret == 1)
> +                               len = con->v1.in_sr_len;
> +               } else if (cursor->sr_resid > 0) {
>                         ret = read_partial_sparse_msg_extent(con, &crc);
> +                       if (ret == 1)
> +                               len = cursor->sr_resid_elen;
> +               }
>
> I was waiting for Jeff's review since this is his code, but it's been
> a while so I'll just comment.
>
> To me, it's a sign of suboptimal structure that you needed to add new
> fields to the cursor just to be able tell whether this function is done
> with the message, because it's something that is already tracked but
> gets lost between the loops here.
>
> Currently this function is structured as:
>
>     do {
>         if (set up for kvec)
>            read as much as possible into kvec
>         else if (set up for pages)
>            read as much as possible into pages
>
>         if (short read)
>             bail, will be called again later
>
>          invoke con->ops->sparse_read() for processing the thing what
>          was read and setting up the next read OR setting up the initial
>          read
>     } until (con->ops->sparse_read() returns "done")
>
> If it was me, I would pursue refactoring this to:
>
>     do {
>         if (set up for kvec)
>            read as much as possible into kvec
>         else if (set up for pages)
>            read as much as possible into pages
>         else
>            bail
>
> Why bail here ? For the new sparse read both the 'kvec' and 'pages' shouldn't be set, so the following '->sparse_read()' will set up them and continue.
>
> Or you just want the 'read_partial_sparse_msg_data()' to read data but not the first time to trigger the '->sparse_read()' ? Before I tried a similar approach and it was not as optional as this one as I do.

Hi Xiubo,

I addressed that in the previous message:

    ... and invoke con->ops->sparse_read() for the first time elsewhere,
    likely in prepare_message_data().  The rationale is that the state
    machine inside con->ops->sparse_read() is conceptually a cursor and
    prepare_message_data() is where the cursor is initialized for normal
    reads.

The benefit is that no additional state would be needed.

> The 'cursor->sr_total_resid', which is similar with 'cursor->total_resid', will just record the total data length for current sparse-read request and will determine whether should we skip parsing the "(page) data" in 'read_partial_message()'.

I understand the intent behind cursor->sr_total_resid, but it would be
nice to do without it because of its inherent redundancy.

Did you try calling con->ops->sparse_read() for the first time exactly
where cursor->sr_total_resid is initialized in your patch?

Thanks,

                Ilya





[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux