Re: [PATCH v6 3/3] libceph: just wait for more data to be available on the socket

Xiubo Li <xiubli@xxxxxxxxxx> · Wed, 28 Feb 2024 08:45:41 +0800

On 2/28/24 01:22, Luis Henriques wrote:
Hi Xiubo!

xiubli@xxxxxxxxxx writes:

From: Xiubo Li <xiubli@xxxxxxxxxx>

A short read may occur while reading the message footer from the
socket.  Later, when the socket is ready for another read, the
messenger shoudl invoke all read_partial* handlers, except the
read_partial_sparse_msg_data().  The contract between the messenger
and these handlers is that the handlers should bail if the area
of the message is responsible for is already processed.  So,
in this case, it's expected that read_sparse_msg_data() would bail,
allowing the messenger to invoke read_partial() for the footer and
pick up where it left off.

However read_partial_sparse_msg_data() violates that contract and
ends up calling into the state machine in the OSD client. The
sparse-read state machine just assumes that it's a new op and
interprets some piece of the footer as the sparse-read extents/data
and then returns bogus extents/data length, etc.

This will just reuse the 'total_resid' to determine whether should
the read_partial_sparse_msg_data() bail out or not. Because once
it reaches to zero that means all the extents and data have been
successfully received in last read, else it could break out when
partially reading any of the extents and data. And then the
osd_sparse_read() could continue where it left off.
I'm seeing an issue with fstest generic/580, which seems to enter an
infinite loop effectively rendering the testing VM unusable.  It's pretty
easy to reproduce, just run the test ensuring to be using msgv2 (I'm
mounting the filesystem with 'ms_mode=crc'), and you should see the
following on the logs:

[...]
   libceph: prepare_sparse_read_cont: ret 0x1000 total_resid 0x0 resid 0x0
   libceph: osd1 (2)192.168.155.1:6810 read processing error
   libceph: mon0 (2)192.168.155.1:40608 session established
   libceph: bad late_status 0x1
   libceph: osd1 (2)192.168.155.1:6810 protocol error, bad epilogue
   libceph: mon0 (2)192.168.155.1:40608 session established
   libceph: prepare_sparse_read_cont: ret 0x1000 total_resid 0x0 resid 0x0
   libceph: osd1 (2)192.168.155.1:6810 read processing error
   libceph: mon0 (2)192.168.155.1:40608 session established
   libceph: bad late_status 0x1
[...]

Reverting this patch (commit 8e46a2d068c9 ("libceph: just wait for more
data to be available on the socket")) seems to fix.  I haven't
investigated it further, but since it'll take me some time to refresh my
memory, I thought I should report it immediately.  Maybe someone has any
idea.

The following patch should fix it. I haven't test it yet. Will do it 
later today:

diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
index a0ca5414b333..2c32ad4d9774 100644
--- a/net/ceph/messenger_v2.c
+++ b/net/ceph/messenger_v2.c
@@ -1860,10 +1860,10 @@ static int prepare_read_control_remainder(struct 
ceph_connection *con)
 static int prepare_read_data(struct ceph_connection *con)
 {
        struct bio_vec bv;
+       u64 len = con->in_msg->sparse_read_total ? : data_len(con->in_msg);

        con->in_data_crc = -1;
-       ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg,
-                                 data_len(con->in_msg));
+       ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg, len);

        get_bvec_at(&con->v2.in_cursor, &bv);
        if (ceph_test_opt(from_msgr(con->msgr), RXBOUNCE)) {


Cheers,