Hi Ilya,
I just fixed it and it will be something like this below, I haven't
tested it yet because I am running other tests locally.
This time it will set the state to 'CEPH_SPARSE_READ_FINISH' when the
last sparse-read op is successfully read.
diff --git a/include/linux/ceph/osd_client.h
b/include/linux/ceph/osd_client.h
index 493de3496cd3..00d98e13100f 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -47,6 +47,7 @@ enum ceph_sparse_read_state {
CEPH_SPARSE_READ_DATA_LEN,
CEPH_SPARSE_READ_DATA_PRE,
CEPH_SPARSE_READ_DATA,
+ CEPH_SPARSE_READ_FINISH,
};
/*
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 848ef19055a0..b3b61f010428 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -5813,6 +5813,7 @@ static int prep_next_sparse_read(struct
ceph_connection *con,
/* reset for next sparse read request */
spin_unlock(&o->o_requests_lock);
+ sr->sr_state = CEPH_SPARSE_READ_FINISH;
o->o_sparse_op_idx = -1;
return 0;
found:
@@ -5918,8 +5919,6 @@ static int osd_sparse_read(struct ceph_connection
*con,
count);
return -EREMOTEIO;
}
-
- sr->sr_state = CEPH_SPARSE_READ_HDR;
goto next_op;
}
@@ -5952,6 +5951,8 @@ static int osd_sparse_read(struct ceph_connection
*con,
/* Bump the array index */
++sr->sr_index;
break;
+ case CEPH_SPARSE_READ_FINISH:
+ return 0;
}
return ret;
}
I will send out the V3 after my testing.
Jeff,
Could you help review it, want to make sure that it won't break anything
here for sparse read.
Thanks
- Xiubo
On 12/13/23 00:31, Ilya Dryomov wrote:
On Fri, Dec 8, 2023 at 5:08 PM <xiubli@xxxxxxxxxx> wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>
The messages from ceph maybe split into multiple socket packages
and we just need to wait for all the data to be availiable on the
sokcet.
This will add a new _FINISH state for the sparse-read to mark the
current sparse-read succeeded. Else it will treat it as a new
sparse-read when the socket receives the last piece of the osd
request reply message, and the cancel_request() later will help
init the sparse-read context.
URL: https://tracker.ceph.com/issues/63586
Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
include/linux/ceph/osd_client.h | 1 +
net/ceph/osd_client.c | 6 +++---
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 493de3496cd3..00d98e13100f 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -47,6 +47,7 @@ enum ceph_sparse_read_state {
CEPH_SPARSE_READ_DATA_LEN,
CEPH_SPARSE_READ_DATA_PRE,
CEPH_SPARSE_READ_DATA,
+ CEPH_SPARSE_READ_FINISH,
};
/*
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 848ef19055a0..f1705b4f19eb 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -5802,8 +5802,6 @@ static int prep_next_sparse_read(struct ceph_connection *con,
advance_cursor(cursor, sr->sr_req_len - end, false);
}
- ceph_init_sparse_read(sr);
-
/* find next op in this request (if any) */
while (++o->o_sparse_op_idx < req->r_num_ops) {
op = &req->r_ops[o->o_sparse_op_idx];
@@ -5919,7 +5917,7 @@ static int osd_sparse_read(struct ceph_connection *con,
return -EREMOTEIO;
}
- sr->sr_state = CEPH_SPARSE_READ_HDR;
+ sr->sr_state = CEPH_SPARSE_READ_FINISH;
goto next_op;
Hi Xiubo,
This code appears to be set up to handle multiple (sparse-read) ops in
a single OSD request. Wouldn't this break that case?
I think the bug is in read_sparse_msg_data(). It shouldn't be calling
con->ops->sparse_read() after the message has progressed to the footer.
osd_sparse_read() is most likely fine as is.
Notice how read_partial_msg_data() and read_partial_msg_data_bounce()
behave: if called at that point (i.e. after the data section has been
processed, meaning that cursor->total_resid == 0), they do nothing.
read_sparse_msg_data() should have a similar condition and basically
no-op itself.
While at it, let's rename it to read_partial_sparse_msg_data() to
emphasize the "partial"/no-op semantics that are required there.
Thanks,
Ilya