On 12/16/23 01:06, Jeff Layton wrote:
On Fri, 2023-12-15 at 08:20 +0800, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>
Once this happens that means there have bugs.
URL: https://tracker.ceph.com/issues/63586
Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
net/ceph/osd_client.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 5753036d1957..848ef19055a0 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -5912,10 +5912,12 @@ static int osd_sparse_read(struct ceph_connection *con,
fallthrough;
case CEPH_SPARSE_READ_DATA:
if (sr->sr_index >= count) {
- if (sr->sr_datalen && count)
+ if (sr->sr_datalen) {
pr_warn_ratelimited("sr_datalen %u sr_index %d count %u\n",
sr->sr_datalen, sr->sr_index,
count);
+ return -EREMOTEIO;
+ }
sr->sr_state = CEPH_SPARSE_READ_HDR;
goto next_op;
Do you really need to fail the read in this case? Would it not be better
to just advance past the extra junk? Or is this problem more indicative
of a malformed frame?
It'd be nice to have some specific explanation of the problem this is
fixing and how it was triggered. If this due to a misbehaving server?
Bad hw?
Hi Jeff,
Why I added this check before was that when I was debugging the
sparse-read issue last time I found even when the extent array 'count ==
0', sometimes the 'sr_datalen != 0'. I just thought the ceph side's
logic allowed it.
But this time from going through and debugging more in the ceph side
code, I didn't get where was doing this. So I just suspected it should
be an issue somewhere in kclient side and it also possibly would trigger
the sparse-read bug. So I just removed the 'count' check and finally
found the root case for both these two issues.
IMO normally this shouldn't happen anyway. Once this happens in kclient
side it will 100% corrupt the following msgs and reset the connection
from my tests. Actually when the 'count ==0' the 'sr_datalen' will be a
random number, sometimes very large, so just advancing past the extra
junk makes no any sense.
Thanks
- Xiubo