On Thu, 2024-01-18 at 18:50 +0800, xiubli@xxxxxxxxxx wrote:
> From: Xiubo Li <xiubli@xxxxxxxxxx>
>
> Once this happens that means there have bugs.
>
> URL: https://tracker.ceph.com/issues/63586
> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> ---
>  net/ceph/osd_client.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 9be80d01c1dc..f8029b30a3fb 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -5912,6 +5912,13 @@ static int osd_sparse_read(struct ceph_connection *con,
>                  fallthrough;
>          case CEPH_SPARSE_READ_DATA:
>                  if (sr->sr_index >= count) {
> +                        if (sr->sr_datalen) {
> +                                pr_warn_ratelimited("sr_datalen %u sr_index %d count %u\n",
> +                                                    sr->sr_datalen, sr->sr_index,
> +                                                    count);
> +                                return -EREMOTEIO;
> +                        }
> +

Ok, so the server has (presumably) sent us a longer value for the
sr_datalen than was in the extent map?

Why should the sparse read engine care about that? It was (presumably)
able to do its job of handling the read. Why not just advance past the
extra junk and try to do another sparse read? Do we really need to fail
the op for this?

>                          sr->sr_state = CEPH_SPARSE_READ_HDR;
>                          goto next_op;
>                  }
> @@ -5919,6 +5926,8 @@ static int osd_sparse_read(struct ceph_connection *con,
>                  eoff = sr->sr_extent[sr->sr_index].off;
>                  elen = sr->sr_extent[sr->sr_index].len;
>
> +                sr->sr_datalen -= elen;
> +
>                  dout("[%d] ext %d off 0x%llx len 0x%llx\n",
>                       o->o_osd, sr->sr_index, eoff, elen);
>

--
Jeff Layton <jlayton@xxxxxxxxxx>
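
To make the "advance past the extra junk" question above concrete, here is a rough user-space sketch of the accounting involved. It is not kernel code and not a proposed patch: struct extent and trailing_bytes() are hypothetical names invented for illustration. The only assumption taken from the quoted hunk is that the data length is a 32-bit unsigned count and that the extent lengths are subtracted from it.

```c
#include <stdint.h>

/* Hypothetical stand-in for one entry of the sparse-read extent map. */
struct extent {
	uint64_t off;
	uint64_t len;
};

/*
 * Hypothetical helper: given the extent map and the data length the
 * server advertised, return how many bytes are left over once every
 * extent has been consumed. A positive result is the surplus a reader
 * could skip instead of failing the op. Returns -1 when the extents
 * claim more data than was advertised, which a plain unsigned
 * "datalen -= len" would turn into a wrapped, very large value.
 */
static int64_t trailing_bytes(const struct extent *ext, uint32_t count,
			      uint32_t datalen)
{
	uint64_t total = 0;
	uint32_t i;

	for (i = 0; i < count; i++) {
		total += ext[i].len;
		/* Bail out early so the running sum cannot overflow. */
		if (total > datalen)
			return -1;
	}

	return (int64_t)(datalen - total);
}
```

With two 4096-byte extents, an advertised length of 8192 leaves nothing to skip, 8200 leaves 8 surplus bytes, and 4096 is reported as an inconsistency rather than a surplus. Whether skipping the surplus is actually safe on the wire is the open question for the patch author.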