Re: [PATCH v4 1/3] libceph: fail the sparse-read if there still has data in socket

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 1/18/24 22:03, Jeff Layton wrote:
On Thu, 2024-01-18 at 18:50 +0800, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>

Once this happens that means there have bugs.

URL: https://tracker.ceph.com/issues/63586
Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
  net/ceph/osd_client.c | 9 +++++++++
  1 file changed, 9 insertions(+)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 9be80d01c1dc..f8029b30a3fb 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -5912,6 +5912,13 @@ static int osd_sparse_read(struct ceph_connection *con,
  		fallthrough;
  	case CEPH_SPARSE_READ_DATA:
  		if (sr->sr_index >= count) {
+			if (sr->sr_datalen) {
+				pr_warn_ratelimited("sr_datalen %u sr_index %d count %u\n",
+						    sr->sr_datalen, sr->sr_index,
+						    count);
+				return -EREMOTEIO;
+			}
+
Ok, so the server has (presumably) sent us a longer value for the
sr_datalen than was in the extent map?

Why should the sparse read engine care about that? It was (presumably)
able to do its job of handling the read. Why not just advance past the
extra junk and try to do another sparse read? Do we really need to fail
the op for this?

Hi Jeff,

I saw the problem just when I first debugging the sparse-read bug, and the length will be very large, more detail and the logs please see the tracker https://tracker.ceph.com/issues/63586:

     11741055 <4>[180940.606488] libceph: sr_datalen 251723776 sr_index 0 count 0

In this case the request could cause the same request being retrying infinitely. While just in other case the when the ceph send a incorrect data length, how should we do ? Should we retry it ? How could we skip it if the length is so large ?

Thanks

- Xiubo



  			sr->sr_state = CEPH_SPARSE_READ_HDR;
  			goto next_op;
  		}
@@ -5919,6 +5926,8 @@ static int osd_sparse_read(struct ceph_connection *con,
  		eoff = sr->sr_extent[sr->sr_index].off;
  		elen = sr->sr_extent[sr->sr_index].len;
+ sr->sr_datalen -= elen;
+
  		dout("[%d] ext %d off 0x%llx len 0x%llx\n",
  		     o->o_osd, sr->sr_index, eoff, elen);





[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux