Re: [PATCH] ceph: stop forwarding the request when exceeding 256 times

Xiubo Li <xiubli@xxxxxxxxxx> · Tue, 29 Mar 2022 20:24:02 +0800

On 3/29/22 7:28 PM, Jeff Layton wrote:
On Tue, 2022-03-29 at 19:12 +0800, Xiubo Li wrote:
On 3/29/22 5:53 PM, Luís Henriques wrote:
xiubli@xxxxxxxxxx writes:

From: Xiubo Li <xiubli@xxxxxxxxxx>

The type of 'num_fwd' in ceph 'MClientRequestForward' is 'int32_t',
while in 'ceph_mds_request_head' the type is '__u8'. So in case
the request bounces between MDSes exceeding 256 times, the client
will get stuck.

In this case it's ususally a bug in MDS and continue bouncing the
request makes no sense.
Ouch.  Nice catch.  This patch looks OK to me, just 2 minor comments
bellow.

Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
   fs/ceph/mds_client.c | 31 ++++++++++++++++++++++++++++---
   1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index a89ee866ebbb..0bb6e7bc499c 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -3293,6 +3293,7 @@ static void handle_forward(struct ceph_mds_client *mdsc,
   	int err = -EINVAL;
   	void *p = msg->front.iov_base;
   	void *end = p + msg->front.iov_len;
+	bool aborted = false;
   
   	ceph_decode_need(&p, end, 2*sizeof(u32), bad);
   	next_mds = ceph_decode_32(&p);
@@ -3309,8 +3310,28 @@ static void handle_forward(struct ceph_mds_client *mdsc,
   		dout("forward tid %llu aborted, unregistering\n", tid);
   		__unregister_request(mdsc, req);
   	} else if (fwd_seq <= req->r_num_fwd) {
-		dout("forward tid %llu to mds%d - old seq %d <= %d\n",
-		     tid, next_mds, req->r_num_fwd, fwd_seq);
+		/*
+		 * The type of 'num_fwd' in ceph 'MClientRequestForward'
+		 * is 'int32_t', while in 'ceph_mds_request_head' the
+		 * type is '__u8'. So in case the request bounces between
+		 * MDSes exceeding 256 times, the client will get stuck.
+		 *
+		 * In this case it's ususally a bug in MDS and continue
+		 * bouncing the request makes no sense.
+		 */
+		if (req->r_num_fwd == 256) {
+			mutex_lock(&req->r_fill_mutex);
+			req->r_err = -EIO;
Not sure -EIO is the most appropriate.  Maybe -E2BIG... although not quite
it either.

Yeah, I also not very sure here.

Jeff ?

Matching errors like this really comes down to a judgement call.  E2BIG
usually means that some buffer was sized too small, so you'll have users
trying to figure out what they passed in wrong if you return that here.

-EIO is the usual "default" when you don't know what else to use.
There's also -EREMOTEIO which may be closer here since this is
indicative of MDS problems. Given that, it may also be a good idea to
log a pr_warn or pr_notice message at the same time explaining what
happened.

As discussed in IRC, have switched to EMULTIHOP instead.

Thanks Jeff.

-- Xiubo


+			set_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags);
+			mutex_unlock(&req->r_fill_mutex);
+			aborted = true;
+			dout("forward tid %llu to mds%d - seq overflowed %d <= %d\n",
+			     tid, next_mds, req->r_num_fwd, fwd_seq);
+			goto out;
This 'goto' statement can be dropped, but one before (when the
lookup_get_request() fails) needs to be adjusted, otherwise
ceph_mdsc_put_request() may be called with a NULL pointer.
Yeah, will fix it.

Thanks.

-- Xiubo


Cheers,