On 3/29/22 7:28 PM, Jeff Layton wrote:
On Tue, 2022-03-29 at 19:12 +0800, Xiubo Li wrote:
On 3/29/22 5:53 PM, Luís Henriques wrote:
xiubli@xxxxxxxxxx writes:
From: Xiubo Li <xiubli@xxxxxxxxxx>
The type of 'num_fwd' in ceph 'MClientRequestForward' is 'int32_t',
while in 'ceph_mds_request_head' the type is '__u8'. So in case
the request bounces between MDSes exceeding 256 times, the client
will get stuck.
In this case it's ususally a bug in MDS and continue bouncing the
request makes no sense.
Ouch. Nice catch. This patch looks OK to me, just 2 minor comments
bellow.
Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
fs/ceph/mds_client.c | 31 ++++++++++++++++++++++++++++---
1 file changed, 28 insertions(+), 3 deletions(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index a89ee866ebbb..0bb6e7bc499c 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -3293,6 +3293,7 @@ static void handle_forward(struct ceph_mds_client *mdsc,
int err = -EINVAL;
void *p = msg->front.iov_base;
void *end = p + msg->front.iov_len;
+ bool aborted = false;
ceph_decode_need(&p, end, 2*sizeof(u32), bad);
next_mds = ceph_decode_32(&p);
@@ -3309,8 +3310,28 @@ static void handle_forward(struct ceph_mds_client *mdsc,
dout("forward tid %llu aborted, unregistering\n", tid);
__unregister_request(mdsc, req);
} else if (fwd_seq <= req->r_num_fwd) {
- dout("forward tid %llu to mds%d - old seq %d <= %d\n",
- tid, next_mds, req->r_num_fwd, fwd_seq);
+ /*
+ * The type of 'num_fwd' in ceph 'MClientRequestForward'
+ * is 'int32_t', while in 'ceph_mds_request_head' the
+ * type is '__u8'. So in case the request bounces between
+ * MDSes exceeding 256 times, the client will get stuck.
+ *
+ * In this case it's ususally a bug in MDS and continue
+ * bouncing the request makes no sense.
+ */
+ if (req->r_num_fwd == 256) {
+ mutex_lock(&req->r_fill_mutex);
+ req->r_err = -EIO;
Not sure -EIO is the most appropriate. Maybe -E2BIG... although not quite
it either.
Yeah, I also not very sure here.
Jeff ?
Matching errors like this really comes down to a judgement call. E2BIG
usually means that some buffer was sized too small, so you'll have users
trying to figure out what they passed in wrong if you return that here.
-EIO is the usual "default" when you don't know what else to use.
There's also -EREMOTEIO which may be closer here since this is
indicative of MDS problems. Given that, it may also be a good idea to
log a pr_warn or pr_notice message at the same time explaining what
happened.
As discussed in IRC, have switched to EMULTIHOP instead.
Thanks Jeff.
-- Xiubo
+ set_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags);
+ mutex_unlock(&req->r_fill_mutex);
+ aborted = true;
+ dout("forward tid %llu to mds%d - seq overflowed %d <= %d\n",
+ tid, next_mds, req->r_num_fwd, fwd_seq);
+ goto out;
This 'goto' statement can be dropped, but one before (when the
lookup_get_request() fails) needs to be adjusted, otherwise
ceph_mdsc_put_request() may be called with a NULL pointer.
Yeah, will fix it.
Thanks.
-- Xiubo
Cheers,