On 2019/11/20 1:28, Jeff Layton wrote:
On Tue, 2019-11-19 at 08:04 -0500, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>
If all the MDS daemons are down for some reason, and a mount request is
fired off just before the kclient gets the new mdsmap, the request wait
will time out with -EIO.
After that, check the MDS cluster availability to give a friendly hint
telling the users to check the MDS cluster status.
Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
fs/ceph/mds_client.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index a5163296d9d9..82a929084671 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2712,6 +2712,9 @@ static int ceph_mdsc_wait_request(struct ceph_mds_client *mdsc,
if (test_bit(CEPH_MDS_R_GOT_RESULT, &req->r_req_flags)) {
err = le32_to_cpu(req->r_reply_info.head->result);
} else if (err < 0) {
+ if (!ceph_mdsmap_is_cluster_available(mdsc->mdsmap))
+ pr_info("probably no mds server is up\n");
+
dout("aborted request %lld with %d\n", req->r_tid, err);
/*
Probably? If they're all unavailable then definitely.
Currently, ceph_mdsmap_is_cluster_available() is a bit buggy, and my
commit message is not very accurate or detailed either.
For example:
# ceph fs dump
[...]
max_mds 3
in 0,1,2
up {0=5139,1=4837,2=4985}
failed
damaged
stopped
data_pools [2]
metadata_pool 1
inline_data disabled
balancer
standby_count_wanted 1
[mds.a{0:5139} state up:active seq 7 laggy since
2019-11-20T01:04:13.040701-0500 addr v1:192.168.195.165:6813/2514516359]
[mds.b{1:4837} state up:active seq 6 addr
v1:192.168.195.165:6815/1921459709]
[mds.f{2:4985} state up:active seq 6 laggy since
2019-11-20T01:04:13.040685-0500 addr v1:192.168.195.165:6814/3730607184]
Here m->m_num_laggy == 2, but there is still one MDS in the (up:active &&
!laggy) state. In this case, if the mount request chooses mds.a it will
still hit IO errors and fail; a better choice would be to pick mds.b
instead. Currently ceph_mdsmap_is_cluster_available() just returns false
if any MDS is laggy. I will fix it.
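Roughly I am thinking of something like this, just an untested sketch
against the current mdsmap fields (assuming m_damaged, m_num_mds,
m_info[].state and m_info[].laggy keep their current meanings):

bool ceph_mdsmap_is_cluster_available(struct ceph_mdsmap *m)
{
        int i;

        if (m->m_epoch == 0 || m->m_damaged)
                return false;

        /* sketch: treat the cluster as available as long as at least
         * one rank is up:active and not marked laggy, instead of
         * bailing out as soon as m_num_laggy > 0 */
        for (i = 0; i < m->m_num_mds; i++) {
                if (m->m_info[i].state == CEPH_MDS_STATE_ACTIVE &&
                    !m->m_info[i].laggy)
                        return true;
        }
        return false;
}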
But even after fixing that, there is a corner case: the monitor may take
a while to update the laggy state in the mdsmap. At that point, even
though mds.a and mds.f have already crashed, their state is still
up:active and not laggy, so the mount may still choose mds.a and fail.
That does not mean the whole MDS cluster is unavailable, though. The
"probably" here is meant to cover this corner case.
Does that make sense?
Also, this is a
pr_info message, so you probably need to prefix this with "ceph: ".
For pr_info messages the module name is added as a prefix automatically:
"<6>[23167.778366] ceph: probably no mds server is up"
That should be enough.
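(If I remember right, the prefix comes from the pr_fmt() the ceph files
pull in via ceph_debug.h, roughly:

/* include/linux/ceph/ceph_debug.h */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

so pr_info("probably no mds server is up\n") already ends up as
"ceph: probably no mds server is up".)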
Beyond that though, do we want to do this in what amounts to low-level
infrastructure for MDS requests?
I wonder if a warning like this would be better suited in
open_root_dentry(). If ceph_mdsc_do_request returns -EIO [1] maybe have
open_root_dentry do the check and pr_info?
Yeah, I was also thinking of bubbling it up to the mount.ceph helper in
userland, but I'm still not sure which errno it should be, just
-ETIMEDOUT or something else.
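Maybe something like this in open_root_dentry(), just an untested sketch
(keeping the -EIO from ceph_mdsc_do_request() for now):

        /* untested sketch: do the availability check here instead of
         * in the generic MDS request path */
        err = ceph_mdsc_do_request(mdsc, NULL, req);
        if (err == -EIO && !ceph_mdsmap_is_cluster_available(mdsc->mdsmap))
                pr_info("probably no mds server is up\n");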
[1]: Why does it use -EIO here anyway? Wouldn't -ETIMEDOUT or something
be better? Maybe the worry was that that error could bubble up to
userland?
Yeah, I have the same doubt. This is also the general metadata IO path
for other operations, such as create/lookup, etc. And for the mount
operation the error really will bubble up to mount.ceph in userland.
Thanks
BRs