Re: [PATCH] ceph: check availability of mds cluster on mount after wait timeout


 



On 2019/11/20 1:28, Jeff Layton wrote:
On Tue, 2019-11-19 at 08:04 -0500, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>

If all the MDS daemons are down for some reason, and the mount request
is fired just before the kclient gets the new mdsmap, the request wait
will time out with -EIO.

After that, check the MDS cluster availability and give the users a
friendly hint to check the MDS cluster status.

Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
  fs/ceph/mds_client.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index a5163296d9d9..82a929084671 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2712,6 +2712,9 @@ static int ceph_mdsc_wait_request(struct ceph_mds_client *mdsc,
  	if (test_bit(CEPH_MDS_R_GOT_RESULT, &req->r_req_flags)) {
  		err = le32_to_cpu(req->r_reply_info.head->result);
  	} else if (err < 0) {
+		if (!ceph_mdsmap_is_cluster_available(mdsc->mdsmap))
+			pr_info("probably no mds server is up\n");
+
  		dout("aborted request %lld with %d\n", req->r_tid, err);
/*
Probably? If they're all unavailable then definitely.

Currently, ceph_mdsmap_is_cluster_available() is a bit buggy, and my commit message is not quite correct or detailed enough either.

Consider this case:

# ceph fs dump
[...]

max_mds    3
in    0,1,2
up    {0=5139,1=4837,2=4985}
failed
damaged
stopped
data_pools    [2]
metadata_pool    1
inline_data    disabled
balancer
standby_count_wanted    1
[mds.a{0:5139} state up:active seq 7 laggy since 2019-11-20T01:04:13.040701-0500 addr v1:192.168.195.165:6813/2514516359] [mds.b{1:4837} state up:active seq 6 addr v1:192.168.195.165:6815/1921459709] [mds.f{2:4985} state up:active seq 6 laggy since 2019-11-20T01:04:13.040685-0500 addr v1:192.168.195.165:6814/3730607184]

Here m->m_num_laggy == 2, but there is still one MDS in the (up:active & !laggy) state. In this case, if the mount request chooses mds.a, it will still hit IO errors and fail. A better choice would be mds.b. Currently ceph_mdsmap_is_cluster_available() just returns false if any MDS is laggy. I will fix that.
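To make the intended behaviour concrete, here is a minimal userspace sketch of an availability check that succeeds when at least one rank is up:active and not laggy. The struct and function names below are hypothetical stand-ins, not the kernel's struct ceph_mdsmap or the eventual fix, and the sketch deliberately ignores any other conditions the real function may test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's mdsmap data. */
enum mds_state_sketch { MDS_SKETCH_DOWN = 0, MDS_SKETCH_UP_ACTIVE = 1 };

struct mds_info_sketch {
	enum mds_state_sketch state;
	bool laggy;
};

struct mdsmap_sketch {
	size_t num_mds;
	const struct mds_info_sketch *info;
};

/*
 * Return true if at least one rank is up:active and not laggy.
 * A laggy rank no longer makes the whole cluster count as unavailable.
 */
static bool mdsmap_sketch_is_cluster_available(const struct mdsmap_sketch *m)
{
	size_t i;

	assert(m != NULL);
	for (i = 0; i < m->num_mds; i++) {
		if (m->info[i].state == MDS_SKETCH_UP_ACTIVE && !m->info[i].laggy)
			return true;
	}
	return false;
}
```

With the "ceph fs dump" output above (mds.a and mds.f laggy, mds.b healthy), this check would report the cluster as available, whereas a check that bails out on any laggy MDS would not.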

But even after fixing it, there is a corner case: the Monitor may take a while to update the laggy state in the mdsmap. During that window, even though mds.a and mds.f have already crashed, their state is still up:active without laggy, so a mount may still choose mds.a and fail. That does not mean the MDS cluster is totally unavailable. The "probably" here is meant to cover this corner case.

Does that make sense?


Also, this is a pr_info message, so you probably need to prefix this with "ceph: ".

For pr_info messages, the module name is added as a prefix automatically:

"<6>[23167.778366] ceph: probably no mds server is up"

This should be enough.



Beyond that though, do we want to do this in what amounts to low-level
infrastructure for MDS requests?

I wonder if a warning like this would be better suited in
open_root_dentry(). If ceph_mdsc_do_request returns -EIO [1] maybe have
open_root_dentry do the check and pr_info?

Yeah, I was also thinking of bubbling it up to mount.ceph in userland, but I am still not sure which errno it should be: just -ETIMEOUT, or something else.
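For illustration only, the suggestion of doing the check in the mount path rather than in the low-level request code could look roughly like the userspace mock below. Everything here is hypothetical: the struct, the function names, and the use of fprintf() in place of pr_info() are assumptions made so the sketch compiles outside the kernel, and it is not the actual open_root_dentry() code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the MDS client state. */
struct mdsc_sketch {
	bool cluster_available;	/* result of the availability check */
	int request_result;	/* what the mocked request returns */
};

/* Mocked low-level request path: returns the error unchanged, no logging. */
static int do_request_sketch(const struct mdsc_sketch *mdsc)
{
	return mdsc->request_result;
}

/*
 * Mount-path caller: the hint is printed only here, and only when the
 * request failed with -EIO while the cluster looks unavailable.
 * *hinted reports whether the hint was emitted.
 */
static int open_root_sketch(const struct mdsc_sketch *mdsc, bool *hinted)
{
	int err;

	assert(mdsc != NULL && hinted != NULL);
	err = do_request_sketch(mdsc);
	*hinted = false;
	if (err == -EIO && !mdsc->cluster_available) {
		fprintf(stderr, "ceph: probably no mds server is up\n");
		*hinted = true;
	}
	return err;
}
```

Keeping the message in the caller means other metadata operations that share the request path would not emit the mount-specific hint.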


[1]: Why does it use -EIO here anyway? Wouldn't -ETIMEOUT or something
be better? Maybe the worry was that that error could bubble up to
userland?

Yeah, I have the same doubt. This is also the general metadata IO path for other operations, such as create, lookup, and so on.

And in the mount operation it really will bubble up to mount.ceph in userland.

Thanks

BRs






