On 2019/11/20 1:28, Jeff Layton wrote:
On Tue, 2019-11-19 at 08:04 -0500, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>
If all the MDS daemons are down for some reason, and a mount request is
fired off just before the kclient gets the new mdsmap, the request wait
will time out with -EIO.
After that, check the MDS cluster availability to give a friendly hint
telling the users to check the MDS cluster status.
Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
fs/ceph/mds_client.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index a5163296d9d9..82a929084671 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2712,6 +2712,9 @@ static int ceph_mdsc_wait_request(struct ceph_mds_client *mdsc,
if (test_bit(CEPH_MDS_R_GOT_RESULT, &req->r_req_flags)) {
err = le32_to_cpu(req->r_reply_info.head->result);
} else if (err < 0) {
+ if (!ceph_mdsmap_is_cluster_available(mdsc->mdsmap))
+ pr_info("probably no mds server is up\n");
+
dout("aborted request %lld with %d\n", req->r_tid, err);
/*
Probably? If they're all unavailable then definitely.
Currently, ceph_mdsmap_is_cluster_available() is a bit buggy, and my
commit message is not very accurate or detailed either.
For example:
# ceph fs dump
[...]
max_mds 3
in 0,1,2
up {0=5139,1=4837,2=4985}
failed
damaged
stopped
data_pools [2]
metadata_pool 1
inline_data disabled
balancer
standby_count_wanted 1
[mds.a{0:5139} state up:active seq 7 laggy since
2019-11-20T01:04:13.040701-0500 addr v1:192.168.195.165:6813/2514516359]
[mds.b{1:4837} state up:active seq 6 addr
v1:192.168.195.165:6815/1921459709]
[mds.f{2:4985} state up:active seq 6 laggy since
2019-11-20T01:04:13.040685-0500 addr v1:192.168.195.165:6814/3730607184]
Here m->m_num_laggy == 2, but there is still one MDS in the (up:active &&
!laggy) state. In this case, if the mount request chooses mds.a it will
still hit IO errors and fail; a better choice would be to pick mds.b
instead. Currently ceph_mdsmap_is_cluster_available() just returns false
if any MDS is laggy. I will fix it.
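Roughly I am thinking of something like this, just an untested sketch
against the current mdsmap fields (assuming m_damaged, m_num_mds,
m_info[].state and m_info[].laggy keep their current meanings):

bool ceph_mdsmap_is_cluster_available(struct ceph_mdsmap *m)
{
        int i;

        if (m->m_epoch == 0 || m->m_damaged)
                return false;

        /* sketch: treat the cluster as available as long as at least
         * one rank is up:active and not marked laggy, instead of
         * bailing out as soon as m_num_laggy > 0 */
        for (i = 0; i < m->m_num_mds; i++) {
                if (m->m_info[i].state == CEPH_MDS_STATE_ACTIVE &&
                    !m->m_info[i].laggy)
                        return true;
        }
        return false;
}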
But even after fixing that, there is a corner case: the monitor may take
a while to update the laggy state in the mdsmap. At that point, even
though mds.a and mds.f have already crashed, their state is still
up:active and not laggy, so the mount may still choose mds.a and fail.
That does not mean the whole MDS cluster is unavailable, though. The
"probably" here is meant to cover this corner case.
Does that make sense?
Also, this is a
pr_info message, so you probably need to prefix this with "ceph: ".
For pr_info messages the module name is added as a prefix automatically:
"<6>[23167.778366] ceph: probably no mds server is up"
That should be enough.
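(If I remember right, the prefix comes from the pr_fmt() the ceph files
pull in via ceph_debug.h, roughly:

/* include/linux/ceph/ceph_debug.h */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

so pr_info("probably no mds server is up\n") already ends up as
"ceph: probably no mds server is up".)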
Beyond that though, do we want to do this in what amounts to low-level
infrastructure for MDS requests?
I wonder if a warning like this would be better suited in
open_root_dentry(). If ceph_mdsc_do_request returns -EIO [1] maybe have
open_root_dentry do the check and pr_info?
Yeah, I was also thinking of bubbling it up to the mount.ceph helper in
userland, but I'm still not sure which errno it should be, just
-ETIMEDOUT or something else.
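Maybe something like this in open_root_dentry(), just an untested sketch
(keeping the -EIO from ceph_mdsc_do_request() for now):

        /* untested sketch: do the availability check here instead of
         * in the generic MDS request path */
        err = ceph_mdsc_do_request(mdsc, NULL, req);
        if (err == -EIO && !ceph_mdsmap_is_cluster_available(mdsc->mdsmap))
                pr_info("probably no mds server is up\n");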
[1]: Why does it use -EIO here anyway? Wouldn't -ETIMEDOUT or something
be better? Maybe the worry was that that error could bubble up to
userland?
Yeah, I have the same doubt. This is also the general metadata IO path
for other operations, such as create/lookup, etc. And for the mount
operation the error really will bubble up to mount.ceph in userland.
Thanks
BRs