On Tue, 2019-11-19 at 08:04 -0500, xiubli@xxxxxxxxxx wrote:
> From: Xiubo Li <xiubli@xxxxxxxxxx>
>
> If all the MDS daemons are down for some reasons, and immediately
> just before the kclient getting the new mdsmap the mount request is
> fired out, it will be the request wait will timeout with -EIO.
>
> After this just check the mds cluster availability to give a friendly
> hint to let the users check the MDS cluster status.
>
> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> ---
>  fs/ceph/mds_client.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index a5163296d9d9..82a929084671 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2712,6 +2712,9 @@ static int ceph_mdsc_wait_request(struct ceph_mds_client *mdsc,
>  	if (test_bit(CEPH_MDS_R_GOT_RESULT, &req->r_req_flags)) {
>  		err = le32_to_cpu(req->r_reply_info.head->result);
>  	} else if (err < 0) {
> +		if (!ceph_mdsmap_is_cluster_available(mdsc->mdsmap))
> +			pr_info("probably no mds server is up\n");
> +
>  		dout("aborted request %lld with %d\n", req->r_tid, err);
>
>  		/*

Probably? If they're all unavailable then definitely. Also, this is a
pr_info message, so you probably need to prefix this with "ceph: ".

Beyond that though, do we want to do this in what amounts to low-level
infrastructure for MDS requests?

I wonder if a warning like this would be better suited to
open_root_dentry(). If ceph_mdsc_do_request returns -EIO [1], maybe have
open_root_dentry do the check and pr_info?

[1]: Why does it use -EIO here anyway? Wouldn't -ETIMEDOUT or something
be better? Maybe the worry was that that error could bubble up to
userland?

-- 
Jeff Layton <jlayton@xxxxxxxxxx>