Re: [PATCH] ceph: check availability of mds cluster on mount after wait timeout

Jeff Layton <jlayton@xxxxxxxxxx> · Tue, 19 Nov 2019 12:28:55 -0500

On Tue, 2019-11-19 at 08:04 -0500, xiubli@xxxxxxxxxx wrote:
> From: Xiubo Li <xiubli@xxxxxxxxxx>
> 
> If all the MDS daemons are down for some reason, and the mount request
> is fired off just before the kclient gets the new mdsmap, the request
> wait will time out with -EIO.
> 
> After this just check the mds cluster availability to give a friendly
> hint to let the users check the MDS cluster status.
> 
> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> ---
>  fs/ceph/mds_client.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index a5163296d9d9..82a929084671 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2712,6 +2712,9 @@ static int ceph_mdsc_wait_request(struct ceph_mds_client *mdsc,
>  	if (test_bit(CEPH_MDS_R_GOT_RESULT, &req->r_req_flags)) {
>  		err = le32_to_cpu(req->r_reply_info.head->result);
>  	} else if (err < 0) {
> +		if (!ceph_mdsmap_is_cluster_available(mdsc->mdsmap))
> +			pr_info("probably no mds server is up\n");
> +
>  		dout("aborted request %lld with %d\n", req->r_tid, err);
>  
>  		/*

Probably? If they're all unavailable then definitely. Also, this is a
pr_info message, so you probably need to prefix this with "ceph: ".

Beyond that though, do we want to do this in what amounts to low-level
infrastructure for MDS requests?

I wonder if a warning like this would be better suited in
open_root_dentry(). If ceph_mdsc_do_request returns -EIO [1] maybe have
open_root_dentry do the check and pr_info?
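
To make the idea concrete, here is a rough userspace sketch of the control flow, not kernel code and not tested against the tree. Everything prefixed with mock_ is a made-up stand-in: mock_cluster_available() plays the role of ceph_mdsmap_is_cluster_available(), mock_do_request() the role of ceph_mdsc_do_request(), and the structs are hypothetical placeholders for the real ones. The message text is only a suggestion, and fprintf stands in for pr_info.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structures; NOT the real layouts. */
struct mock_mdsmap { int num_up; };
struct mock_mds_client { struct mock_mdsmap *mdsmap; };

/* Assumed stand-in for ceph_mdsmap_is_cluster_available(). */
static bool mock_cluster_available(const struct mock_mdsmap *m)
{
	return m && m->num_up > 0;
}

/* Assumed stand-in for ceph_mdsc_do_request(): fails with -EIO
 * when no MDS is up, mimicking the wait timeout described above. */
static int mock_do_request(struct mock_mds_client *mdsc)
{
	return mock_cluster_available(mdsc->mdsmap) ? 0 : -EIO;
}

/* Counts emitted hints so the behaviour can be checked. */
static int warnings_logged;

/* The mount path does the availability check, so the low-level
 * request-wait code stays free of mount-specific messages. */
static int mock_open_root_dentry(struct mock_mds_client *mdsc)
{
	int err = mock_do_request(mdsc);

	if (err == -EIO && !mock_cluster_available(mdsc->mdsmap)) {
		/* In the kernel this would be pr_info() with a "ceph: " prefix. */
		fprintf(stderr, "ceph: no MDS is up, check the MDS cluster status\n");
		warnings_logged++;
	}
	return err;
}
```

The point of the split is that only the mount path pays for the check and the message; other callers of the request machinery see the plain error as before.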

[1]: Why does it use -EIO here anyway? Wouldn't -ETIMEDOUT or something
be better? Maybe the worry was that that error could bubble up to
userland?

-- 
Jeff Layton <jlayton@xxxxxxxxxx>