Re: [PATCH v3] ceph: check availability of mds cluster on mount after wait timeout

Jeff Layton <jlayton@xxxxxxxxxx> · Wed, 11 Dec 2019 08:17:03 -0500

On Tue, 2019-12-10 at 20:29 -0500, xiubli@xxxxxxxxxx wrote:
> From: Xiubo Li <xiubli@xxxxxxxxxx>
> 
> If all the MDS daemons are down for some reason when the first mount
> is attempted, the mount fails with an I/O error after the mount
> request times out.
> 
> Likewise, if the cluster suddenly becomes laggy and the mount request
> is fired off just before the kclient gets the new mdsmap, the mount
> also fails with an I/O error.
> 
> Add a useful hint message by checking the cluster state before
> failing the mount operation.
> 
> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> ---
> 
> V3:
> - Rebase to the new mount API version.
> 
>  fs/ceph/mds_client.c | 3 +--
>  fs/ceph/super.c      | 5 +++++
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 7d3ec051f179..bf507120659e 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2576,8 +2576,7 @@ static void __do_request(struct ceph_mds_client *mdsc,
>  		if (!(mdsc->fsc->mount_options->flags &
>  		      CEPH_MOUNT_OPT_MOUNTWAIT) &&
>  		    !ceph_mdsmap_is_cluster_available(mdsc->mdsmap)) {
> -			err = -ENOENT;
> -			pr_info("probably no mds server is up\n");
> +			err = -EHOSTUNREACH;
>  			goto finish;
>  		}
>  	}
> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> index 9c9a7c68eea3..6f33a265ccf1 100644
> --- a/fs/ceph/super.c
> +++ b/fs/ceph/super.c
> @@ -1068,6 +1068,11 @@ static int ceph_get_tree(struct fs_context *fc)
>  	return 0;
>  
>  out_splat:
> +	if (!ceph_mdsmap_is_cluster_available(fsc->mdsc->mdsmap)) {
> +		pr_info("No mds server is up or the cluster is laggy\n");
> +		err = -EHOSTUNREACH;
> +	}
> +
>  	ceph_mdsc_close_sessions(fsc->mdsc);
>  	deactivate_locked_super(sb);
>  	goto out_final;

Looks reasonable. Merged into testing branch with a revamped changelog.
Please have a look at the testing branch and make sure the changelog is
OK with you.
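
For anyone skimming the thread, the effect of the out_splat hunk can be
sketched in userspace C. This is only an illustration under stated
assumptions: struct fake_mdsmap, cluster_available() and mount_error()
are hypothetical stand-ins invented here, not kernel code, and the real
check is whatever ceph_mdsmap_is_cluster_available() does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's mdsmap; not the real struct. */
struct fake_mdsmap {
	int num_up;	/* number of MDS daemons currently up */
	bool laggy;	/* whether the cluster is considered laggy */
};

/* Assumed analogue of ceph_mdsmap_is_cluster_available(). */
static bool cluster_available(const struct fake_mdsmap *m)
{
	return m && m->num_up > 0 && !m->laggy;
}

/*
 * Mirrors the out_splat hunk: on the mount failure path, if the
 * cluster is unavailable, log a hint and override the error with
 * -EHOSTUNREACH. Otherwise the original error is returned unchanged.
 */
static int mount_error(int err, const struct fake_mdsmap *m)
{
	assert(err < 0);	/* only meaningful on the failure path */

	if (!cluster_available(m)) {
		fprintf(stderr,
			"No mds server is up or the cluster is laggy\n");
		err = -EHOSTUNREACH;
	}
	return err;
}
```

With a map that has no MDS up, mount_error(-EIO, &map) yields
-EHOSTUNREACH; with at least one MDS up and no lag, the original -EIO
is passed through.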

Thanks,
-- 
Jeff Layton <jlayton@xxxxxxxxxx>