Re: [PATCH v4 4/5] ceph: record updated mon_addr on remount

Jeff Layton <jlayton@xxxxxxxxxx> · Wed, 14 Jul 2021 14:15:50 -0400

On Wed, 2021-07-14 at 17:35 +0100, Luis Henriques wrote:
> On Wed, Jul 14, 2021 at 12:17:33PM -0400, Jeff Layton wrote:
> > On Wed, 2021-07-14 at 15:35 +0530, Venky Shankar wrote:
> > > Note that the new monitors are just shown in /proc/mounts.
> > > Ceph does not (re)connect to new monitors yet.
> > > 
> > > Signed-off-by: Venky Shankar <vshankar@xxxxxxxxxx>
> > > ---
> > >  fs/ceph/super.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> > > index d8c6168b7fcd..d3a5a3729c5b 100644
> > > --- a/fs/ceph/super.c
> > > +++ b/fs/ceph/super.c
> > > @@ -1268,6 +1268,13 @@ static int ceph_reconfigure_fc(struct fs_context *fc)
> > >  	else
> > >  		ceph_clear_mount_opt(fsc, ASYNC_DIROPS);
> > >  
> > > +	if (strcmp(fsc->mount_options->mon_addr, fsopt->mon_addr)) {
> > > +		kfree(fsc->mount_options->mon_addr);
> > > +		fsc->mount_options->mon_addr = fsopt->mon_addr;
> > > +		fsopt->mon_addr = NULL;
> > > +		printk(KERN_NOTICE "ceph: monitor addresses recorded, but not used for reconnection");
> > 
> > It's currently more in-vogue to use pr_notice() for this. I'll plan to
> > make that (minor) change before I merge. No need to resend.
> 
> Yeah, this was the only comment I had too.  I saw some issues in the
> previous revision but the changes to ceph_parse_source() seem to fix them
> in this revision.
> 
> The other annoying thing I found isn't related to this patchset but to
> a change made some time ago by Xiubo (added to CC): it looks like if we
> have an invalid parameter (for example, a wrong secret) we'll always get
> -EHOSTUNREACH.
> 
> See below a possible fix (although I'm not entirely sure that's the correct
> one).
> 
> Cheers,
> --
> Luís
> 
> From a988d24d8e72fc4933459f3dd5d303cbc9a566ed Mon Sep 17 00:00:00 2001
> From: Luis Henriques <lhenriques@xxxxxxx>
> Date: Wed, 14 Jul 2021 16:56:36 +0100
> Subject: [PATCH] ceph: don't hide error code if we don't have mdsmap
> 
> Since commit 97820058fb28 ("ceph: check availability of mds cluster on mount
> after wait timeout") we're returning -EHOSTUNREACH, even if the error isn't
> related to the MDSs' availability.  For example, we'll get it even if we're
> trying to mount a filesystem with an invalid username or secret.
> 
> Only return this error if we get -EIO.
> 
> Fixes: 97820058fb28 ("ceph: check availability of mds cluster on mount after wait timeout")
> Signed-off-by: Luis Henriques <lhenriques@xxxxxxx>
> ---
>  fs/ceph/super.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> index 086a1ceec9d8..67d70059ce9f 100644
> --- a/fs/ceph/super.c
> +++ b/fs/ceph/super.c
> @@ -1230,7 +1230,8 @@ static int ceph_get_tree(struct fs_context *fc)
>  	return 0;
>  
>  out_splat:
> -	if (!ceph_mdsmap_is_cluster_available(fsc->mdsc->mdsmap)) {
> +	if ((err == -EIO) &&
> +	    !ceph_mdsmap_is_cluster_available(fsc->mdsc->mdsmap)) {
>  		pr_info("No mds server is up or the cluster is laggy\n");
>  		err = -EHOSTUNREACH;
>  	}

Yeah, I've noticed that message pop up under all sorts of circumstances
and it is an annoyance. I'm happy to consider such a patch if you send
it separately.

That said, I'm honestly not sure this message is really helpful, and
overriding errors like this at a high level seems sort of sketchy. Maybe
we should just drop that message, or figure out a way to limit it to
_just_ that situation.
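
For illustration only, here is a minimal user-space sketch of what limiting
the override could look like, following the condition in the hunk quoted
above. The struct and both function names are hypothetical stand-ins, not
kernel code, and -EACCES is just an assumed example of an unrelated error,
not a claim about what a bad secret actually returns:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the real mdsmap; only models availability. */
struct mdsmap_stub {
	bool cluster_available;
};

/* Hypothetical stand-in for ceph_mdsmap_is_cluster_available(). */
static bool mdsmap_stub_is_cluster_available(const struct mdsmap_stub *m)
{
	return m && m->cluster_available;
}

/*
 * Only rewrite the error when it is -EIO *and* no MDS cluster is
 * available; every other error is passed through untouched so the
 * caller still sees the original cause.
 */
static int mount_error_fixup(int err, const struct mdsmap_stub *m)
{
	if (err == -EIO && !mdsmap_stub_is_cluster_available(m))
		return -EHOSTUNREACH;
	return err;
}
```

With this shape, mount_error_fixup(-EACCES, &map) returns -EACCES even when
the map reports no available cluster, while -EIO under the same map becomes
-EHOSTUNREACH.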

--
Jeff Layton <jlayton@xxxxxxxxxx>