Re: [PATCH v4 4/5] ceph: record updated mon_addr on remount

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 14, 2021 at 02:15:50PM -0400, Jeff Layton wrote:
> On Wed, 2021-07-14 at 17:35 +0100, Luis Henriques wrote:
> > On Wed, Jul 14, 2021 at 12:17:33PM -0400, Jeff Layton wrote:
> > > On Wed, 2021-07-14 at 15:35 +0530, Venky Shankar wrote:
> > > > Note that the new monitors are just shown in /proc/mounts.
> > > > Ceph does not (re)connect to new monitors yet.
> > > > 
> > > > Signed-off-by: Venky Shankar <vshankar@xxxxxxxxxx>
> > > > ---
> > > >  fs/ceph/super.c | 7 +++++++
> > > >  1 file changed, 7 insertions(+)
> > > > 
> > > > diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> > > > index d8c6168b7fcd..d3a5a3729c5b 100644
> > > > --- a/fs/ceph/super.c
> > > > +++ b/fs/ceph/super.c
> > > > @@ -1268,6 +1268,13 @@ static int ceph_reconfigure_fc(struct fs_context *fc)
> > > >  	else
> > > >  		ceph_clear_mount_opt(fsc, ASYNC_DIROPS);
> > > >  
> > > > +	if (strcmp(fsc->mount_options->mon_addr, fsopt->mon_addr)) {
> > > > +		kfree(fsc->mount_options->mon_addr);
> > > > +		fsc->mount_options->mon_addr = fsopt->mon_addr;
> > > > +		fsopt->mon_addr = NULL;
> > > > +		printk(KERN_NOTICE "ceph: monitor addresses recorded, but not used for reconnection");
> > > 
> > > It's currently more in-vogue to use pr_notice() for this. I'll plan to
> > > make that (minor) change before I merge. No need to resend.
> > 
> > Yeah, this was the only comment I had too.  I saw some issues in the
> > previous revision but the changes to ceph_parse_source() seem to fix it in
> > this revision.
> > 
> > The other annoying thing I found isn't related with this patchset but with
> > a change that's been done some time ago by Xiubo (added to CC): it looks
> > like that if we have an invalid parameter (for example, wrong secret)
> > we'll always get -EHOSTUNREACH.
> > 
> > See below a possible fix (although I'm not entirely sure that's the correct
> > one).
> > 
> > Cheers,
> > --
> > Luís
> > 
> > From a988d24d8e72fc4933459f3dd5d303cbc9a566ed Mon Sep 17 00:00:00 2001
> > From: Luis Henriques <lhenriques@xxxxxxx>
> > Date: Wed, 14 Jul 2021 16:56:36 +0100
> > Subject: [PATCH] ceph: don't hide error code if we don't have mdsmap
> > 
> > Since commit 97820058fb28 ("ceph: check availability of mds cluster on mount
> > after wait timeout") we're returning -EHOSTUNREACH, even if the error isn't
> > related with the MDSs availability.  For example, we'll get it even if we're
> > trying to mounting a filesystem with an invalid username or secret.
> > 
> > Only return this error if we get -EIO.
> > 
> > Fixes: 97820058fb28 ("ceph: check availability of mds cluster on mount after wait timeout")
> > Signed-off-by: Luis Henriques <lhenriques@xxxxxxx>
> > ---
> >  fs/ceph/super.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> > index 086a1ceec9d8..67d70059ce9f 100644
> > --- a/fs/ceph/super.c
> > +++ b/fs/ceph/super.c
> > @@ -1230,7 +1230,8 @@ static int ceph_get_tree(struct fs_context *fc)
> >  	return 0;
> >  
> >  out_splat:
> > -	if (!ceph_mdsmap_is_cluster_available(fsc->mdsc->mdsmap)) {
> > +	if ((err == -EIO) &&
> > +	    !ceph_mdsmap_is_cluster_available(fsc->mdsc->mdsmap)) {
> >  		pr_info("No mds server is up or the cluster is laggy\n");
> >  		err = -EHOSTUNREACH;
> >  	}
> 
> Yeah, I've noticed that message pop up under all sorts of circumstances
> and it is an annoyance. I'm happy to consider such a patch if you send
> it separately.
> 
> That said, I'm honestly not sure this message is really helpful, and
> overriding errors like this at a high level seems sort of sketchy. Maybe
> we should just drop that message, or figure out a way to limit it to
> _just_ that situation.

I agree that the message isn't really useful but the -EHOSTUNREACH is a
bit more confusing if we get it in the mount command when we simply have a
typo in the parameters.

Anyway, after looking closer, I couldn't find a way to reach this code
and have -EHOSTUNREACH to make sense.  When the MDSs are down/unreachable
we have already err set to this error code.  This would mean that the
correct fix would be to simply drop this 'if' statement.  Xiubo, do you
remember how would this be possible?

Cheers,
--
Luís



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux