On Tue, Jun 4, 2019 at 4:10 AM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>
> On Tue, Jun 4, 2019 at 5:18 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> >
> > On Mon, Jun 3, 2019 at 10:23 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Jun 3, 2019 at 1:07 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> > > > Can we also discuss how useful it is to allow recovering a mount
> > > > after it has been blacklisted?  After we fail everything with EIO
> > > > and throw out all dirty state, how many applications would continue
> > > > working without some kind of restart?  And if you are restarting
> > > > your application, why not get a new mount?
> > > >
> > > > IOW, what is the use case for introducing a new debugfs knob that
> > > > isn't that much different from umount+mount?
> > >
> > > People don't like it when their filesystem refuses to umount, which
> > > is what happens when the kernel client can't reconnect to the MDS
> > > right now.  I'm not sure there's a practical way to deal with that
> > > besides some kind of admin intervention.  (Even if you umount -l,
> > > that by design doesn't make outstanding syscalls return and let the
> > > applications exit.)
> >
> > Well, that is what I'm saying: if an admin intervention is required
> > anyway, then why not make it umount+mount?  That is certainly more
> > intuitive than an obscure write-only file in debugfs...
> >
>
> I think 'umount -f' + 'mount -o remount' is better than the debugfs
> file.

A small bit of user input: for some of the places we'd like to use
CephFS, we value availability over consistency.  For example, in a
large batch processing farm it is really inconvenient (and expensive
in lost CPU-hours) if an operator needs to repair thousands of mounts
when CephFS breaks (e.g. an MDS crash or whatever).  It is preferable
to let the apps crash, drop caches, file handles, whatever else is
necessary, and create a new session to the cluster with the same
mount.

In this use case it doesn't matter if the files were inconsistent,
because a higher-level job scheduler will retry the job from scratch
somewhere else with new output files.

It would be nice if there were a mount option to allow users to
choose this mode (-o soft, for example).  Without a mount option,
we're forced to run ugly cron jobs which look for hung mounts and do
the necessary.

My 2c,
dan

> >
> > We have umount -f, which is there for tearing down a mount that is
> > unresponsive.  It should be able to deal with a blacklisted mount;
> > if it can't, it's probably a bug.
> >
> > Thanks,
> >
> >                 Ilya
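
P.S. For anyone who wants to script the umount+mount recovery being
discussed, it is roughly the two commands below.  This is only a
sketch: the mountpoint, monitor address, user name and secret file
are placeholders for whatever your site actually uses.

    # Force-detach the blacklisted mount; pending requests fail with
    # errors instead of blocking forever.
    umount -f /mnt/cephfs

    # Mount again, which creates a brand new session with the cluster.
    mount -t ceph mon1.example.com:6789:/ /mnt/cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret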
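
And the "ugly cron job" I mentioned looks something like the sketch
below (the 10-second probe timeout and the fstab-based re-mount are
illustrative assumptions, not a recommendation):

    #!/bin/sh
    # Run from cron.  stat(1) on a hung CephFS mount can block
    # indefinitely, so a timeout is treated as "this mount is dead".
    # SIGKILL is used because the client's waits are mostly killable
    # even when SIGTERM isn't delivered.
    MNT=/mnt/cephfs
    if ! timeout -s KILL 10 stat "$MNT" >/dev/null 2>&1; then
        umount -f "$MNT"    # force-detach the dead mount
        mount "$MNT"        # re-mount using the fstab entry
    fi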