On Mon, Jun 3, 2019 at 11:05 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>
> On Mon, Jun 3, 2019 at 1:24 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> >
> > On Mon, Jun 3, 2019 at 1:07 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> > > Can we also discuss how useful it is to allow recovering a mount
> > > after it has been blacklisted?  After we fail everything with EIO
> > > and throw out all dirty state, how many applications would continue
> > > working without some kind of restart?  And if you are restarting
> > > your application, why not get a new mount?
> > >
> > > IOW, what is the use case for introducing a new debugfs knob that
> > > isn't that much different from umount+mount?
> >
> > People don't like it when their filesystem refuses to umount, which
> > is what happens when the kernel client can't reconnect to the MDS
> > right now.  I'm not sure there's a practical way to deal with that
> > besides some kind of admin intervention.
>
> Furthermore, there are often many applications using the mount (even
> with containers) and it's not a sustainable position that any
> network/client/cephfs hiccup requires a remount.  Also, an application

Well, it's not just any hiccup.  It's one that led to blacklisting...

> that fails because of EIO is easy to deal with a layer above but a
> remount usually requires grumpy admin intervention.

I feel like I'm missing something here.  Would figuring out $ID,
obtaining root and echoing to /sys/kernel/debug/$ID/control make the
admin less grumpy, especially when containers are involved?

Doing the force_reconnect thing would retain the mount point, but how
much use would it be?  Would using existing (i.e. pre-blacklist) file
descriptors be allowed?  I assumed it wouldn't be (permanent EIO or
something of that sort), so maybe that is the piece I'm missing...

Thanks,

                Ilya
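
For concreteness, a minimal sketch of what the admin-side recovery flow
being debated might look like.  It assumes the real per-client debugfs
directory layout (/sys/kernel/debug/ceph/<fsid>.client<id>/) plus the
*proposed* "control" file accepting a "force_reconnect" command; neither
the file nor the command exists today, they are just the idea under
discussion:

#!/usr/bin/env python3
# Hypothetical sketch: find each CephFS kernel client's debugfs
# directory and ask it to reconnect after being blacklisted.  The
# "control" file and the "force_reconnect" command are assumptions
# from this thread, not existing kernel API.  Needs root and a
# mounted debugfs.
import glob
import sys

DEBUGFS_CEPH = "/sys/kernel/debug/ceph"

def main():
    # Each kernel client instance gets a <fsid>.client<id> directory.
    clients = glob.glob(DEBUGFS_CEPH + "/*.client*")
    if not clients:
        sys.exit("no ceph kernel clients under %s" % DEBUGFS_CEPH)
    for client in clients:
        try:
            # Proposed knob from this thread; does not exist today.
            with open(client + "/control", "w") as f:
                f.write("force_reconnect\n")
            print("asked %s to reconnect" % client)
        except OSError as e:
            print("failed for %s: %s" % (client, e))

if __name__ == "__main__":
    main()

Even with something like this, the admin still has to locate the right
client instance and get root inside whatever container namespace owns
the mount, which is the "less grumpy?" question above.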
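
And a sketch of the "deal with EIO a layer above" approach Patrick
mentions: a wrapper that, when a read on an old descriptor fails with
EIO (what a blacklisted client would return), drops the descriptor and
retries with a fresh open.  Purely illustrative, and it assumes the
mount itself has already been recovered, which is exactly the open
question about pre-blacklist file descriptors:

import errno
import os

class ReopeningFile:
    # Holds a path and a file descriptor; on EIO, replaces the
    # (possibly pre-blacklist) descriptor with a fresh open and
    # retries the read once.
    def __init__(self, path):
        self.path = path
        self.fd = os.open(path, os.O_RDONLY)

    def pread(self, length, offset):
        try:
            return os.pread(self.fd, length, offset)
        except OSError as e:
            if e.errno != errno.EIO:
                raise
            # The old descriptor is dead; a fresh open can acquire
            # new caps once the client has reconnected.
            os.close(self.fd)
            self.fd = os.open(self.path, os.O_RDONLY)
            return os.pread(self.fd, length, offset)

    def close(self):
        os.close(self.fd)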