Re: [PATCH 4/8] ceph: allow remounting aborted mount

"Yan, Zheng" <ukernel@xxxxxxxxx> · Fri, 21 Jun 2019 10:59:20 +0800

On Fri, Jun 21, 2019 at 1:11 AM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>
> On Thu, Jun 20, 2019 at 8:34 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >
> > On Wed, 2019-06-19 at 08:24 +0800, Yan, Zheng wrote:
> > > On Tue, Jun 18, 2019 at 6:39 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > > On Tue, 2019-06-18 at 14:25 +0800, Yan, Zheng wrote:
> > > > > On 6/18/19 1:30 AM, Jeff Layton wrote:
> > > > > > On Mon, 2019-06-17 at 20:55 +0800, Yan, Zheng wrote:
> > > > > > > When remounting aborted mount, also reset client's entity addr.
> > > > > > > 'umount -f /ceph; mount -o remount /ceph' can be used for recovering
> > > > > > > from blacklist.
> > > > > > >
> > > > > >
> > > > > > Why do I need to umount here? Once the filesystem is unmounted, then the
> > > > > > '-o remount' becomes superfluous, no? In fact, I get an error back when
> > > > > > I try to remount an unmounted filesystem:
> > > > > >
> > > > > >      $ sudo umount -f /mnt/cephfs ; sudo mount -o remount /mnt/cephfs
> > > > > >      mount: /mnt/cephfs: mount point not mounted or bad option.
> > > > > >
> > > > > > My client isn't blacklisted above, so I guess you're counting on the
> > > > > > umount returning without having actually unmounted the filesystem?
> > > > > >
> > > > > > I think this ought to not need a umount first. From a UI standpoint,
> > > > > > just doing a "mount -o remount" ought to be sufficient to clear this.
> > > > > >
> > > > > This series is mainly for the case that mount point is not umountable.
> > > > > If mount point is umountable, user should use 'umount -f /ceph; mount
> > > > > /ceph'. This avoids all trouble of error handling.
> > > > >
> > > >
> > > > ...
> > > >
> > > > > If just doing "mount -o remount", user will expect there is no
> > > > > data/metadata get lost.  The 'mount -f' explicitly tell user this
> > > > > operation may lose data/metadata.
> > > > >
> > > > >
> > > >
> > > > I don't think they'd expect that and even if they did, that's why we'd
> > > > return errors on certain operations until they are cleared. But, I think
> > > > all of this points out the main issue I have with this patchset, which
> > > > is that it's not clear what problem this is solving.
> > > >
> > > > So: client gets blacklisted and we want to allow it to come back in some
> > > > fashion. Do we expect applications that happened to be accessing that
> > > > mount to be able to continue running, or will they need to be restarted?
> > > > If they need to be restarted why not just expect the admin to kill them
> > > > all off, unmount and remount and then start them back up again?
> > > >
> > >
> > > The point is let users decide what to do. Some user values
> > > availability over consistency. It's inconvenient to kill all
> > > applications that use the mount, then do umount.
> > >
> > >
> >
> > I think I have a couple of issues with this patchset. Maybe you can
> > convince me though:
> >
> > 1) The interface is really weird.
> >
> > You suggested that we needed to do:
> >
> >     # umount -f /mnt/foo ; mount -o remount /mnt/foo
> >
> > ...but what if I'm not really blacklisted? Didn't I just kill off all
> > the calls in-flight with the umount -f? What if that umount actually
> > succeeds? Then the subsequent remount call will fail.
> >
> > ISTM, that this interface (should we choose to accept it) should just
> > be:
> >
> >     # mount -o remount /mnt/foo
> >
> > ...and if the client figures out that it has been blacklisted, then it
> > does the right thing during the remount (whatever that right thing is).
>
> This looks reasonable to me. It's not clear to me (without poring over
> the code which I lack time to do rigorously) why the umount should be
> necessary at all.
>
> Furthermore, I don't like that this is requiring operator intervention
> (i.e. remount) of any kind to recover the mount. If undesirable
> consistency/cache coherency concerns are what's stopping us from
> automatic recovery,

if admin want clients to auto recover. why enabling blacklist on eviction
at the first place.

> then I propose we make client recovery
> configurable with mount options and sensible defaults. For example,
> the default might be to cause all open file handles to return EIO for
> any operation. (Would such a thing even be easily doable within the
> Linux kernel? It's not clear to me if Linux file system drivers can
> somehow invalidate application file handles in this way.)
>

> Another config option would be to allow file open for reading to
> continue functioning but disable files open for writing (and drop all
> buffered dirty pages).
>
> Finally: another config to require operator to remount the file system.
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Senior Software Engineer
> Red Hat Sunnyvale, CA
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D