On Sun, Jun 23, 2019 at 6:50 PM Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>
> On 6/22/19 12:48 AM, Jeff Layton wrote:
> > On Fri, 2019-06-21 at 16:10 +0800, Yan, Zheng wrote:
> >> On 6/20/19 11:33 PM, Jeff Layton wrote:
> >>> On Wed, 2019-06-19 at 08:24 +0800, Yan, Zheng wrote:
> >>>> On Tue, Jun 18, 2019 at 6:39 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >>>>> On Tue, 2019-06-18 at 14:25 +0800, Yan, Zheng wrote:
> >>>>>> On 6/18/19 1:30 AM, Jeff Layton wrote:
> >>>>>>> On Mon, 2019-06-17 at 20:55 +0800, Yan, Zheng wrote:
> >>>>>>>> When remounting aborted mount, also reset client's entity addr.
> >>>>>>>> 'umount -f /ceph; mount -o remount /ceph' can be used for recovering
> >>>>>>>> from blacklist.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Why do I need to umount here? Once the filesystem is unmounted, then the
> >>>>>>> '-o remount' becomes superfluous, no? In fact, I get an error back when
> >>>>>>> I try to remount an unmounted filesystem:
> >>>>>>>
> >>>>>>> $ sudo umount -f /mnt/cephfs ; sudo mount -o remount /mnt/cephfs
> >>>>>>> mount: /mnt/cephfs: mount point not mounted or bad option.
> >>>>>>>
> >>>>>>> My client isn't blacklisted above, so I guess you're counting on the
> >>>>>>> umount returning without having actually unmounted the filesystem?
> >>>>>>>
> >>>>>>> I think this ought to not need a umount first. From a UI standpoint,
> >>>>>>> just doing a "mount -o remount" ought to be sufficient to clear this.
> >>>>>>>
> >>>>>> This series is mainly for the case that mount point is not umountable.
> >>>>>> If mount point is umountable, user should use 'umount -f /ceph; mount
> >>>>>> /ceph'. This avoids all trouble of error handling.
> >>>>>>
> >>>>>
> >>>>> ...
> >>>>>
> >>>>>> If just doing "mount -o remount", user will expect there is no
> >>>>>> data/metadata get lost. The 'mount -f' explicitly tell user this
> >>>>>> operation may lose data/metadata.
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> I don't think they'd expect that and even if they did, that's why we'd
> >>>>> return errors on certain operations until they are cleared. But, I think
> >>>>> all of this points out the main issue I have with this patchset, which
> >>>>> is that it's not clear what problem this is solving.
> >>>>>
> >>>>> So: client gets blacklisted and we want to allow it to come back in some
> >>>>> fashion. Do we expect applications that happened to be accessing that
> >>>>> mount to be able to continue running, or will they need to be restarted?
> >>>>> If they need to be restarted why not just expect the admin to kill them
> >>>>> all off, unmount and remount and then start them back up again?
> >>>>>
> >>>>
> >>>> The point is let users decide what to do. Some user values
> >>>> availability over consistency. It's inconvenient to kill all
> >>>> applications that use the mount, then do umount.
> >>>>
> >>>>
> >>>
> >>> I think I have a couple of issues with this patchset. Maybe you can
> >>> convince me though:
> >>>
> >>> 1) The interface is really weird.
> >>>
> >>> You suggested that we needed to do:
> >>>
> >>> # umount -f /mnt/foo ; mount -o remount /mnt/foo
> >>>
> >>> ...but what if I'm not really blacklisted? Didn't I just kill off all
> >>> the calls in-flight with the umount -f? What if that umount actually
> >>> succeeds? Then the subsequent remount call will fail.
> >>>
> >>> ISTM, that this interface (should we choose to accept it) should just
> >>> be:
> >>>
> >>> # mount -o remount /mnt/foo
> >>>
> >>
> >> I have patch that does
> >>
> >> mount -o remount,force_reconnect /mnt/ceph
> >>
> >>
> >
> > That seems clearer.
> >
> >>> ...and if the client figures out that it has been blacklisted, then it
> >>> does the right thing during the remount (whatever that right thing is).
> >>>
> >>> 2) It's not clear to me who we expect to use this.
> >>>
> >>> Are you targeting applications that do not use file locking? Any that do
> >>> use file locking will probably need some special handling, but those
> >>> that don't might be able to get by unscathed as long as they can deal
> >>> with -EIO on fsync by replaying writes since the last fsync.
> >>>
> >>
> >> Several users said they availability over consistency. For example:
> >> ImageNet training, cephfs is used for storing image files.
> >>
> >>
> >
> > Which sounds reasonable on its face...but why bother with remounting at
> > that point? Why not just have the client reattempt connections until it
> > succeeds (or you forcibly unmount).
> >
> > For that matter, why not just redirty the pages after the writes fail in
> > that case instead of forcing those users to rewrite their data? If they
> > don't care about consistency that much, then that would seem to be a
> > nicer way to deal with this.
> >
>
> I'm not clear about this either

As I've said elsewhere: **how** the client recovers from the lost session
and blacklist event is configurable. There should be a range of mount
options which control the behavior, such as a _hypothetical_
"recover_session=<mode>", where <mode> may be:

- "brute": re-acquire capabilities and flush all dirty data. All open file
  handles continue to work normally. Dangerous and definitely not the
  default. (How should file locks be handled?)

- "clean": re-acquire read capabilities and drop dirty write buffers.
  Writable file handles return -EIO. Locks are lost and the lock owners are
  sent SIGIO, si_code=SI_LOST, si_fd=lockedfd (default is termination!).
  Read-only handles continue to work and caches are dropped if necessary.
  This should probably be the default.

- "fresh": like "clean" but read-only handles also return -EIO. Not sure if
  this one is useful, but it would not be difficult to add.

No "-o remount" mount commands necessary. (A rough sketch of what the mount
line could look like is included after the signature below.)

Now, these details are open for change. I'm just trying to suggest a way
forward. I'm not well versed in how difficult this proposal would be to
implement in the kernel. There are probably details or challenges I'm not
considering. I recommend that before Zheng writes new code, he and Jeff
work out what the right semantics and configurations should be and make a
proposal to ceph-devel/dev@xxxxxxx for user feedback.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
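
P.S. To make the proposal a little more concrete, here is a rough sketch of
what a mount line could look like. The "recover_session" option name and its
modes are only the hypothetical interface suggested above, not something the
kernel client implements today, and the monitor address and credentials here
are placeholders:

  # mount once with the desired recovery policy; no later "-o remount" needed
  mount -t ceph mon1.example.com:6789:/ /mnt/cephfs \
      -o name=admin,secretfile=/etc/ceph/admin.secret,recover_session=clean

After a blacklist/session-loss event the client would then recover on its own
according to the chosen mode: with "clean", read-only file handles keep
working, writable handles start returning -EIO, and dirty write buffers are
dropped.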