On Sun, 2019-06-23 at 20:20 -0700, Patrick Donnelly wrote:
On Sun, Jun 23, 2019 at 6:50 PM Yan, Zheng <zyan@xxxxxxxxxx> wrote:
On 6/22/19 12:48 AM, Jeff Layton wrote:
On Fri, 2019-06-21 at 16:10 +0800, Yan, Zheng wrote:
On 6/20/19 11:33 PM, Jeff Layton wrote:
On Wed, 2019-06-19 at 08:24 +0800, Yan, Zheng wrote:
On Tue, Jun 18, 2019 at 6:39 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
On Tue, 2019-06-18 at 14:25 +0800, Yan, Zheng wrote:
On 6/18/19 1:30 AM, Jeff Layton wrote:
On Mon, 2019-06-17 at 20:55 +0800, Yan, Zheng wrote:
When remounting an aborted mount, also reset the client's entity addr.
'umount -f /ceph; mount -o remount /ceph' can be used for recovering
from blacklist.
Why do I need to umount here? Once the filesystem is unmounted, then the
'-o remount' becomes superfluous, no? In fact, I get an error back when
I try to remount an unmounted filesystem:
$ sudo umount -f /mnt/cephfs ; sudo mount -o remount /mnt/cephfs
mount: /mnt/cephfs: mount point not mounted or bad option.
My client isn't blacklisted above, so I guess you're counting on the
umount returning without having actually unmounted the filesystem?
I think this ought to not need a umount first. From a UI standpoint,
just doing a "mount -o remount" ought to be sufficient to clear this.
This series is mainly for the case where the mount point is not
umountable. If the mount point is umountable, the user should use
'umount -f /ceph; mount /ceph'. This avoids all the trouble of error
handling.
...
If just doing "mount -o remount", the user will expect that no
data/metadata gets lost. The 'umount -f' explicitly tells the user that
this operation may lose data/metadata.
I don't think they'd expect that and even if they did, that's why we'd
return errors on certain operations until they are cleared. But, I think
all of this points out the main issue I have with this patchset, which
is that it's not clear what problem this is solving.
So: client gets blacklisted and we want to allow it to come back in some
fashion. Do we expect applications that happened to be accessing that
mount to be able to continue running, or will they need to be restarted?
If they need to be restarted why not just expect the admin to kill them
all off, unmount and remount and then start them back up again?
The point is to let users decide what to do. Some users value
availability over consistency. It's inconvenient to kill all
applications that use the mount, then do umount.
I think I have a couple of issues with this patchset. Maybe you can
convince me though:
1) The interface is really weird.
You suggested that we needed to do:
# umount -f /mnt/foo ; mount -o remount /mnt/foo
...but what if I'm not really blacklisted? Didn't I just kill off all
the calls in-flight with the umount -f? What if that umount actually
succeeds? Then the subsequent remount call will fail.
ISTM, that this interface (should we choose to accept it) should just
be:
# mount -o remount /mnt/foo
I have a patch that does
mount -o remount,force_reconnect /mnt/ceph
That seems clearer.
...and if the client figures out that it has been blacklisted, then it
does the right thing during the remount (whatever that right thing is).
2) It's not clear to me who we expect to use this.
Are you targeting applications that do not use file locking? Any that do
use file locking will probably need some special handling, but those
that don't might be able to get by unscathed as long as they can deal
with -EIO on fsync by replaying writes since the last fsync.
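As an illustration of what "replaying writes since the last fsync" could look like on the application side, here is a minimal Python sketch. Nothing in it comes from the patchset: the ReplayingWriter class, its retry count, and the assumption that reopening the file yields a usable handle after the client recovers are all hypothetical.

```python
import errno
import os


class ReplayingWriter:
    """Hypothetical helper: remembers writes until an fsync succeeds."""

    def __init__(self, path, fsync=os.fsync):
        self.path = path
        self._fsync = fsync   # injectable so the failure path can be exercised
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        self.pending = []     # (offset, data) written since the last good fsync

    def pwrite(self, data, offset):
        os.pwrite(self.fd, data, offset)
        self.pending.append((offset, data))

    def sync(self, retries=3):
        for _ in range(retries):
            try:
                self._fsync(self.fd)
                self.pending.clear()   # data is durable, nothing left to replay
                return
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise
                # Assumption: the old handle is unusable, so reopen the file
                # and replay everything written since the last good fsync.
                os.close(self.fd)
                self.fd = os.open(self.path, os.O_WRONLY | os.O_CREAT, 0o644)
                for offset, data in self.pending:
                    os.pwrite(self.fd, data, offset)
        raise OSError(errno.EIO, "fsync kept failing after replay")

    def close(self):
        os.close(self.fd)
```

An application that also takes file locks would need more than this, since the sketch says nothing about reacquiring a lost lock before replaying.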
Several users said they value availability over consistency. For
example: ImageNet training, where cephfs is used for storing image files.
Which sounds reasonable on its face...but why bother with remounting at
that point? Why not just have the client reattempt connections until it
succeeds (or you forcibly unmount)?
For that matter, why not just redirty the pages after the writes fail in
that case instead of forcing those users to rewrite their data? If they
don't care about consistency that much, then that would seem to be a
nicer way to deal with this.
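The "reattempt connections until it succeeds (or you forcibly unmount)" idea can be sketched as a retry loop with capped exponential backoff. This is a user-space Python illustration only, not how the kernel client works: try_connect and unmount_requested are hypothetical callbacks, and the delay values are arbitrary.

```python
import time


def reconnect_until_success(try_connect, unmount_requested,
                            base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Retry try_connect() until it succeeds or an unmount is requested.

    Returns the number of attempts made on success, or None if the loop
    was abandoned because unmount_requested() became true.
    """
    delay = base_delay
    attempts = 0
    while not unmount_requested():
        attempts += 1
        if try_connect():
            return attempts
        sleep(delay)
        delay = min(delay * 2, max_delay)   # back off, but cap the wait
    return None
```

The unmount check is what keeps a permanently blacklisted client from retrying forever; the admin's forced unmount is the only exit besides success.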
I'm not clear about this either.
As I've said elsewhere: **how** the client recovers from the lost
session and blacklist event is configurable. There should be a range
of mount options which control the behavior: such as a _hypothetical_
"recover_session=<mode>", where mode may be:
- "brute": re-acquire capabilities and flush all dirty data. All open
file handles continue to work normally. Dangerous and definitely not
the default. (How should file locks be handled?)
IMO, just reacquire them as if nothing happened for this mode. I see
this as conceptually similar to recover_lost_locks module parameter in
nfs.ko. That said, we will need to consider what to do if the lock can't
be reacquired in this mode.
- "clean": re-acquire read capabilities and drop dirty write buffers.
Writable file handles return -EIO. Locks are lost and the lock owners
are sent SIGIO, si_code=SI_LOST, si_fd=lockedfd (default is
termination!). Read-only handles continue to work and caches are
dropped if necessary. This should probably be the default.
Sounds good, except maybe for the SIGLOST handling, for reasons I
outlined in another thread.
- "fresh": like "clean" but read-only handles also return -EIO. Not
sure if this one is useful but not difficult to add.
Meh, maybe. If we don't clearly need it then let's not add it. I'd want
to know that someone has an actual use for this option. Adding
interfaces just because we can only makes trouble later as the code
ages.
No "-o remount" mount commands necessary.
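To make the three proposed modes easier to compare, here is a minimal Python sketch that encodes them as a policy table. It only models the semantics described above. The "recover_session" option is itself hypothetical, the RecoverMode and RecoveryPolicy names are invented for illustration, and keeping locks in "brute" mode reflects the reply above rather than a settled decision.

```python
from dataclasses import dataclass
from enum import Enum


class RecoverMode(Enum):
    BRUTE = "brute"
    CLEAN = "clean"
    FRESH = "fresh"


@dataclass(frozen=True)
class RecoveryPolicy:
    flush_dirty_data: bool       # True: write back dirty buffers; False: drop them
    writable_handles_eio: bool   # writable file handles return -EIO afterwards
    readonly_handles_eio: bool   # read-only file handles return -EIO afterwards
    locks_kept: bool             # file locks are reacquired rather than lost


POLICIES = {
    RecoverMode.BRUTE: RecoveryPolicy(True, False, False, True),
    RecoverMode.CLEAN: RecoveryPolicy(False, True, False, False),
    RecoverMode.FRESH: RecoveryPolicy(False, True, True, False),
}


def parse_recover_session(options: str) -> RecoverMode:
    """Pick the mode out of a comma-separated mount option string."""
    for opt in options.split(","):
        key, _, value = opt.partition("=")
        if key.strip() == "recover_session":
            return RecoverMode(value.strip())   # ValueError on unknown modes
    return RecoverMode.CLEAN   # "clean" was suggested as the probable default
```

For example, parse_recover_session("rw,recover_session=fresh") selects the policy in which read-only handles also return -EIO, while an option string without recover_session falls back to "clean".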
Now, these details are open for change. I'm just trying to suggest a
way forward. I'm not well versed in how difficult this proposal is to
implement in the kernel. There are probably details or challenges I'm
not considering. I recommend that before Zheng writes new code that he
and Jeff work out what the right semantics and configurations should
be and make a proposal to ceph-devel/dev@xxxxxxx for user feedback.
That sounds a bit more reasonable. I'd prefer not having to wait for
admin intervention in order to get things moving again if the goal is
making things more available.
That said, whenever we're doing something like this, it's easy for all
of us to make subtle assumptions and end up talking at cross-purposes to
one another. The first step here is to clearly identify the problem
we're trying to solve. From earlier emails I'd suggest this as a
starting point:
"Clients can end up blacklisted due to various connectivity issues, and
we'd like to offer admins a way to configure the mount to reconnect
after blacklisting/unblacklisting, and continue working. Preferably,
with no disruption to the application other than the client hanging
while blacklisted."
Does this sound about right?
If so, then I think we ought to aim for something closer to what Patrick
is suggesting; a mount option or something that causes the cephfs client
to aggressively attempt to recover after being unblacklisted.