On Tue, 2019-06-25 at 06:31 +0800, Yan, Zheng wrote:
> On Tue, Jun 25, 2019 at 5:18 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > On Sun, 2019-06-23 at 20:20 -0700, Patrick Donnelly wrote:
> > > On Sun, Jun 23, 2019 at 6:50 PM Yan, Zheng <zyan@xxxxxxxxxx> wrote:
> > > > On 6/22/19 12:48 AM, Jeff Layton wrote:
> > > > > On Fri, 2019-06-21 at 16:10 +0800, Yan, Zheng wrote:
> > > > > > On 6/20/19 11:33 PM, Jeff Layton wrote:
> > > > > > > On Wed, 2019-06-19 at 08:24 +0800, Yan, Zheng wrote:
> > > > > > > > On Tue, Jun 18, 2019 at 6:39 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > > > > > > > On Tue, 2019-06-18 at 14:25 +0800, Yan, Zheng wrote:
> > > > > > > > > > On 6/18/19 1:30 AM, Jeff Layton wrote:
> > > > > > > > > > > On Mon, 2019-06-17 at 20:55 +0800, Yan, Zheng wrote:
> > > > > > > > > > > > When remounting aborted mount, also reset client's entity addr.
> > > > > > > > > > > > 'umount -f /ceph; mount -o remount /ceph' can be used for recovering
> > > > > > > > > > > > from blacklist.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Why do I need to umount here? Once the filesystem is unmounted, then the
> > > > > > > > > > > '-o remount' becomes superfluous, no? In fact, I get an error back when
> > > > > > > > > > > I try to remount an unmounted filesystem:
> > > > > > > > > > >
> > > > > > > > > > > $ sudo umount -f /mnt/cephfs ; sudo mount -o remount /mnt/cephfs
> > > > > > > > > > > mount: /mnt/cephfs: mount point not mounted or bad option.
> > > > > > > > > > >
> > > > > > > > > > > My client isn't blacklisted above, so I guess you're counting on the
> > > > > > > > > > > umount returning without having actually unmounted the filesystem?
> > > > > > > > > > >
> > > > > > > > > > > I think this ought to not need a umount first. From a UI standpoint,
> > > > > > > > > > > just doing a "mount -o remount" ought to be sufficient to clear this.
> > > > > > > > > > >
> > > > > > > > > > This series is mainly for the case that mount point is not umountable.
> > > > > > > > > > If mount point is umountable, user should use 'umount -f /ceph; mount
> > > > > > > > > > /ceph'. This avoids all trouble of error handling.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > ...
> > > > > > > > >
> > > > > > > > > > If just doing "mount -o remount", user will expect there is no
> > > > > > > > > > data/metadata get lost. The 'mount -f' explicitly tell user this
> > > > > > > > > > operation may lose data/metadata.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I don't think they'd expect that and even if they did, that's why we'd
> > > > > > > > > return errors on certain operations until they are cleared. But, I think
> > > > > > > > > all of this points out the main issue I have with this patchset, which
> > > > > > > > > is that it's not clear what problem this is solving.
> > > > > > > > >
> > > > > > > > > So: client gets blacklisted and we want to allow it to come back in some
> > > > > > > > > fashion. Do we expect applications that happened to be accessing that
> > > > > > > > > mount to be able to continue running, or will they need to be restarted?
> > > > > > > > > If they need to be restarted why not just expect the admin to kill them
> > > > > > > > > all off, unmount and remount and then start them back up again?
> > > > > > > > >
> > > > > > > >
> > > > > > > > The point is let users decide what to do. Some user values
> > > > > > > > availability over consistency. It's inconvenient to kill all
> > > > > > > > applications that use the mount, then do umount.
> > > > > > > >
> > > > > > >
> > > > > > > I think I have a couple of issues with this patchset. Maybe you can
> > > > > > > convince me though:
> > > > > > >
> > > > > > > 1) The interface is really weird.
> > > > > > >
> > > > > > > You suggested that we needed to do:
> > > > > > >
> > > > > > > # umount -f /mnt/foo ; mount -o remount /mnt/foo
> > > > > > >
> > > > > > > ...but what if I'm not really blacklisted? Didn't I just kill off all
> > > > > > > the calls in-flight with the umount -f? What if that umount actually
> > > > > > > succeeds? Then the subsequent remount call will fail.
> > > > > > >
> > > > > > > ISTM, that this interface (should we choose to accept it) should just
> > > > > > > be:
> > > > > > >
> > > > > > > # mount -o remount /mnt/foo
> > > > > > >
> > > > > >
> > > > > > I have patch that does
> > > > > >
> > > > > > mount -o remount,force_reconnect /mnt/ceph
> > > > > >
> > > > >
> > > > > That seems clearer.
> > > > >
> > > > > > > ...and if the client figures out that it has been blacklisted, then it
> > > > > > > does the right thing during the remount (whatever that right thing is).
> > > > > > >
> > > > > > > 2) It's not clear to me who we expect to use this.
> > > > > > >
> > > > > > > Are you targeting applications that do not use file locking? Any that do
> > > > > > > use file locking will probably need some special handling, but those
> > > > > > > that don't might be able to get by unscathed as long as they can deal
> > > > > > > with -EIO on fsync by replaying writes since the last fsync.
> > > > > > >
> > > > > >
> > > > > > Several users said they availability over consistency. For example:
> > > > > > ImageNet training, cephfs is used for storing image files.
> > > > > >
> > > > >
> > > > > Which sounds reasonable on its face...but why bother with remounting at
> > > > > that point? Why not just have the client reattempt connections until it
> > > > > succeeds (or you forcibly unmount).
> > > > >
> > > > > For that matter, why not just redirty the pages after the writes fail in
> > > > > that case instead of forcing those users to rewrite their data?
> > > > > If they
> > > > > don't care about consistency that much, then that would seem to be a
> > > > > nicer way to deal with this.
> > > > >
> > > >
> > > > I'm not clear about this either
> > >
> > > As I've said elsewhere: **how** the client recovers from the lost
> > > session and blacklist event is configurable. There should be a range
> > > of mount options which control the behavior: such as a _hypothetical_
> > > "recover_session=<mode>", where mode may be:
> > >
> > > - "brute": re-acquire capabilities and flush all dirty data. All open
> > > file handles continue to work normally. Dangerous and definitely not
> > > the default. (How should file locks be handled?)
> > >
> >
> > IMO, just reacquire them as if nothing happened for this mode. I see
> > this as conceptually similar to recover_lost_locks module parameter in
> > nfs.ko. That said, we will need to consider what to do if the lock can't
> > be reacquired in this mode.
> >
> > > - "clean": re-acquire read capabilities and drop dirty write buffers.
> > > Writable file handles return -EIO. Locks are lost and the lock owners
> > > are sent SIGIO, si_code=SI_LOST, si_fd=lockedfd (default is
> > > termination!). Read-only handles continue to work and caches are
> > > dropped if necessary. This should probably be the default.
> > >
> >
> > Sounds good, except for maybe modulo SIGLOST handling for reasons I
> > outlined in another thread.
> >
> > > - "fresh": like "clean" but read-only handles also return -EIO. Not
> > > sure if this one is useful but not difficult to add.
> > >
> >
> > Meh, maybe. If we don't clearly need it then let's not add it. I'd want
> > to know that someone has an actual use for this option. Adding
> > interfaces just because we can, just makes trouble later as the code
> > ages.
> >
> > > No "-o remount" mount commands necessary.
> > >
> > > Now, these details are open for change. I'm just trying to suggest a
> > > way forward.
> > > I'm not well versed in how difficult this proposal is to
> > > implement in the kernel. There are probably details or challenges I'm
> > > not considering. I recommend that before Zheng writes new code that he
> > > and Jeff work out what the right semantics and configurations should
> > > be and make a proposal to ceph-devel/dev@xxxxxxx for user feedback.
> > >
> >
> > That sounds a bit more reasonable. I'd prefer not having to wait for
> > admin intervention in order to get things moving again if the goal is
> > making things more available.
> >
> > That said, whenever we're doing something like this, it's easy for all
> > of us to make subtle assumptions and end up talking at cross-purposes to
> > one another. The first step here is to clearly identify the problem
> > we're trying to solve. From earlier emails I'd suggest this as a
> > starting point:
> >
> > "Clients can end up blacklisted due to various connectivity issues, and
> > we'd like to offer admins a way to configure the mount to reconnect
> > after blacklisting/unblacklisting, and continue working. Preferably,
> > with no disruption to the application other than the client hanging
> > while blacklisted."
> >
> > Does this sound about right?
> >
> > If so, then I think we ought to aim for something closer to what Patrick
> > is suggesting; a mount option or something that causes the cephfs client
> > to aggressively attempt to recover after being unblacklisted.
> >
>
> Clients shouldn't be too aggressively in this case. Otherwise they can
> easily create too many blacklist entries in osdmap.
>

When I said "aggressively" I meant on the order of once a minute or so,
though that interval could be tunable.

Can blacklisted clients still request osd maps from the monitors? IOW,
is there a way for the client to determine whether it has been
blacklisted? If so, then when the client suspects that it has been
blacklisted it could just wait until the new OSD map shows otherwise.
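
To make the polling idea concrete, here is a rough userspace sketch in
Python. It is not kernel code and not an existing ceph API: the function
wait_until_unblacklisted(), the fetch_blacklist callback and the address
strings are all hypothetical names used for illustration. It also
assumes the answer to the question above is "yes", i.e. that a
blacklisted client can still fetch a current map and look for its own
address in it. The client only reads map state and never tries to
reconnect while it is still listed.

```python
import time
from typing import Callable, Set


def wait_until_unblacklisted(
    fetch_blacklist: Callable[[], Set[str]],  # hypothetical: returns blacklisted addrs
    client_addr: str,
    interval: float = 60.0,                   # "once a minute or so", tunable
    max_polls: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until client_addr is absent from the fetched blacklist.

    Returns True as soon as a fetched blacklist no longer contains
    client_addr, or False after max_polls unsuccessful polls.
    """
    for attempt in range(max_polls):
        if client_addr not in fetch_blacklist():
            return True
        # Do not sleep after the final unsuccessful poll.
        if attempt + 1 < max_polls:
            sleep(interval)
    return False


if __name__ == "__main__":
    # Simulated sequence of blacklists seen in successive map fetches.
    maps = iter([{"10.0.0.5:0/123"}, {"10.0.0.5:0/123"}, set()])
    recovered = wait_until_unblacklisted(
        lambda: next(maps), "10.0.0.5:0/123", sleep=lambda s: None
    )
    print("recovered:", recovered)
```

Passing the sleep function in as a parameter is only there so the loop
can be exercised without actually waiting a minute between polls.
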
In any case, I thought blacklisting mostly occurred when clients fail to
give up their MDS caps. Why would repeated polling create more blacklist
entries?

--
Jeff Layton <jlayton@xxxxxxxxxx>