On Mon, 2019-07-08 at 16:43 +0800, Yan, Zheng wrote: > On Fri, Jul 5, 2019 at 9:22 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > On Fri, 2019-07-05 at 19:26 +0800, Yan, Zheng wrote: > > > On Fri, Jul 5, 2019 at 6:16 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > On Fri, 2019-07-05 at 09:17 +0800, Yan, Zheng wrote: > > > > > On Thu, Jul 4, 2019 at 10:30 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > > > On Thu, 2019-07-04 at 09:30 +0800, Yan, Zheng wrote: > > > > > > > On 7/4/19 12:01 AM, Jeff Layton wrote: > > > > > > > > On Wed, 2019-07-03 at 20:44 +0800, Yan, Zheng wrote: > > > > > > > > > This series add support for auto reconnect after blacklisted. > > > > > > > > > > > > > > > > > > Auto reconnect is controlled by recover_session=<clean|no> mount option. > > > > > > > > > Clean mode is enabled by default. In this mode, client drops dirty date > > > > > > > > > and dirty metadata, All writable file handles are invalidated. Read-only > > > > > > > > > file handles continue to work and caches are dropped if necessary. > > > > > > > > > If an inode contains any lost file lock, read and write are not allowed. > > > > > > > > > until all lost file locks are released. > > > > > > > > > > > > > > > > Just giving this a quick glance: > > > > > > > > > > > > > > > > Based on the last email discussion about this, I thought that you were > > > > > > > > going to provide a mount option that someone could enable that would > > > > > > > > basically allow the client to "soldier on" in the face of being > > > > > > > > blacklisted and then unblacklisted, without needing to remount anything. > > > > > > > > > > > > > > > > This set seems to keep the force_reconnect option (patch #7) though, so > > > > > > > > I'm quite confused at this point. What exactly is the goal of here? > > > > > > > > > > > > > > > > > > > > > > because auto reconnect can be disabled, force_reconnect is the manual > > > > > > > way to fix blacklistd mount. > > > > > > > > > > > > > > > > > > > Why not instead allow remounting with a different recover_session= mode? > > > > > > Then you wouldn't need this option that's only valid during a remount. > > > > > > That seems like a more natural way to use a new mount option. > > > > > > > > > > > > > > > > you mean something like 'recover_session=now' for remount? > > > > > > > > > > > > > > > > > > No, I meant something like: > > > > > > > > -o remount,recover_session=brute > > > > > > > > > > This is confusing. user may just want to change auto reconnect mode > > > for backlist event in the future, does not want to force reconnect. > > > > > > > Why do we need to allow the admin to manually force a reconnect? If you > > (hypothetically) change the mode to "brute" then it should do it on its > > own when it detects that it's in this situation, no? > > > > First, auto reconnect is limited to once every 30 seconds. Second, > client may fail to detect that itself is blacklisted. So I think we > still need a way to force client to reconnect > How does it detect that it has been blacklisted? Does it do that by looking at the OSD maps? I'd like to better understand how the client would recognize this automatically and why it might miss it. If we do end up needing some sort of control to forcibly reconnect, then it seems like we might be better off with something besides a mount option. sysfs control file maybe? > > > > IOW, allow the admin to just change the mount to use a different > > > > recover_session= mode once things are stuck. > > > > > > > > > > > > There's also nothing in the changelogs or comments about > > > > > > > > recover_session=brute, which seems like it ought to at least be > > > > > > > > mentioned. > > > > > > > > > > > > > > brute code is not enabled yet > > > > > > > > > > > > Got it -- I missed that that the mount option for it was commented out. > > > > > > > > > > > > Given that this is a user interface change, I think it'd be best to not > > > > > > merge merge this until it's complete. Otherwise we'll have to deal with > > > > > > intermediate kernel versions that don't implement some parts of the new > > > > > > interface. That's makes it more difficult for admins to use (and for us > > > > > > to document). > > > > > > > > > > > > > > At this point, I'm going to say NAK on this set until there is some > > > > > > > > accompanying documentation about how you intend for this be used and by > > > > > > > > whom. A patch for the mount.ceph(8) manpage would be a good place to > > > > > > > > start. > > > > > > > > > > > > > > > > > Yan, Zheng (9): > > > > > > > > > libceph: add function that reset client's entity addr > > > > > > > > > libceph: add function that clears osd client's abort_err > > > > > > > > > ceph: allow closing session in restarting/reconnect state > > > > > > > > > ceph: track and report error of async metadata operation > > > > > > > > > ceph: pass filp to ceph_get_caps() > > > > > > > > > ceph: return -EIO if read/write against filp that lost file locks > > > > > > > > > ceph: add 'force_reconnect' option for remount > > > > > > > > > ceph: invalidate all write mode filp after reconnect > > > > > > > > > ceph: auto reconnect after blacklisted > > > > > > > > > > > > > > > > > > fs/ceph/addr.c | 30 +++++++---- > > > > > > > > > fs/ceph/caps.c | 84 ++++++++++++++++++++---------- > > > > > > > > > fs/ceph/file.c | 50 ++++++++++-------- > > > > > > > > > fs/ceph/inode.c | 2 + > > > > > > > > > fs/ceph/locks.c | 8 ++- > > > > > > > > > fs/ceph/mds_client.c | 92 ++++++++++++++++++++++++++------- > > > > > > > > > fs/ceph/mds_client.h | 6 +-- > > > > > > > > > fs/ceph/super.c | 91 ++++++++++++++++++++++++++++++-- > > > > > > > > > fs/ceph/super.h | 23 +++++++-- > > > > > > > > > include/linux/ceph/libceph.h | 1 + > > > > > > > > > include/linux/ceph/messenger.h | 1 + > > > > > > > > > include/linux/ceph/mon_client.h | 1 + > > > > > > > > > include/linux/ceph/osd_client.h | 2 + > > > > > > > > > net/ceph/ceph_common.c | 38 +++++++++----- > > > > > > > > > net/ceph/messenger.c | 5 ++ > > > > > > > > > net/ceph/mon_client.c | 7 +++ > > > > > > > > > net/ceph/osd_client.c | 24 +++++++++ > > > > > > > > > 17 files changed, 365 insertions(+), 100 deletions(-) > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Jeff Layton <jlayton@xxxxxxxxxx> > > > > > > > > > > > > > > -- > > > > Jeff Layton <jlayton@xxxxxxxxxx> > > > > > > > > -- > > Jeff Layton <jlayton@xxxxxxxxxx> > > -- Jeff Layton <jlayton@xxxxxxxxxx>