On Fri, 2021-07-30 at 11:09 +0100, Luis Henriques wrote:
> Jeff Layton <jlayton@xxxxxxxxxx> writes:
> 
> > We've had some cases of hung umounts in teuthology testing. It looks
> > like the client is waiting for cap flushes to complete, but they aren't.
> > 
> > Add a field to the inode to track the highest cap flush tid seen for
> > that inode. Also, add a backpointer from the ceph_cap_flush struct to
> > the inode.
> > 
> > Change wait_caps_flush to wait 60s, and then dump info about the
> > condition of the list.
> > 
> > Also, print pr_info messages if we end up dropping a FLUSH_ACK for an
> > inode onto the floor.
> > 
> > Reported-by: Patrick Donnelly <pdonnell@xxxxxxxxxx>
> > URL: https://tracker.ceph.com/issues/51279
> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > ---
> >  fs/ceph/caps.c       | 17 +++++++++++++++--
> >  fs/ceph/inode.c      |  1 +
> >  fs/ceph/mds_client.c | 31 +++++++++++++++++++++++++++++--
> >  fs/ceph/super.h      |  2 ++
> >  4 files changed, 47 insertions(+), 4 deletions(-)
> > 
> > v3: more debugging has shown the client waiting on FLUSH_ACK messages
> > that seem to never have come. Add some new printks if we end up
> > dropping a FLUSH_ACK onto the floor.
> 
> Since you're adding debug printks, would it be worth also adding one in
> mds_dispatch(), when __verify_registered_session(mdsc, s) < 0?
> 
> It's a wild guess, but the FLUSH_ACK could be dropped in that case too.
> Not that I could spot any issue there, but since this seems to be
> happening during umount...
> 
> Cheers,

Good point. I had looked at that case and had sort of dismissed it in
this situation, but you're probably right. I've added a similar pr_info
for that case and pushed it to the repo after a little testing here. I
won't bother re-posting it, though, since the change is trivial.

Thanks,
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
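
[Note: purely as illustration of the change discussed above, here is a
rough sketch of what such a pr_info could look like. It assumes the
unregistered-session check near the top of mds_dispatch() in
fs/ceph/mds_client.c has its usual mainline shape; the message wording
and the type/mds%d details below are made up and need not match what
was actually pushed to the repo.]

	/* near the top of mds_dispatch() in fs/ceph/mds_client.c */
	struct ceph_mds_session *s = con->private;
	struct ceph_mds_client *mdsc = s->s_mdsc;
	int type = le16_to_cpu(msg->hdr.type);

	mutex_lock(&mdsc->mutex);
	if (__verify_registered_session(mdsc, s) < 0) {
		/* illustrative only: record which message types get dropped
		 * on the floor when the session is no longer registered */
		pr_info("%s: dropping msg type %d from unregistered session (mds%d)\n",
			__func__, type, s->s_mds);
		mutex_unlock(&mdsc->mutex);
		goto out;
	}
	mutex_unlock(&mdsc->mutex);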