On Fri, 2021-07-30 at 11:09 +0100, Luis Henriques wrote:
> Jeff Layton <jlayton@xxxxxxxxxx> writes:
> 
> > We've had some cases of hung umounts in teuthology testing. It looks
> > like the client is waiting for cap flushes to complete, but they aren't.
> > 
> > Add a field to the inode to track the highest cap flush tid seen for
> > that inode. Also, add a backpointer from the ceph_cap_flush struct to
> > the inode.
> > 
> > Change wait_caps_flush to wait 60s, and then dump info about the
> > condition of the list.
> > 
> > Also, print pr_info messages if we end up dropping a FLUSH_ACK for an
> > inode onto the floor.
> > 
> > Reported-by: Patrick Donnelly <pdonnell@xxxxxxxxxx>
> > URL: https://tracker.ceph.com/issues/51279
> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > ---
> >  fs/ceph/caps.c       | 17 +++++++++++++++--
> >  fs/ceph/inode.c      |  1 +
> >  fs/ceph/mds_client.c | 31 +++++++++++++++++++++++++++++--
> >  fs/ceph/super.h      |  2 ++
> >  4 files changed, 47 insertions(+), 4 deletions(-)
> > 
> > v3: more debugging has shown the client waiting on FLUSH_ACK messages
> > that seem to never have come. Add some new printks if we end up
> > dropping a FLUSH_ACK onto the floor.
> 
> Since you're adding debug printks, would it be worth also adding one in
> mds_dispatch(), when __verify_registered_session(mdsc, s) < 0?
> 
> It's a wild guess, but the FLUSH_ACK could be dropped in that case too.
> Not that I could spot any issue there, but since this seems to be
> happening during umount...
> 
> Cheers,

Good point. I had looked at that case and had sort of dismissed it in
this situation, but you're probably right. I've added a similar pr_info
for that case and pushed it to the repo after a little testing here. I
won't bother re-posting it, though, since the change is trivial.

Thanks,
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
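
[Note: purely as illustration of the change discussed above, here is a
rough sketch of what such a pr_info could look like. It assumes the
unregistered-session check near the top of mds_dispatch() in
fs/ceph/mds_client.c has its usual mainline shape; the message wording
and the type/mds%d details below are made up and need not match what
was actually pushed to the repo.]

	/* near the top of mds_dispatch() in fs/ceph/mds_client.c */
	struct ceph_mds_session *s = con->private;
	struct ceph_mds_client *mdsc = s->s_mdsc;
	int type = le16_to_cpu(msg->hdr.type);

	mutex_lock(&mdsc->mutex);
	if (__verify_registered_session(mdsc, s) < 0) {
		/* illustrative only: record which message types get dropped
		 * on the floor when the session is no longer registered */
		pr_info("%s: dropping msg type %d from unregistered session (mds%d)\n",
			__func__, type, s->s_mds);
		mutex_unlock(&mdsc->mutex);
		goto out;
	}
	mutex_unlock(&mdsc->mutex);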