On Fri, 2021-06-04 at 10:35 +0100, Luis Henriques wrote:
> On Thu, Jun 03, 2021 at 12:57:22PM -0400, Jeff Layton wrote:
> > On Thu, 2021-06-03 at 09:48 -0400, Jeff Layton wrote:
> > > I've seen some warnings when testing recently that indicate that there
> > > are caps still delayed on the delayed list even after we've started
> > > unmounting.
> > >
> > > When checking delayed caps, process the whole list if we're unmounting,
> > > and check for delayed caps after setting the stopping var and flushing
> > > dirty caps.
> > >
> > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > ---
> > >  fs/ceph/caps.c       | 3 ++-
> > >  fs/ceph/mds_client.c | 1 +
> > >  2 files changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > index a5e93b185515..68b4c6dfe4db 100644
> > > --- a/fs/ceph/caps.c
> > > +++ b/fs/ceph/caps.c
> > > @@ -4236,7 +4236,8 @@ void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
> > >  		ci = list_first_entry(&mdsc->cap_delay_list,
> > >  				      struct ceph_inode_info,
> > >  				      i_cap_delay_list);
> > > -		if ((ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
> > > +		if (!mdsc->stopping &&
> > > +		    (ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
> > >  		    time_before(jiffies, ci->i_hold_caps_max))
> > >  			break;
> > >  		list_del_init(&ci->i_cap_delay_list);
> > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > index e5af591d3bd4..916af5497829 100644
> > > --- a/fs/ceph/mds_client.c
> > > +++ b/fs/ceph/mds_client.c
> > > @@ -4691,6 +4691,7 @@ void ceph_mdsc_pre_umount(struct ceph_mds_client *mdsc)
> > >
> > >  	lock_unlock_sessions(mdsc);
> > >  	ceph_flush_dirty_caps(mdsc);
> > > +	ceph_check_delayed_caps(mdsc);
> > >  	wait_requests(mdsc);
> > >
> > >  	/*
> >
> > I'm going to self-NAK this patch for now. Initially this looked good in
> > testing, but I think it's just papering over the real problem, which is
> > that ceph_async_iput can queue a job to a workqueue after the point
> > where we've flushed that workqueue on umount.
>
> Ah, yeah. I think I saw this a few times with generic/014 (and I believe
> we chatted about it on irc). I've been on and off trying to figure out
> the way to fix it but it's really tricky.
>

Yeah, that's putting it mildly. The biggest issue here is the
session->s_mutex, which is held over large swaths of the code, but it's
not fully clear what it protects. The original patch that added
ceph_async_iput did it to avoid the session mutex that gets held for
ceph_iterate_session_caps.

My current thinking is that we probably don't need to hold the session
mutex over that function in some cases, if we can guarantee that the
ceph_cap objects we're iterating over don't go away when the lock is
dropped. So, I'm trying to add some refcounting to the ceph_cap
structures themselves to see if that helps.

It may turn out to be a dead end, but if we don't chip away at the edges
of the fundamental problem, we'll never get there...
--
Jeff Layton <jlayton@xxxxxxxxxx>
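
To make the refcounting idea above a bit more concrete, here is a minimal
sketch, assuming a hypothetical kref embedded in struct ceph_cap. The
field and helper names (ceph_cap_get/ceph_cap_put) are illustrative only,
not the existing fs/ceph API, and the real struct has many more fields
than shown:

#include <linux/kref.h>
#include <linux/list.h>
#include <linux/slab.h>

/* Hypothetical sketch -- not actual fs/ceph code. */
struct ceph_cap {
	struct kref		kref;		/* hypothetical lifetime refcount,
						   kref_init()ed at allocation */
	struct list_head	session_caps;	/* entry on session->s_caps */
	/* ... existing fields elided ... */
};

/* Called once the last reference to the cap is dropped. */
static void __ceph_cap_free(struct kref *kref)
{
	struct ceph_cap *cap = container_of(kref, struct ceph_cap, kref);

	kfree(cap);	/* the real code would return it to its slab cache */
}

/* Pin a cap so it can't be freed while a lock is dropped. */
static inline void ceph_cap_get(struct ceph_cap *cap)
{
	kref_get(&cap->kref);
}

/* Drop a pin taken with ceph_cap_get(). */
static inline void ceph_cap_put(struct ceph_cap *cap)
{
	kref_put(&cap->kref, __ceph_cap_free);
}

/*
 * Intended usage pattern while walking session->s_caps, assuming
 * something else prevents concurrent removal of the cap from the list
 * while the lock is dropped (which the real code would still need to
 * handle, e.g. via s_cap_iterator):
 *
 *	ceph_cap_get(cap);
 *	spin_unlock(&session->s_cap_lock);
 *	ret = cb(inode, cap, arg);	// cap cannot be freed here
 *	spin_lock(&session->s_cap_lock);
 *	ceph_cap_put(cap);
 */

The point of such a scheme would be that once the iterator holds its own
reference, the callback no longer needs session->s_mutex just to keep the
cap alive; keeping the iteration cursor valid across the unlock is a
separate problem.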