On Tue, 2022-06-07 at 09:50 +0800, Xiubo Li wrote:
> On 6/7/22 9:21 AM, Jeff Layton wrote:
> > On Tue, 2022-06-07 at 09:11 +0800, Xiubo Li wrote:
> > > On 6/7/22 7:31 AM, Jeff Layton wrote:
> > > > Currently, we'll call ceph_check_caps, but if we're still waiting on the
> > > > reply, we'll end up spinning around on the same inode in
> > > > flush_dirty_session_caps. Wait for the async create reply before
> > > > flushing caps.
> > > >
> > > > Fixes: fbed7045f552 (ceph: wait for async create reply before sending any cap messages)
> > > > URL: https://tracker.ceph.com/issues/55823
> > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > > ---
> > > >  fs/ceph/caps.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > > I don't know if this will fix the tx queue stalls completely, but I
> > > > haven't seen one with this patch in place. I think it makes sense on its
> > > > own, either way.
> > > >
> > > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > > index 0a48bf829671..5ecfff4b37c9 100644
> > > > --- a/fs/ceph/caps.c
> > > > +++ b/fs/ceph/caps.c
> > > > @@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
> > > >  		ihold(inode);
> > > >  		dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
> > > >  		spin_unlock(&mdsc->cap_dirty_lock);
> > > > +		ceph_wait_on_async_create(inode);
> > > >  		ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
> > > >  		iput(inode);
> > > >  		spin_lock(&mdsc->cap_dirty_lock);
> > >
> > > This looks good.
> > >
> > > Possibly we can add one dedicated list to store the async-creating
> > > inodes, instead of blocking all the others?
> > >
> > I'd be open to that. I think we ought to take this patch first to fix
> > the immediate bug though, before we add extra complexity.
>
> Sounds good to me.
>
> I will merge it to the testing branch for now and let's improve it later.
>

Can we also mark this for stable? It's a pretty nasty bug, potentially.
We should get this into mainline fairly soon.
Thanks,
--
Jeff Layton <jlayton@xxxxxxxxxx>