On Wed, 2022-02-02 at 03:36 +0800, Yan, Zheng wrote: > > > Jeff Layton <jlayton@xxxxxxxxxx>于2022年1月29日 周六18:36写道: > > On Sat, 2022-01-29 at 11:14 +0800, Yan, Zheng wrote: > > > On Sat, Jan 29, 2022 at 2:32 AM Jeff Layton <jlayton@xxxxxxxxxx> > > > wrote: > > > > > > > > I've noticed an intermittent hang waiting for caps in some > > > > testing. What > > > > I see is that the client will try to get caps for an operation > > > > (e.g. a > > > > read), and ends up waiting on the waitqueue forever. The caps > > > > debugfs > > > > file however shows that the caps it's waiting on have already > > > > been > > > > granted. > > > > > > > > The current grant handling code will wake the waitqueue when it > > > > sees > > > > that there are newly-granted caps in the issued set. On an > > > > import > > > > however, we'll end up adding a new cap first, which fools the > > > > logic into > > > > thinking that nothing has changed. A later hack in the code > > > > works around > > > > this, but only for auth caps. > > > > > > not right. handle_cap_import() saves old issued to extra_info- > > > >issued. > > > > > > > It does save the old issued value to extra_info->issued, but > > handle_cap > > grant consults cap->issued for most of its logic. It does check > > against > > extra_info->issued for auth caps to determine whether to wake > > waiters, > > but doesn't do that for non-auth caps. This patch corrects that. > > > > > non auth mds never imports caps. ‘ci->i_auth_cap != cap’ happened only > when client receives out-of-order messages for successive cap > import/export. It’s unlikely in your case because you saw caps were > there. > Ok! I'll drop this patch, and keep hunting for how we're not getting awoken. Let me know if you see any gaps in how "wake" gets set. Thanks, Jeff > > > > > If you think this patch isn't right, can you elaborate on why and > > what > > would make it correct? > > > > Thanks, > > Jeff > > > > > > > > > > Ensure we wake the waiters whenever we get an IMPORT that grants > > > > new > > > > caps for the inode. > > > > > > > > URL: https://tracker.ceph.com/issues/54044 > > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > > > > --- > > > > fs/ceph/caps.c | 23 ++++++++++++----------- > > > > 1 file changed, 12 insertions(+), 11 deletions(-) > > > > > > > > I'm still testing this patch, but I think this may be the cause > > > > of some > > > > mysterious hangs I've hit in testing. > > > > > > > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c > > > > index e668cdb9c99e..06b65a68e920 100644 > > > > --- a/fs/ceph/caps.c > > > > +++ b/fs/ceph/caps.c > > > > @@ -3541,21 +3541,22 @@ static void handle_cap_grant(struct > > > > inode *inode, > > > > fill_inline = true; > > > > } > > > > > > > > - if (ci->i_auth_cap == cap && > > > > - le32_to_cpu(grant->op) == CEPH_CAP_OP_IMPORT) { > > > > + if (le32_to_cpu(grant->op) == CEPH_CAP_OP_IMPORT) { > > > > if (newcaps & ~extra_info->issued) > > > > wake = true; > > > > > > > > - if (ci->i_requested_max_size > max_size || > > > > - !(le32_to_cpu(grant->wanted) & > > > > CEPH_CAP_ANY_FILE_WR)) { > > > > - /* re-request max_size if necessary */ > > > > - ci->i_requested_max_size = 0; > > > > - wake = true; > > > > - } > > > > + if (ci->i_auth_cap == cap) { > > > > + if (ci->i_requested_max_size > max_size > > > > || > > > > + !(le32_to_cpu(grant->wanted) & > > > > CEPH_CAP_ANY_FILE_WR)) { > > > > + /* re-request max_size if > > > > necessary */ > > > > + ci->i_requested_max_size = 0; > > > > + wake = true; > > > > + } > > > > > > > > - ceph_kick_flushing_inode_caps(session, ci); > > > > - spin_unlock(&ci->i_ceph_lock); > > > > - up_read(&session->s_mdsc->snap_rwsem); > > > > + ceph_kick_flushing_inode_caps(session, > > > > ci); > > > > + spin_unlock(&ci->i_ceph_lock); > > > > + up_read(&session->s_mdsc->snap_rwsem); > > > > + } > > > > } else { > > > > spin_unlock(&ci->i_ceph_lock); > > > > } > > > > -- > > > > 2.34.1 > > > > > > -- Jeff Layton <jlayton@xxxxxxxxxx>