On Tue, Mar 31, 2020 at 6:49 AM Yan, Zheng <ukernel@xxxxxxxxx> wrote: > > On Tue, Mar 31, 2020 at 6:52 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > Jan noticed some long stalls when flushing caps using sync() after > > doing small file creates. For instance, running this: > > > > $ time for i in $(seq -w 11 30); do echo "Hello World" > hello-$i.txt; sync -f ./hello-$i.txt; done > > > > Could take more than 90s in some cases. The sync() will flush out caps, > > but doesn't tell the MDS that it's waiting synchronously on the > > replies. > > > > When ceph_check_caps finds that CHECK_CAPS_FLUSH is set, then set the > > CEPH_CLIENT_CAPS_SYNC bit in the cap update request. This clues the MDS > > into that fact and it can then expedite the reply. > > > > URL: https://tracker.ceph.com/issues/44744 > > Reported-and-Tested-by: Jan Fajerski <jfajerski@xxxxxxxx> > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > > --- > > fs/ceph/caps.c | 7 +++++-- > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c > > index 61808793e0c0..6403178f2376 100644 > > --- a/fs/ceph/caps.c > > +++ b/fs/ceph/caps.c > > @@ -2111,8 +2111,11 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags, > > > > mds = cap->mds; /* remember mds, so we don't repeat */ > > > > - __prep_cap(&arg, cap, CEPH_CAP_OP_UPDATE, 0, cap_used, want, > > - retain, flushing, flush_tid, oldest_flush_tid); > > + __prep_cap(&arg, cap, CEPH_CAP_OP_UPDATE, > > + (flags & CHECK_CAPS_FLUSH) ? > > + CEPH_CLIENT_CAPS_SYNC : 0, > > + cap_used, want, retain, flushing, flush_tid, > > + oldest_flush_tid); > > spin_unlock(&ci->i_ceph_lock); > > > > this is too expensive for syncfs case. mds needs to flush journal for > each dirty inode. we'd better to track dirty inodes by session, and > only set the flag when flushing the last inode in session dirty list. Yeah, see the userspace Client::_sync_fs() where we have an internal flags argument which is set on the last cap in the dirty set and tells the actual cap message flushing code to set FLAG_SYNC on the MClientCaps message. I presume the kernel is operating on a similar principle here? -Greg > > > > __send_cap(mdsc, &arg, ci); > > -- > > 2.25.1 > > >