On Thu, 2021-07-22 at 17:36 +0800, Xiubo Li wrote:
> On 7/21/21 8:57 PM, Jeff Layton wrote:
> > On Wed, 2021-07-21 at 19:54 +0800, Xiubo Li wrote:
> > > On 7/21/21 7:23 PM, Jeff Layton wrote:
> > > > On Wed, 2021-07-21 at 16:27 +0800, xiubli@xxxxxxxxxx wrote:
> > > > > From: Xiubo Li <xiubli@xxxxxxxxxx>
> > > > >
> > > > > The delayed_work is executed every 5 seconds, and in that time
> > > > > the cap_delay_list may accumulate thousands of caps that need to
> > > > > be flushed. This can fill the MDS's dispatch queue and take a
> > > > > very long time to drain. If some other operation arrives, such
> > > > > as an rmdir request, it is added to the tail of the dispatch
> > > > > queue and may have to wait for several seconds, or even tens of
> > > > > seconds.
> > > > >
> > > > > On the client side we shouldn't queue too many cap requests;
> > > > > flush them once there are more than 100 items.
> > > > >
> > > > Why 100? My inclination is to say NAK on this.
> > >
> > > This is just from my testing: around 100 queued client_caps
> > > requests work fine in most cases and don't take too long to
> > > handle. Sometimes the client sends thousands of requests in a
> > > short time, and that is a problem.
> >
> > What may be a better approach is to figure out why we're holding on
> > to so many caps and trying to flush them all at once. Maybe if we
> > were to more aggressively flush sooner, we'd not end up with such a
> > backlog?
>
> 881912 Jul 22 10:36:14 lxbceph1 kernel: ceph: 00000000f7ee4ccf mode 040755 uid.gid 0.0
> 881913 Jul 22 10:36:14 lxbceph1 kernel: ceph: size 0 -> 0
> 881914 Jul 22 10:36:14 lxbceph1 kernel: ceph: truncate_seq 0 -> 1
> 881915 Jul 22 10:36:14 lxbceph1 kernel: ceph: truncate_size 0 -> 18446744073709551615
> 881916 Jul 22 10:36:14 lxbceph1 kernel: ceph: add_cap 00000000f7ee4ccf mds0 cap 152d pAsxLsXsxFsx seq 1
> 881917 Jul 22 10:36:14 lxbceph1 kernel: ceph: lookup_snap_realm 1 0000000095ff27ff
> 881918 Jul 22 10:36:14 lxbceph1 kernel: ceph: get_realm 0000000095ff27ff 5421 -> 5422
> 881919 Jul 22 10:36:14 lxbceph1 kernel: ceph: __ceph_caps_issued 00000000f7ee4ccf cap 000000003c8bc134 issued -
> 881920 Jul 22 10:36:14 lxbceph1 kernel: ceph: marking 00000000f7ee4ccf NOT complete
> 881921 Jul 22 10:36:14 lxbceph1 kernel: ceph: issued pAsxLsXsxFsx, mds wanted -, actual -, queueing
> 881922 Jul 22 10:36:14 lxbceph1 kernel: ceph: __cap_set_timeouts 00000000f7ee4ccf min 5036 max 60036
> 881923 Jul 22 10:36:14 lxbceph1 kernel: ceph: __cap_delay_requeue 00000000f7ee4ccf flags 0 at 4294896928
> 881924 Jul 22 10:36:14 lxbceph1 kernel: ceph: add_cap inode 00000000f7ee4ccf (1000000152a.fffffffffffffffe) cap 000000003c8bc134 pAsxLsXsxFsx now pAsxLsXsxFsx seq 1 mds0
> 881925 Jul 22 10:36:14 lxbceph1 kernel: ceph: marking 00000000f7ee4ccf complete (empty)
> 881926 Jul 22 10:36:14 lxbceph1 kernel: ceph: dn 0000000079fd7e04 attached to 00000000f7ee4ccf ino 1000000152a.fffffffffffffffe
> 881927 Jul 22 10:36:14 lxbceph1 kernel: ceph: update_dentry_lease 0000000079fd7e04 duration 30000 ms ttl 4294866888
> 881928 Jul 22 10:36:14 lxbceph1 kernel: ceph: dentry_lru_touch 000000006fddf4b0 0000000079fd7e04 'removalC.test01806' (offset 0)
>
> From the kernel logs, some of these delayed caps come from the earlier
> thousands of 'mkdir' requests: after a mkdir request is replied to and
> the new caps are created, the MDS has issued extra caps the client
> doesn't want, so those caps are added to the delay queue, and within 5
> seconds the delay list can accumulate several thousand caps.
>
> Most of them come from stale dentries.
>
> So 5 seconds later, when the delayed work fires, all of these
> client_caps requests are queued in the MDS dispatch queue at once.
>
> The following commit improves the performance a lot:
>
> commit 37c4efc1ddf98ba8b234d116d863a9464445901e
> Author: Yan, Zheng <zyan@xxxxxxxxxx>
> Date:   Thu Jan 31 16:55:51 2019 +0800
>
>     ceph: periodically trim stale dentries
>
>     A previous commit makes the VFS delete a stale dentry when its
>     last reference is dropped. A lease can also become invalid when
>     the corresponding dentry has no references. This patch makes
>     cephfs periodically scan the lease list and delete the
>     corresponding dentry if its lease is invalid.
>
>     There are two types of lease: dentry lease and dir lease. A dentry
>     lease has a lifetime and applies to a single dentry. A dentry
>     lease is added to the tail of a list when it is updated, so the
>     leases at the front of the list expire first. A dir lease is
>     CEPH_CAP_FILE_SHARED on the directory inode and applies to all
>     dentries in that directory. Dentries that have dir leases are
>     added to another list, and the dentries in that list are
>     periodically checked in a round-robin manner.
>
> With this commit, my test finishes in 3~4 minutes:
>
> real    3m33.998s
> user    0m0.644s
> sys     0m2.341s
> [1]   Done    ( cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i* )
> real    3m34.028s
> user    0m0.620s
> sys     0m2.342s
> [2]-  Done    ( cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i* )
> real    3m34.049s
> user    0m0.638s
> sys     0m2.315s
> [3]+  Done    ( cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i* )
>
> Without this commit, the same 3 'rm' threads take more than 12 minutes.
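
For anyone who hasn't looked at that change recently, the dentry-lease
half of it boils down to periodically scanning an age-ordered list and
dropping whatever has expired. Below is a rough sketch of just that idea.
It is not the merged fs/ceph code: the lease_list_lock, dentry_lease_list
and lease_list names and the __drop_stale_dentry() helper are invented
for illustration.

/*
 * Sketch only: age-ordered scan of dentry leases as described in the
 * commit message above.  Names marked below are hypothetical.
 */
static void trim_stale_dentry_leases(struct ceph_mds_client *mdsc)
{
	struct ceph_dentry_info *di, *tmp;
	unsigned long now = jiffies;
	LIST_HEAD(expired);

	spin_lock(&mdsc->lease_list_lock);	/* hypothetical lock */
	/*
	 * Leases are (re)queued at the tail whenever they are renewed,
	 * so the oldest entries sit at the head: stop at the first one
	 * whose lease is still valid.
	 */
	list_for_each_entry_safe(di, tmp, &mdsc->dentry_lease_list,
				 lease_list) {
		if (time_before(now, di->time))
			break;
		list_move_tail(&di->lease_list, &expired);
	}
	spin_unlock(&mdsc->lease_list_lock);

	/*
	 * Drop the stale dentries outside the list lock.  A real
	 * implementation has to take a reference and the dentry's
	 * d_lock before unhashing; __drop_stale_dentry() (hypothetical)
	 * stands in for that here.
	 */
	list_for_each_entry_safe(di, tmp, &expired, lease_list) {
		list_del_init(&di->lease_list);
		__drop_stale_dentry(di);
	}
}

Because a renewed lease goes back to the tail, each periodic pass only
walks the entries that have actually expired plus one; the dir-lease
list is kept separately and, per the commit message, is checked in a
round-robin manner instead.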

That commit was already merged a couple of years ago. Does this mean you
think the client behavior can't be improved further?
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
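
To make the "flush sooner" idea from earlier in the thread concrete,
one possible shape for it is sketched below. This is not a tested patch:
the CEPH_CAP_DELAY_KICK threshold and the num_cap_delay counter are made
up (and the counter would also need to be decremented wherever entries
leave cap_delay_list); the other fields follow the existing fs/ceph
structures.

/*
 * Sketch of an "early kick": instead of waiting up to ~5s for the
 * delayed work, run it as soon as the delay list gets large.
 */
#define CEPH_CAP_DELAY_KICK	100	/* hypothetical knob */

static void cap_delay_requeue_and_kick(struct ceph_mds_client *mdsc,
				       struct ceph_inode_info *ci)
{
	bool kick = false;

	spin_lock(&mdsc->cap_delay_lock);
	if (list_empty(&ci->i_cap_delay_list)) {
		list_add_tail(&ci->i_cap_delay_list, &mdsc->cap_delay_list);
		mdsc->num_cap_delay++;		/* hypothetical counter */
	}
	if (mdsc->num_cap_delay >= CEPH_CAP_DELAY_KICK)
		kick = true;
	spin_unlock(&mdsc->cap_delay_lock);

	if (kick)
		/* fire the existing delayed work now rather than later */
		mod_delayed_work(system_wq, &mdsc->delayed_work, 0);
}

Whether sending the same cap messages sooner and in smaller batches
actually helps, or whether the real fix is to avoid holding so many
unwanted caps in the first place, is the open question above.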