I restarted the MDS process that was stuck in the "up:stopping" state. Since then the MDS is no longer behind on trimming, and all (sub)directories are accessible as normal again. It seems there are stability issues with snapshots in a multi-MDS CephFS on Nautilus; this has already been suspected here:
http://docs.ceph.com/docs/nautilus/cephfs/experimental-features/#snapshots

Regards,
Lars


Fri, 3 May 2019 11:45:41 +0200 Lars Täuber <taeuber@xxxxxxx> ==> ceph-users@xxxxxxxxxxxxxx :
> Hi,
>
> I'm still new to Ceph. I'm seeing similar problems with CephFS.
>
> ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
> on Debian GNU/Linux buster/sid
>
> # ceph health detail
> HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
> MDS_SLOW_REQUEST 1 MDSs report slow requests
>     mdsmds3(mds.0): 13 slow requests are blocked > 30 secs
> MDS_TRIM 1 MDSs behind on trimming
>     mdsmds3(mds.0): Behind on trimming (33924/125) max_segments: 125, num_segments: 33924
>
> The workload is "doveadm backup" of more than 500 mail folders from a local ext4 to a CephFS.
> * There are ~180'000 files with a strange file size distribution:
>
> # NumSamples = 181056; MIN_SEEN = 377; MAX_SEEN = 584835624
> # Mean = 4477785.646005; Variance = 31526763457775.421875; SD = 5614869.852256
>     377 -    262502 [ 56652]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 31.29%
>  262502 -    524627 [  4891]: ∎∎∎∎ 2.70%
>  524627 -    786752 [  3498]: ∎∎∎ 1.93%
>  786752 -   1048878 [  2770]: ∎∎∎ 1.53%
> 1048878 -   1311003 [  2460]: ∎∎ 1.36%
> 1311003 -   1573128 [  2197]: ∎∎ 1.21%
> 1573128 -   1835253 [  2014]: ∎∎ 1.11%
> 1835253 -   2097378 [  1961]: ∎∎ 1.08%
> 2097378 -   2359503 [  2244]: ∎∎ 1.24%
> 2359503 -   2621628 [  1890]: ∎∎ 1.04%
> 2621628 -   2883754 [  1897]: ∎∎ 1.05%
> 2883754 -   3145879 [  2188]: ∎∎ 1.21%
> 3145879 -   3408004 [  2579]: ∎∎ 1.42%
> 3408004 -   3670129 [  3396]: ∎∎∎ 1.88%
> 3670129 -   3932254 [  5173]: ∎∎∎∎ 2.86%
> 3932254 -   4194379 [ 24847]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 13.72%
> 4194379 -   4456505 [  1512]: ∎∎ 0.84%
> 4456505 -   4718630 [  1394]: ∎∎ 0.77%
> 4718630 -   4980755 [  1412]: ∎∎ 0.78%
> 4980755 - 584835624 [ 56081]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 30.97%
>
> * There are two snapshots of the main directory the mails are backed up to.
> * There are three subdirectories where a simple ls never returns.
> * The CephFS is mounted using the kernel driver of Ubuntu 18.04.2 LTS (kernel 4.15.0-48-generic).
> * The behaviour is the same with ceph-fuse (FUSE library version 2.9.7), except that there I can't interrupt the ls.
>
> Reducing the number of MDS daemons working for our CephFS to 1 made no difference.
> The number of segments is still rising.
>
> # ceph -w
>   cluster:
>     id:     6cba13d1-b814-489c-9aac-9c04aaf78720
>     health: HEALTH_WARN
>             1 MDSs report slow requests
>             1 MDSs behind on trimming
>
>   services:
>     mon: 3 daemons, quorum mon1,mon2,mon3 (age 3d)
>     mgr: cephsible(active, since 27h), standbys: mon3, mon1
>     mds: cephfs_1:2 {0=mds3=up:active,1=mds2=up:stopping} 1 up:standby
>     osd: 30 osds: 30 up (since 4w), 30 in (since 5w)
>
>   data:
>     pools:   5 pools, 393 pgs
>     objects: 607.74k objects, 1.5 TiB
>     usage:   6.9 TiB used, 160 TiB / 167 TiB avail
>     pgs:     393 active+clean
>
> 2019-05-03 11:40:17.916193 mds.mds3 [WRN] 15 slow requests, 0 included below; oldest blocked for > 342610.193367 secs
>
> It seems that stopping one of the two MDS daemons never finishes.
>
> How can I debug this?
>
> Thanks in advance.
> Lars
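
P.S. for the archive: "restarted the MDS process" above amounts to something like the commands below. This is only a sketch; the systemd unit and host names are how our deployment is set up and may differ elsewhere. mds2 is the rank that was stuck in up:stopping (see the status output above).

  # systemctl restart ceph-mds@mds2    <- run on the host carrying the stuck daemon
  # ceph fs status                     <- the up:stopping rank should be gone afterwards
  # ceph health detail                 <- MDS_TRIM should clear once trimming catches up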