I restarted the MDS process that was stuck in the "up:stopping" state. Since then the MDS is no longer behind on trimming, and all (sub)directories are accessible as normal again. It seems there are stability issues with snapshots in a multi-MDS CephFS on Nautilus; this has already been suspected here:
http://docs.ceph.com/docs/nautilus/cephfs/experimental-features/#snapshots

Regards,
Lars


Fri, 3 May 2019 11:45:41 +0200 Lars Täuber <taeuber@xxxxxxx> ==> ceph-users@xxxxxxxxxxxxxx :
> Hi,
>
> I'm still new to Ceph. I'm seeing similar problems with CephFS.
>
> ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
> on Debian GNU/Linux buster/sid
>
> # ceph health detail
> HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
> MDS_SLOW_REQUEST 1 MDSs report slow requests
>     mdsmds3(mds.0): 13 slow requests are blocked > 30 secs
> MDS_TRIM 1 MDSs behind on trimming
>     mdsmds3(mds.0): Behind on trimming (33924/125) max_segments: 125, num_segments: 33924
>
> The workload is "doveadm backup" of more than 500 mail folders from a local ext4 to a CephFS.
> * There are ~180'000 files with a strange file size distribution:
>
> # NumSamples = 181056; MIN_SEEN = 377; MAX_SEEN = 584835624
> # Mean = 4477785.646005; Variance = 31526763457775.421875; SD = 5614869.852256
>     377 -    262502 [ 56652]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 31.29%
>  262502 -    524627 [  4891]: ∎∎∎∎ 2.70%
>  524627 -    786752 [  3498]: ∎∎∎ 1.93%
>  786752 -   1048878 [  2770]: ∎∎∎ 1.53%
> 1048878 -   1311003 [  2460]: ∎∎ 1.36%
> 1311003 -   1573128 [  2197]: ∎∎ 1.21%
> 1573128 -   1835253 [  2014]: ∎∎ 1.11%
> 1835253 -   2097378 [  1961]: ∎∎ 1.08%
> 2097378 -   2359503 [  2244]: ∎∎ 1.24%
> 2359503 -   2621628 [  1890]: ∎∎ 1.04%
> 2621628 -   2883754 [  1897]: ∎∎ 1.05%
> 2883754 -   3145879 [  2188]: ∎∎ 1.21%
> 3145879 -   3408004 [  2579]: ∎∎ 1.42%
> 3408004 -   3670129 [  3396]: ∎∎∎ 1.88%
> 3670129 -   3932254 [  5173]: ∎∎∎∎ 2.86%
> 3932254 -   4194379 [ 24847]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 13.72%
> 4194379 -   4456505 [  1512]: ∎∎ 0.84%
> 4456505 -   4718630 [  1394]: ∎∎ 0.77%
> 4718630 -   4980755 [  1412]: ∎∎ 0.78%
> 4980755 - 584835624 [ 56081]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 30.97%
>
> * There are two snapshots of the main directory the mails are backed up to.
> * There are three subdirectories where a simple ls never returns.
> * The CephFS is mounted using the kernel driver of Ubuntu 18.04.2 LTS (kernel 4.15.0-48-generic).
> * The behaviour is the same with ceph-fuse (FUSE library version 2.9.7), except that there I can't interrupt the ls.
>
> Reducing the number of MDS daemons working for our CephFS to 1 made no difference.
> The number of segments is still rising.
>
> # ceph -w
>   cluster:
>     id:     6cba13d1-b814-489c-9aac-9c04aaf78720
>     health: HEALTH_WARN
>             1 MDSs report slow requests
>             1 MDSs behind on trimming
>
>   services:
>     mon: 3 daemons, quorum mon1,mon2,mon3 (age 3d)
>     mgr: cephsible(active, since 27h), standbys: mon3, mon1
>     mds: cephfs_1:2 {0=mds3=up:active,1=mds2=up:stopping} 1 up:standby
>     osd: 30 osds: 30 up (since 4w), 30 in (since 5w)
>
>   data:
>     pools:   5 pools, 393 pgs
>     objects: 607.74k objects, 1.5 TiB
>     usage:   6.9 TiB used, 160 TiB / 167 TiB avail
>     pgs:     393 active+clean
>
> 2019-05-03 11:40:17.916193 mds.mds3 [WRN] 15 slow requests, 0 included below; oldest blocked for > 342610.193367 secs
>
> It seems that stopping one of the two MDS daemons never finishes.
>
> How can I debug this?
>
> Thanks in advance.
> Lars
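
P.S. for the archive: "restarted the MDS process" above amounts to something like the commands below. This is only a sketch; the systemd unit and host names are how our deployment is set up and may differ elsewhere. mds2 is the rank that was stuck in up:stopping (see the status output above).

  # systemctl restart ceph-mds@mds2    <- run on the host carrying the stuck daemon
  # ceph fs status                     <- the up:stopping rank should be gone afterwards
  # ceph health detail                 <- MDS_TRIM should clear once trimming catches up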