Hi,

I'm still new to Ceph, and here is a similar set of problems with CephFS.

ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
on Debian GNU/Linux buster/sid

# ceph health detail
HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
MDS_SLOW_REQUEST 1 MDSs report slow requests
    mdsmds3(mds.0): 13 slow requests are blocked > 30 secs
MDS_TRIM 1 MDSs behind on trimming
    mdsmds3(mds.0): Behind on trimming (33924/125) max_segments: 125, num_segments: 33924

The workload is "doveadm backup" of more than 500 mail folders from a local
ext4 filesystem to CephFS.

* There are ~180,000 files with a strange size distribution: roughly a third
  of the files are smaller than ~256 KB, another third are larger than ~5 MB,
  and there is a pronounced spike around 4 MB:

  # NumSamples = 181056; MIN_SEEN = 377; MAX_SEEN = 584835624
  # Mean = 4477785.646005; Variance = 31526763457775.421875; SD = 5614869.852256
        377 -    262502 [ 56652]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 31.29%
     262502 -    524627 [  4891]: ∎∎∎∎ 2.70%
     524627 -    786752 [  3498]: ∎∎∎ 1.93%
     786752 -   1048878 [  2770]: ∎∎∎ 1.53%
    1048878 -   1311003 [  2460]: ∎∎ 1.36%
    1311003 -   1573128 [  2197]: ∎∎ 1.21%
    1573128 -   1835253 [  2014]: ∎∎ 1.11%
    1835253 -   2097378 [  1961]: ∎∎ 1.08%
    2097378 -   2359503 [  2244]: ∎∎ 1.24%
    2359503 -   2621628 [  1890]: ∎∎ 1.04%
    2621628 -   2883754 [  1897]: ∎∎ 1.05%
    2883754 -   3145879 [  2188]: ∎∎ 1.21%
    3145879 -   3408004 [  2579]: ∎∎ 1.42%
    3408004 -   3670129 [  3396]: ∎∎∎ 1.88%
    3670129 -   3932254 [  5173]: ∎∎∎∎ 2.86%
    3932254 -   4194379 [ 24847]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 13.72%
    4194379 -   4456505 [  1512]: ∎∎ 0.84%
    4456505 -   4718630 [  1394]: ∎∎ 0.77%
    4718630 -   4980755 [  1412]: ∎∎ 0.78%
    4980755 - 584835624 [ 56081]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 30.97%

* There are two snapshots of the main directory the mails are backed up to.
* There are three subdirectories where a simple ls never returns.
* The CephFS is mounted using the kernel driver of Ubuntu 18.04.2 LTS,
  kernel 4.15.0-48-generic.
* The behaviour is the same with ceph-fuse (FUSE library version 2.9.7),
  except that there I cannot interrupt the ls.

Reducing the number of active MDS daemons for our CephFS to 1 made no
difference; the number of segments is still rising.

# ceph -w
  cluster:
    id:     6cba13d1-b814-489c-9aac-9c04aaf78720
    health: HEALTH_WARN
            1 MDSs report slow requests
            1 MDSs behind on trimming

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3 (age 3d)
    mgr: cephsible(active, since 27h), standbys: mon3, mon1
    mds: cephfs_1:2 {0=mds3=up:active,1=mds2=up:stopping} 1 up:standby
    osd: 30 osds: 30 up (since 4w), 30 in (since 5w)

  data:
    pools:   5 pools, 393 pgs
    objects: 607.74k objects, 1.5 TiB
    usage:   6.9 TiB used, 160 TiB / 167 TiB avail
    pgs:     393 active+clean

2019-05-03 11:40:17.916193 mds.mds3 [WRN] 15 slow requests, 0 included below; oldest blocked for > 342610.193367 secs

It also seems that the shutdown of the second MDS (rank 1, mds2, shown as
up:stopping above) never completes.

How can I debug this?

Thanks in advance,
Lars
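
P.S. In case it helps with an answer, this is what I am planning to look at
next on the MDS side. It is only a sketch based on my reading of the admin
socket commands; the daemon name mds.mds3 is taken from the output above, and
I am assuming the commands are run on the host where that daemon runs:

# ceph daemon mds.mds3 dump_ops_in_flight    (which requests are stuck, and in which state/event)
# ceph daemon mds.mds3 session ls            (which clients hold sessions, and how many caps each holds)
# ceph daemon mds.mds3 objecter_requests     (RADOS operations the MDS itself is still waiting on)
# ceph daemon mds.mds3 perf dump mds_log     (journal counters, to see whether trimming makes any progress)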
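
On the kernel client, I understand the pending requests can be inspected
through debugfs; again only a sketch, assuming debugfs is mounted at the usual
location on the Ubuntu machine:

# cat /sys/kernel/debug/ceph/*/mdsc    (MDS requests the kernel client is still waiting for, e.g. the hanging ls)
# cat /sys/kernel/debug/ceph/*/osdc    (OSD requests the kernel client is still waiting for)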
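
And for the trimming backlog and the rank stuck in up:stopping, I would check
the segment limit and the state of the stopping daemon. The value 256 below is
an arbitrary guess on my part, not a recommendation, and raising it would only
hide the warning rather than trim the 33924 segments:

# ceph daemon mds.mds3 config get mds_log_max_segments
# ceph config set mds mds_log_max_segments 256    (mitigation only, does not trim the backlog)
# ceph fs status cephfs_1                         (does rank 1 / mds2 ever leave up:stopping?)
# ceph daemon mds.mds2 status                     (state of the stopping daemon itself)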