Hi all,
While syncing a lot of files to CephFS, our MDS cluster went haywire: the
MDSs are far behind on trimming (58621/30).
Because of this the MDS cluster becomes degraded. RAM usage is about 50 GB.
The MDSs were respawning and replaying continuously, and I had to stop
all syncs, unmount all clients, and increase the beacon grace to keep
the cluster up.
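For reference, this is roughly how I raised the grace period. The 600
seconds is just a value I picked to stop the monitors from failing the
MDS over during the long replays (the default is 15 seconds), and I
injected it on both the MDS and the mons because I wasn't sure which
side enforces it:

[root@mds03 ~]# ceph tell mds.mds03 injectargs '--mds_beacon_grace 600'
[root@mds03 ~]# ceph tell mon.mds01 injectargs '--mds_beacon_grace 600'   # same for mon.mds02 / mon.mds03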
[root@mds03 ~]# ceph status
    cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
     health HEALTH_WARN
            mds0: Behind on trimming (58621/30)
     monmap e1: 3 mons at {mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
            election epoch 170, quorum 0,1,2 mds01,mds02,mds03
      fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
     osdmap e19966: 156 osds: 156 up, 156 in
            flags sortbitwise
      pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
            357 TB used, 516 TB / 874 TB avail
                4151 active+clean
                   5 active+clean+scrubbing
                   4 active+clean+scrubbing+deep
  client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
  cache io 68 op/s promote
Now that it is finally up again, it is trimming very slowly (about 120
segments/min, so working through ~58600 segments takes roughly
58600 / 120 ≈ 490 minutes, i.e. around eight hours).
We've seen some 'behind on trimming' warnings before, but never this many segments.
So our production cluster has now been unusable for approximately half a day.
What could be the problem here? We are running 10.2.1.
Can something be done to keep the MDS from accumulating that many segments?
Can we speed up the trimming process?
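For completeness, these are the trimming-related knobs I have been
looking at on the active MDS, assuming I understand them correctly
(mds_log_max_segments should be the 30 in the warning above, and
mds_log_max_expiring seems to limit how many segments are expired in
parallel); I haven't changed any of them yet:

[root@mds03 ~]# ceph daemon mds.mds03 config get mds_log_max_segments
[root@mds03 ~]# ceph daemon mds.mds03 config get mds_log_max_expiring
[root@mds03 ~]# ceph daemon mds.mds03 perf dump mds_log

Would bumping mds_log_max_expiring be a sane way to speed up trimming,
or is that risky?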
Thank you very much!
Cheers,
Kenneth