Hi all,
While syncing a lot of files to CephFS, our MDS cluster went haywire: the
MDSs are far behind on trimming (58621/30).
Because of this the MDS cluster becomes degraded. RAM usage is about 50 GB.
The MDSs were respawning and replaying continuously, and I had to stop
all syncs, unmount all clients, and increase the beacon grace to keep
the cluster up.
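For reference, this is roughly how I raised the grace period. The 600
seconds is just a value I picked to stop the monitors from failing the
MDS over during the long replays (the default is 15 seconds), and I
injected it on both the MDS and the mons because I wasn't sure which
side enforces it:

[root@mds03 ~]# ceph tell mds.mds03 injectargs '--mds_beacon_grace 600'
[root@mds03 ~]# ceph tell mon.mds01 injectargs '--mds_beacon_grace 600'   # same for mon.mds02 / mon.mds03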
[root@mds03 ~]# ceph status
    cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
     health HEALTH_WARN
            mds0: Behind on trimming (58621/30)
     monmap e1: 3 mons at {mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
            election epoch 170, quorum 0,1,2 mds01,mds02,mds03
      fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
     osdmap e19966: 156 osds: 156 up, 156 in
            flags sortbitwise
      pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
            357 TB used, 516 TB / 874 TB avail
                4151 active+clean
                   5 active+clean+scrubbing
                   4 active+clean+scrubbing+deep
  client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
  cache io 68 op/s promote
Now that it is finally up again, it is trimming very slowly (about 120
segments/min, so working through ~58600 segments takes roughly
58600 / 120 ≈ 490 minutes, i.e. around eight hours).
We've seen some 'behind on trimming' warnings before, but never this many segments.
So our production cluster has now been unusable for approximately half a day.
What could be the problem here? We are running 10.2.1.
Can something be done to keep the MDS from accumulating that many segments?
Can we speed up the trimming process?
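For completeness, these are the trimming-related knobs I have been
looking at on the active MDS, assuming I understand them correctly
(mds_log_max_segments should be the 30 in the warning above, and
mds_log_max_expiring seems to limit how many segments are expired in
parallel); I haven't changed any of them yet:

[root@mds03 ~]# ceph daemon mds.mds03 config get mds_log_max_segments
[root@mds03 ~]# ceph daemon mds.mds03 config get mds_log_max_expiring
[root@mds03 ~]# ceph daemon mds.mds03 perf dump mds_log

Would bumping mds_log_max_expiring be a sane way to speed up trimming,
or is that risky?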
Thank you very much!
Cheers,
Kenneth