Re: mds0: Behind on trimming (58621/30)

On 01/07/16 12:59, John Spray wrote:
On Fri, Jul 1, 2016 at 11:35 AM, Kenneth Waegeman
<kenneth.waegeman@xxxxxxxx> wrote:
Hi all,

While syncing a lot of files to cephfs, our MDS cluster went haywire: the
MDSes have a lot of segments behind on trimming: (58621/30).
Because of this the MDS cluster gets degraded. RAM usage is about 50GB. The
MDSes were respawning and replaying continuously, and I had to stop all
syncs, unmount all clients and increase the beacon_grace to keep the
cluster up.

[root@mds03 ~]# ceph status
     cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
      health HEALTH_WARN
             mds0: Behind on trimming (58621/30)
      monmap e1: 3 mons at
{mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
             election epoch 170, quorum 0,1,2 mds01,mds02,mds03
       fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
      osdmap e19966: 156 osds: 156 up, 156 in
             flags sortbitwise
       pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
             357 TB used, 516 TB / 874 TB avail
                 4151 active+clean
                    5 active+clean+scrubbing
                    4 active+clean+scrubbing+deep
   client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
   cache io 68 op/s promote


Now that it finally is up again, it is trimming very slowly (~120 segments
per minute).
Hmm, so it sounds like something was wrong that got cleared by either
the MDS restart or the client unmount, and now it's trimming at a
healthier rate.

What client (kernel or fuse, and version)?
Kernel client of CentOS 7.2, 3.10.0-327.18.2.el7.

Can you confirm that the RADOS cluster itself was handling operations
reasonably quickly?  Is your metadata pool using the same drives as
your data?  Were the OSDs saturated with IO?
The metadata pool is a pool of SSDs. Data is an EC pool with a cache tier of separate SSDs. There was indeed load on the OSDs, and the ceph health command also regularly produced "cache at/near full ratio" warnings.


While the cluster was accumulating untrimmed segments, did you also
have a "client xyz failing to advance oldest_tid" warning?
We did not see that warning.

It would be good to clarify whether the MDS was trimming slowly, or
not at all.  If you can reproduce this situation, get it to a "behind
on trimming" state, and then stop the client IO (but leave it mounted).
See if the (x/30) number stays the same.  Then, does it start to
decrease when you unmount the client?  That would indicate a
misbehaving client.
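If you want to watch that count over time rather than eyeballing ceph status, a small loop like this works (a sketch; it assumes the health output contains the exact "Behind on trimming (x/y)" text quoted above):

```shell
# Print the current "behind on trimming" segment count once a minute.
# Assumes `ceph health detail` emits a line like:
#   mds0: Behind on trimming (58621/30)

# Extract the x from "Behind on trimming (x/y)" on stdin.
extract_segments() {
    sed -n 's/.*Behind on trimming (\([0-9]*\)\/[0-9]*).*/\1/p'
}

# Poll once a minute; interrupt with Ctrl-C.
# while true; do ceph health detail | extract_segments; sleep 60; done
```

If the number keeps falling with client IO stopped, the MDS is trimming fine and the backlog was load-induced; if it only falls after unmounting, that points at the client.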
MDS trimming is still at (37927/30), so I have to wait some more hours before I can try to reproduce it. (Is there nothing that can be done to speed this up?) There was a moment where the MDS was active and I didn't see the segments going down. I did run "ceph daemon mds.mds03 flush journal", but this was before I changed the beacon_grace, so it respawned again at that moment; I'm not quite sure whether there was another issue then.
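On the "speed this up" question: Jewel-era MDSes expose tunables bounding how many log segments are expired in parallel, and raising them at runtime is a commonly reported workaround for a trimming backlog. This is a sketch only; the option names and defaults vary by release, so verify them against your running version first:

```shell
# Hedged sketch: raise trimming parallelism on the active MDS (mds03
# in this thread). mds_log_max_expiring bounds how many segments the
# MDS expires concurrently; mds_log_max_segments is the target count
# (the /30 in the warning). Check both exist in your Ceph version.
ceph tell mds.mds03 injectargs '--mds_log_max_expiring 200'

# Optionally also raise the target so the warning threshold relaxes:
# ceph tell mds.mds03 injectargs '--mds_log_max_segments 100'
```

injectargs changes are not persistent; add the settings to ceph.conf if you want them to survive an MDS restart.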

Thanks again!

Kenneth

John

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


