On Tue, Jul 5, 2016 at 7:56 PM, Kenneth Waegeman <kenneth.waegeman@xxxxxxxx> wrote:
>
> On 04/07/16 11:22, Kenneth Waegeman wrote:
>>
>> On 01/07/16 16:01, Yan, Zheng wrote:
>>>
>>> On Fri, Jul 1, 2016 at 6:59 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>>>>
>>>> On Fri, Jul 1, 2016 at 11:35 AM, Kenneth Waegeman
>>>> <kenneth.waegeman@xxxxxxxx> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> While syncing a lot of files to cephfs, our mds cluster went haywire:
>>>>> the MDSs are many segments behind on trimming: (58621/30)
>>>>> Because of this the mds cluster gets degraded. RAM usage is about 50GB.
>>>>> The MDSs were respawning and replaying continuously, and I had to stop
>>>>> all syncs, unmount all clients and increase the beacon_grace to keep
>>>>> the cluster up.
>>>>>
>>>>> [root@mds03 ~]# ceph status
>>>>>     cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
>>>>>      health HEALTH_WARN
>>>>>             mds0: Behind on trimming (58621/30)
>>>>>      monmap e1: 3 mons at
>>>>> {mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
>>>>>             election epoch 170, quorum 0,1,2 mds01,mds02,mds03
>>>>>       fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
>>>>>      osdmap e19966: 156 osds: 156 up, 156 in
>>>>>             flags sortbitwise
>>>>>       pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
>>>>>             357 TB used, 516 TB / 874 TB avail
>>>>>                 4151 active+clean
>>>>>                    5 active+clean+scrubbing
>>>>>                    4 active+clean+scrubbing+deep
>>>>>   client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
>>>>>   cache io 68 op/s promote
>>>>>
>>>>>
>>>>> Now that it is finally up again, it is trimming very slowly (roughly
>>>>> 120 segments/min)
>>>>
>>>> Hmm, so it sounds like something was wrong that got cleared by either
>>>> the MDS restart or the client unmount, and now it's trimming at a
>>>> healthier rate.
>>>>
>>>> What client (kernel or fuse, and version)?
>>>>
>>>> Can you confirm that the RADOS cluster itself was handling operations
>>>> reasonably quickly?
>>>> Is your metadata pool using the same drives as your data? Were the
>>>> OSDs saturated with IO?
>>>>
>>>> While the cluster was accumulating untrimmed segments, did you also
>>>> have a "client xyz failing to advance oldest_tid" warning?
>>>
>>> This does not prevent the MDS from trimming log segments.
>>>
>>>> It would be good to clarify whether the MDS was trimming slowly, or
>>>> not at all. If you can reproduce this situation, get it to a "behind
>>>> on trimming" state, and then stop the client IO (but leave it mounted).
>>>> See if the (x/30) number stays the same. Then, does it start to
>>>> decrease when you unmount the client? That would indicate a
>>>> misbehaving client.
>>>
>>> Behind on trimming on a single-MDS cluster should be caused by either
>>> slow rados operations or the MDS trimming too few log segments on each
>>> tick.
>>>
>>> Kenneth, could you try setting mds_log_max_expiring to a large value
>>> (such as 200)
>>
>> I've set mds_log_max_expiring to 200 right now. Should I see something
>> instantly?
>
> The trimming finished rather quickly, although I don't have any accurate
> time measurements. The cluster looks to be running fine right now, but it
> is running an incremental sync. We will try with the same data again to
> see if it is OK now.
> Is this mds_log_max_expiring option production ready? (I can't seem to
> find it in the documentation)

It should be safe. Setting mds_log_max_expiring to 200 does not change the
code path.

Yan, Zheng

>
> Thank you!!
>
> K
>
>>
>> This weekend, the trimming did not continue and something happened to the
>> cluster:
>>
>> mds.0.cache.dir(1000da74e85) commit error -2 v 2466977
>> log_channel(cluster) log [ERR] : failed to commit dir 1000da74e85 object,
>> errno -2
>> mds.0.78429 unhandled write error (2) No such file or directory, force
>> readonly...
>> mds.0.cache force file system read-only
>> log_channel(cluster) log [WRN] : force file system read-only
>>
>> and ceph health reported:
>> mds0: MDS in read-only mode
>>
>> I restarted it and it is trimming again.
>>
>>
>> Thanks again!
>> Kenneth
>>>
>>> Regards
>>> Yan, Zheng
>>>
>>>> John
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com