> From: ukernel@xxxxxxxxx
> Date: Tue, 5 Jul 2016 21:14:12 +0800
> To: kenneth.waegeman@xxxxxxxx
> CC: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: [ceph-users] mds0: Behind on trimming (58621/30)
>
> On Tue, Jul 5, 2016 at 7:56 PM, Kenneth Waegeman
> <kenneth.waegeman@xxxxxxxx> wrote:
> >
> > On 04/07/16 11:22, Kenneth Waegeman wrote:
> >>
> >> On 01/07/16 16:01, Yan, Zheng wrote:
> >>>
> >>> On Fri, Jul 1, 2016 at 6:59 PM, John Spray <jspray@xxxxxxxxxx> wrote:
> >>>>
> >>>> On Fri, Jul 1, 2016 at 11:35 AM, Kenneth Waegeman
> >>>> <kenneth.waegeman@xxxxxxxx> wrote:
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> While syncing a lot of files to cephfs, our MDS cluster went haywire:
> >>>>> the MDSes have a lot of segments behind on trimming: (58621/30)
> >>>>> Because of this the MDS cluster gets degraded. RAM usage is about 50GB.
> >>>>> The MDSes were respawning and replaying continuously, and I had to stop
> >>>>> all syncs, unmount all clients and increase the beacon_grace to keep
> >>>>> the cluster up.
> >>>>>
> >>>>> [root@mds03 ~]# ceph status
> >>>>>     cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
> >>>>>      health HEALTH_WARN
> >>>>>             mds0: Behind on trimming (58621/30)
> >>>>>      monmap e1: 3 mons at
> >>>>> {mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
> >>>>>             election epoch 170, quorum 0,1,2 mds01,mds02,mds03
> >>>>>       fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
> >>>>>      osdmap e19966: 156 osds: 156 up, 156 in
> >>>>>             flags sortbitwise
> >>>>>       pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
> >>>>>             357 TB used, 516 TB / 874 TB avail
> >>>>>                 4151 active+clean
> >>>>>                    5 active+clean+scrubbing
> >>>>>                    4 active+clean+scrubbing+deep
> >>>>>   client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
> >>>>>    cache io 68 op/s promote
> >>>>>
> >>>>> Now that it is finally up again, it is trimming very slowly
> >>>>> (+-120 segments / min).
> >>>>
> >>>> Hmm, so it sounds like something was wrong that got cleared by either
> >>>> the MDS restart or the client unmount, and now it's trimming at a
> >>>> healthier rate.
> >>>>
> >>>> What client (kernel or fuse, and version)?
> >>>>
> >>>> Can you confirm that the RADOS cluster itself was handling operations
> >>>> reasonably quickly? Is your metadata pool using the same drives as
> >>>> your data? Were the OSDs saturated with IO?
> >>>>
> >>>> While the cluster was accumulating untrimmed segments, did you also
> >>>> have a "client xyz failing to advance oldest_tid" warning?
> >>>
> >>> That warning does not prevent the MDS from trimming log segments.
> >>>
> >>>> It would be good to clarify whether the MDS was trimming slowly, or
> >>>> not at all. If you can reproduce this situation, get it to a "behind
> >>>> on trimming" state, and then stop the client IO (but leave it
> >>>> mounted). See if the (x/30) number stays the same. Then, does it
> >>>> start to decrease when you unmount the client? That would indicate a
> >>>> misbehaving client.
> >>>
> >>> Behind on trimming on a single-MDS cluster should be caused either by
> >>> slow RADOS operations or by the MDS trimming too few log segments on
> >>> each tick.
> >>>
> >>> Kenneth, could you try setting mds_log_max_expiring to a large value
> >>> (such as 200)?
> >>
> >> I've set mds_log_max_expiring to 200 right now. Should I see something
> >> instantly?
> >
> > The trimming finished rather quickly, although I don't have any accurate
> > time measurements. The cluster looks to be running fine right now, but is
> > only running an incremental sync.
> > We will try with the same data again to see if it is OK now.
> >
> > Is this mds_log_max_expiring option production ready? (I don't seem to
> > find it in the documentation.)
>
> It should be safe. Setting mds_log_max_expiring to 200 does not change
> the code path.
>
> Yan, Zheng

Zheng,

Bumping this conf from 20 -> 200 seems to increase the (concurrent)
flushing load? Would you prefer to make this the default?

Xiaoxi

> > Thank you!!
> >
> > K
> >
> >> This weekend, the trimming did not continue and something happened to
> >> the cluster:
> >>
> >> mds.0.cache.dir(1000da74e85) commit error -2 v 2466977
> >> log_channel(cluster) log [ERR] : failed to commit dir 1000da74e85 object,
> >> errno -2
> >> mds.0.78429 unhandled write error (2) No such file or directory, force
> >> readonly...
> >> mds.0.cache force file system read-only
> >> log_channel(cluster) log [WRN] : force file system read-only
> >>
> >> and ceph health reported:
> >> mds0: MDS in read-only mode
> >>
> >> I restarted it and it is trimming again.
> >>
> >> Thanks again!
> >> Kenneth
> >>>
> >>> Regards
> >>> Yan, Zheng
> >>>
> >>>> John
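Applying the mds_log_max_expiring bump discussed above can be done at runtime
or persistently; a minimal sketch, assuming the admin socket is available on
the host running the active MDS and using the mds.mds03 daemon name from the
status output earlier in the thread (adjust to your own daemon name):

    # at runtime, on the node running the active MDS (admin socket)
    ceph daemon mds.mds03 config set mds_log_max_expiring 200

    # or injected from any admin node
    ceph tell mds.mds03 injectargs '--mds_log_max_expiring 200'

    # and, to survive daemon restarts, in ceph.conf:
    [mds]
    mds log max expiring = 200

Injected values are lost when the daemon restarts, so anything you intend to
keep should also go into ceph.conf.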
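To watch the trimming backlog while reproducing the problem, the journal
segment counters can be polled alongside ceph health; a rough sketch, again
assuming mds.mds03, with the mds_log counter names (seg, segtrm) taken from
the MDS perf schema rather than from this thread:

    # shows the "mds0: Behind on trimming (x/30)" detail while it persists
    ceph health detail

    # journal counters over the admin socket: 'seg' is the current number
    # of log segments, 'segtrm' the number of segments trimmed so far
    ceph daemon mds.mds03 perf dump mds_log

    # temporary safety net so the monitors do not fail over a busy MDS
    # while it replays and trims (revert once the backlog is gone)
    ceph tell mon.\* injectargs '--mds_beacon_grace 300'

The 30 in the health message is the mds_log_max_segments limit (the default
at the time), so trimming has caught up once 'seg' drops back to that order
of magnitude.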
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com