Re: mds0: Behind on trimming (58621/30)

"Yan, Zheng" <ukernel@xxxxxxxxx> · Tue, 5 Jul 2016 21:14:12 +0800

On Tue, Jul 5, 2016 at 7:56 PM, Kenneth Waegeman
<kenneth.waegeman@xxxxxxxx> wrote:
>
>
> On 04/07/16 11:22, Kenneth Waegeman wrote:
>>
>>
>>
>> On 01/07/16 16:01, Yan, Zheng wrote:
>>>
>>> On Fri, Jul 1, 2016 at 6:59 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>>>>
>>>> On Fri, Jul 1, 2016 at 11:35 AM, Kenneth Waegeman
>>>> <kenneth.waegeman@xxxxxxxx> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> While syncing a lot of files to cephfs, our mds cluster went haywire:
>>>>> the MDSs have a lot of segments behind on trimming: (58621/30)
>>>>> Because of this the mds cluster gets degraded. RAM usage is about 50GB.
>>>>> The MDSs were respawning and replaying continuously, and I had to stop
>>>>> all syncs, unmount all clients and increase the beacon_grace to keep
>>>>> the cluster up.
>>>>>
>>>>> [root@mds03 ~]# ceph status
>>>>>      cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
>>>>>       health HEALTH_WARN
>>>>>              mds0: Behind on trimming (58621/30)
>>>>>       monmap e1: 3 mons at {mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
>>>>>              election epoch 170, quorum 0,1,2 mds01,mds02,mds03
>>>>>        fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
>>>>>       osdmap e19966: 156 osds: 156 up, 156 in
>>>>>              flags sortbitwise
>>>>>        pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
>>>>>              357 TB used, 516 TB / 874 TB avail
>>>>>                  4151 active+clean
>>>>>                     5 active+clean+scrubbing
>>>>>                     4 active+clean+scrubbing+deep
>>>>>    client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
>>>>>    cache io 68 op/s promote
>>>>>
>>>>>
>>>>> Now that it is finally up again, it is trimming very slowly (roughly
>>>>> 120 segments/min).
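
To put the numbers above in perspective, here is a minimal Python sketch that
parses the health line and estimates how long the backlog takes to drain. It
assumes the second number in "(58621/30)" is the configured segment limit and
that the trim rate stays constant; the function names are made up for
illustration and are not part of Ceph.

```python
import re


def parse_trimming(line):
    """Return (segments, limit) from a 'Behind on trimming (N/M)' health line."""
    match = re.search(r"Behind on trimming \((\d+)/(\d+)\)", line)
    if match is None:
        raise ValueError("no trimming warning found in line")
    return int(match.group(1)), int(match.group(2))


def minutes_to_drain(segments, limit, trim_per_min, new_per_min=0.0):
    """Estimate minutes until the segment count falls back to the limit.

    new_per_min is the rate at which clients create new segments; with all
    clients unmounted it is assumed to be zero.
    """
    net_rate = trim_per_min - new_per_min
    if net_rate <= 0:
        return float("inf")  # backlog never shrinks
    return max(segments - limit, 0) / net_rate


segments, limit = parse_trimming("mds0: Behind on trimming (58621/30)")
eta = minutes_to_drain(segments, limit, trim_per_min=120)
print(f"{segments - limit} segments over the limit, "
      f"about {eta / 60:.1f} hours at 120 segments/min")
```

At the reported rate this works out to roughly eight hours with no new client
load, which is why the rate looked too slow.
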
>>>>
>>>> Hmm, so it sounds like something was wrong that got cleared by either
>>>> the MDS restart or the client unmount, and now it's trimming at a
>>>> healthier rate.
>>>>
>>>> What client (kernel or fuse, and version)?
>>>>
>>>> Can you confirm that the RADOS cluster itself was handling operations
>>>> reasonably quickly?  Is your metadata pool using the same drives as
>>>> your data?  Were the OSDs saturated with IO?
>>>>
>>>> While the cluster was accumulating untrimmed segments, did you also
>>>> have a "client xyz failing to advance oldest_tid" warning?
>>>
>>> This does not prevent the MDS from trimming log segments.
>>>
>>>> It would be good to clarify whether the MDS was trimming slowly, or
>>>> not at all.  If you can reproduce this situation, get it to a "behind
>>>> on trimming" state, and then stop the client IO (but leave it mounted).
>>>> See if the (x/30) number stays the same.  Then, does it start to
>>>> decrease when you unmount the client?  That would indicate a
>>>> misbehaving client.
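
The check described above can be sketched in Python as a comparison of two
series of readings of the x in (x/30): one taken with client IO stopped but the
client still mounted, one taken after unmounting. This is only an illustration
of the reasoning; the function names, the labels and the sample values are
hypothetical, and collecting the readings (for example from repeated
"ceph status" output) is left out.

```python
def trend(samples, tolerance=0):
    """Classify a series of segment counts as trimming, growing or stalled."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    delta = samples[-1] - samples[0]
    if delta < -tolerance:
        return "trimming"
    if delta > tolerance:
        return "growing"
    return "stalled"


def diagnose(mounted_idle, after_unmount, tolerance=0):
    """Apply the test: does trimming only resume once the client is gone?"""
    before = trend(mounted_idle, tolerance)
    after = trend(after_unmount, tolerance)
    if before == "stalled" and after == "trimming":
        return "suspect client"
    if before == "trimming":
        return "trimming slowly, client not implicated"
    return "inconclusive"


# Hypothetical readings, not taken from this cluster.
print(diagnose([58621, 58621, 58621], [58621, 58300, 57900]))
print(diagnose([58621, 58500, 58380], [58380, 58260, 58140]))
```
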
>>>
>>> On a single-MDS cluster, "behind on trimming" should be caused by either
>>> slow RADOS operations or the MDS trimming too few log segments on each
>>> tick.
>>>
>>> Kenneth, could you try setting mds_log_max_expiring to a large value
>>> (such as 200)?
>>
>> I've set the mds_log_max_expiring to 200 right now. Should I see something
>> instantly?
>
> The trimming finished rather quickly, although I don't have any accurate
> timings. The cluster looks to be running fine right now, but we are
> running an incremental sync. We will try again with the same data to see
> if it is OK now.
> Is this mds_log_max_expiring option production ready? (I can't seem to
> find it in the documentation.)

It should be safe. Setting mds_log_max_expiring to 200 does not change
the code path.

Yan, Zheng
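
For anyone wanting to apply the same setting to a running MDS, one common way
is "ceph tell ... injectargs". The Python sketch below only builds and prints
that command line; it does not run it. The helper name is made up, and whether
your release still has the mds_log_max_expiring option is something to check
before relying on it.

```python
import shlex


def injectargs_command(mds_rank, option, value):
    """Build the argv for injecting one config option into a running MDS."""
    return [
        "ceph", "tell", f"mds.{mds_rank}", "injectargs",
        f"--{option} {value}",  # passed to injectargs as a single argument
    ]


cmd = injectargs_command(0, "mds_log_max_expiring", 200)
print(shlex.join(cmd))  # review, then run by hand or with subprocess.run(cmd)
```

Values injected this way do not survive a daemon restart; to keep the setting,
it would also need to go in the [mds] section of ceph.conf.
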

>
> Thank you!!
>
> K
>
>>
>> This weekend, the trimming did not continue and something happened to the
>> cluster:
>>
>> mds.0.cache.dir(1000da74e85) commit error -2 v 2466977
>> log_channel(cluster) log [ERR] : failed to commit dir 1000da74e85 object,
>> errno -2
>> mds.0.78429 unhandled write error (2) No such file or directory, force
>> readonly...
>> mds.0.cache force file system read-only
>> log_channel(cluster) log [WRN] : force file system read-only
>>
>> and ceph health reported:
>> mds0: MDS in read-only mode
>>
>> I restarted it and it is trimming again.
>>
>>
>> Thanks again!
>> Kenneth
>>>
>>> Regards
>>> Yan, Zheng
>>>
>>>> John
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com