Please share the mds perf dump as requested. We need to understand what's
happening before suggesting anything.

Thanks & Regards,
Kotresh H R

On Fri, May 17, 2024 at 5:35 PM Akash Warkhade <a.warkhade98@xxxxxxxxx> wrote:
> @Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx>
>
> Can you please help on the above?
>
> On Fri, 17 May, 2024, 12:26 pm Akash Warkhade, <a.warkhade98@xxxxxxxxx> wrote:
>> Hi Kotresh,
>>
>> Thanks for the reply.
>> 1) There are no custom configs defined
>> 2) Subtree pinning is not enabled
>> 3) There were no warnings related to rados
>>
>> So we wanted to know: in order to fix this, should we increase the default
>> *mds_cache_memory_limit* from 4GB to 6GB or more? Or is there any other
>> solution for this issue?
>>
>> On Fri, 17 May, 2024, 12:11 pm Kotresh Hiremath Ravishankar, <khiremat@xxxxxxxxxx> wrote:
>>> Hi,
>>>
>>> ~6K log segments to be trimmed, that's huge.
>>>
>>> 1. Are there any custom configs configured on this setup?
>>> 2. Is subtree pinning enabled?
>>> 3. Are there any warnings w.r.t. rados slowness?
>>> 4. Please share the mds perf dump to check for latencies and other stuff:
>>>    $ ceph tell mds.<id> perf dump
>>>
>>> Thanks and Regards,
>>> Kotresh H R
>>>
>>> On Fri, May 17, 2024 at 11:01 AM Akash Warkhade <a.warkhade98@xxxxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> We are using rook-ceph with operator 1.10.8 and Ceph 17.2.5.
>>>> We are using the Ceph filesystem with 4 MDS, i.e. 2 active and 2 standby.
>>>> Every 3-4 weeks the filesystem has an issue, i.e. in ceph status we can
>>>> see the warnings below:
>>>>
>>>> 2 MDS report slow requests
>>>> 2 MDS behind on trimming
>>>> mds.myfs-a(mds.1): behind on trimming (6378/128) max_segments: 128, num_segments: 6378
>>>> mds.myfs-c(mds.1): behind on trimming (6560/128) max_segments: 128, num_segments: 6560
>>>>
>>>> To fix it, we have to restart all MDS pods one by one.
>>>> This is happening every 4-5 weeks.
>>>>
>>>> We have seen many Ceph issues related to this on the Ceph tracker, and
>>>> many people suggest increasing *mds_cache_memory_limit*.
>>>> Currently for our cluster *mds_cache_memory_limit* is set to the
>>>> default 4GB and *mds_log_max_segments* is set to the default 128.
>>>> Should we increase *mds_cache_memory_limit* to 8GB from the default
>>>> 4GB, or is there any other solution to fix this issue permanently?
>>>> _______________________________________________
>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
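For anyone following the thread: the output of `ceph tell mds.<id> perf dump` is JSON, so the trim backlog can be pulled out programmatically rather than eyeballed from the health warning. Below is a small sketch of that idea; the `mds_log` section and its `seg` counter (current journal segments) are modeled on typical Quincy perf dump output, and the embedded sample values mirror the 6378/128 warning above — verify the counter names against your own cluster's dump before relying on them.

```python
import json

# Sample data shaped like a `ceph tell mds.<id> perf dump` response.
# The "mds_log" section and counter names (seg = current journal
# segments, segadd/segtrm = segments added/trimmed) are assumptions
# modeled on typical output -- check your own dump.
sample = json.loads("""
{
  "mds_log": {
    "segadd": 7000,
    "segtrm": 622,
    "seg": 6378
  }
}
""")

def trim_backlog(perf, max_segments=128):
    """How many journal segments the MDS is behind on trimming,
    i.e. current segments beyond the mds_log_max_segments target."""
    current = perf["mds_log"]["seg"]
    return max(0, current - max_segments)

# For the sample above this reports 6250 segments over the limit of 128,
# matching the num_segments: 6378 health warning in the thread.
print(trim_backlog(sample))
```

A healthy MDS should hover near `mds_log_max_segments`, so a persistently large result here means trimming is not keeping up, regardless of what the cache limit is set to.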