Oh Lordy, seems like I finally got this resolved. And all it needed in the
end was to drop the MDS caches with:

ceph tell mds.`hostname` cache drop

The funny thing is that whatever the issue with the cache was, it had
persisted through several Ceph upgrades and node reboots. It's a live
production system, so I guess there has just never been a moment when all
the MDS daemons were down at once and the cache had to be fully
rebuilt... maybe :|

Unfortunately I don't remember when this issue arose and my metrics don't
reach back far enough... but I wonder if this could have started already
with the Octopus->Pacific upgrade...

Cheers,
---------------------------
Olli Rajala - Lead TD
Anima Vitae Ltd.
www.anima.fi
---------------------------

On Mon, Oct 24, 2022 at 9:36 PM Olli Rajala <olli.rajala@xxxxxxxx> wrote:

> I tried my luck and upgraded to 17.2.4 but unfortunately that didn't make
> any difference here either.
>
> I also looked again at all kinds of client op and request stats and
> whatnot, which only made me even more certain that this IO is not caused
> by any clients.
>
> What internal MDS operation or mechanism could cause such high idle write
> IO? I've tried to fiddle a bit with some of the MDS cache trim and memory
> settings but I haven't noticed any effect there. Any pointers appreciated.
>
> Cheers,
> ---------------------------
> Olli Rajala - Lead TD
> Anima Vitae Ltd.
> www.anima.fi
> ---------------------------
>
>
> On Mon, Oct 17, 2022 at 10:28 AM Olli Rajala <olli.rajala@xxxxxxxx> wrote:
>
>> Hi Patrick,
>>
>> With "objecter_ops" did you mean "ceph tell mds.pve-core-1 ops" and/or
>> "ceph tell mds.pve-core-1 objecter_requests"? Both of these show very few
>> requests/ops - many times just returning empty lists. I'm pretty sure
>> this I/O isn't generated by any clients - I earlier tried to isolate
>> this by shutting down all CephFS clients and it didn't have any
>> noticeable effect.
>>
>> I tried to watch what is going on with that "perf dump" but to be honest
>> all I can see is some numbers going up in the different sections :)
>> ...I don't have a clue what to focus on or how to interpret it.
>>
>> Here's a perf dump in case you or anyone else can make something out of
>> it: https://gist.github.com/olliRJL/43c10173aafd82be22c080a9cd28e673
>>
>> Tnx!
>> o.
>>
>> ---------------------------
>> Olli Rajala - Lead TD
>> Anima Vitae Ltd.
>> www.anima.fi
>> ---------------------------
>>
>>
>> On Fri, Oct 14, 2022 at 8:32 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>>
>>> Hello Olli,
>>>
>>> On Thu, Oct 13, 2022 at 5:01 AM Olli Rajala <olli.rajala@xxxxxxxx> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I'm seeing constant 25-50MB/s writes to the metadata pool even when
>>> > all clients and the cluster are idle and in a clean state. This
>>> > surely can't be normal?
>>> >
>>> > There are no apparent issues with the performance of the cluster, but
>>> > this write rate seems excessive and I don't know where to look for
>>> > the culprit.
>>> >
>>> > The setup is Ceph 16.2.9 running on a hyperconverged 3-node core
>>> > cluster and 6 HDD OSD nodes.
>>> >
>>> > Here's a typical status when pretty much all clients are idling. Most
>>> > of that write bandwidth and maybe a fifth of the write IOPS is
>>> > hitting the metadata pool.
>>> >
>>> > ---------------------------------------------------------------------------------------------------
>>> > root@pve-core-1:~# ceph -s
>>> >   cluster:
>>> >     id:     2088b4b1-8de1-44d4-956e-aa3d3afff77f
>>> >     health: HEALTH_OK
>>> >
>>> >   services:
>>> >     mon: 3 daemons, quorum pve-core-1,pve-core-2,pve-core-3 (age 2w)
>>> >     mgr: pve-core-1(active, since 4w), standbys: pve-core-2, pve-core-3
>>> >     mds: 1/1 daemons up, 2 standby
>>> >     osd: 48 osds: 48 up (since 5h), 48 in (since 4M)
>>> >
>>> >   data:
>>> >     volumes: 1/1 healthy
>>> >     pools:   10 pools, 625 pgs
>>> >     objects: 70.06M objects, 46 TiB
>>> >     usage:   95 TiB used, 182 TiB / 278 TiB avail
>>> >     pgs:     625 active+clean
>>> >
>>> >   io:
>>> >     client:   45 KiB/s rd, 38 MiB/s wr, 6 op/s rd, 287 op/s wr
>>> > ---------------------------------------------------------------------------------------------------
>>> >
>>> > Here's some daemonperf output:
>>> >
>>> > ---------------------------------------------------------------------------------------------------
>>> > root@pve-core-1:~# ceph daemonperf mds.`hostname -s`
>>> > ----------------------------------------mds----------------------------------------- --mds_cache--- ------mds_log------ -mds_mem- -------mds_server------- mds_ -----objecter------ purg
>>> >   req rlat fwd inos caps exi imi hifc crev cgra ctru cfsa cfa hcc hccd hccr prcr|stry recy recd|subm evts segs repl| ino   dn|hcr hcs hsr cre cat |sess|actv  rd  wr rdwr|purg|
>>> >    40    0   0 767k  78k   0   0    0    1    6    1    0   0   5    5    3    7|1.1k    0    0|  17 3.7k  134    0|767k 767k| 40   5   0   0   0 |110 |   4   2  21    0|   2
>>> >    57    2   0 767k  78k   0   0    0    3   16    3    0   0  11   11    0   17|1.1k    0    0|  45 3.7k  137    0|767k 767k| 57   8   0   0   0 |110 |   0   2  28    0|   4
>>> >    57    4   0 767k  78k   0   0    0    4   34    4    0   0  34   33    2   26|1.0k    0    0| 134 3.9k  139    0|767k 767k| 57  13   0   0   0 |110 |   0   2 112    0|  19
>>> >    67    3   0 767k  78k   0   0    0    6   32    6    0   0  22   22    0   32|1.1k    0    0|  78 3.9k  141    0|767k 768k| 67   4   0   0   0 |110 |   0   2  56    0|   2
>>> > ---------------------------------------------------------------------------------------------------
>>> >
>>> > Any ideas where to look?
>>>
>>> Check the perf dump output of the MDS:
>>>
>>>   ceph tell mds.<fs_name>:0 perf dump
>>>
>>> over a period of time to identify what's going on. You can also look
>>> at the objecter_ops (another tell command) for the MDS.
>>>
>>> --
>>> Patrick Donnelly, Ph.D.
>>> He / Him / His
>>> Principal Software Engineer
>>> Red Hat, Inc.
>>> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
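For anyone landing on this thread with the same symptom, here is a minimal
sketch of the checks and the fix discussed above: sample the MDS write
counters while the cluster is idle, drop the MDS cache, then sample again to
see whether the idle write rate falls. It assumes a single active MDS named
after the short hostname (as on this cluster) and that jq is installed; the
specific counters picked out here (objecter.op_w and mds_log.evadd from
"perf dump") and the 60-second window are illustrative choices, not the only
things worth watching.

---------------------------------------------------------------------------------------------------
#!/bin/bash
# Minimal sketch, assuming the active MDS is mds.`hostname -s` and jq is available.
MDS="mds.$(hostname -s)"

sample() {
    # objecter.op_w = write ops the MDS objecter has sent to the OSDs
    # mds_log.evadd = events added to the MDS journal
    ceph tell "$MDS" perf dump | jq '{op_w: .objecter.op_w, evadd: .mds_log.evadd}'
}

echo "before:";         sample
sleep 60
echo "after 60s idle:"; sample

# The step that resolved the issue in this thread, plus a quick cache check:
ceph tell "$MDS" cache drop
ceph tell "$MDS" cache status

# Re-run the two samples above afterwards to confirm the idle writes have dropped.
---------------------------------------------------------------------------------------------------

If op_w and evadd keep climbing at a steady rate while clients are idle, the
writes are coming from the MDS itself (journal/cache churn) rather than from
client traffic, which matches what was observed in this thread.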