Oh Lordy, seems like I finally got this resolved. And all it needed in the
end was to drop the MDS caches with:

ceph tell mds.`hostname` cache drop

The funny thing is that whatever the issue with the cache was, it had
persisted through several Ceph upgrades and node reboots. It's a live
production system, so I guess there has just never been a moment when all
the MDS daemons were down at once and the cache had to be fully
rebuilt... maybe :|

Unfortunately I don't remember when this issue arose and my metrics don't
reach back far enough... but I wonder if this could have started already
with the Octopus->Pacific upgrade...

Cheers,
---------------------------
Olli Rajala - Lead TD
Anima Vitae Ltd.
www.anima.fi
---------------------------

On Mon, Oct 24, 2022 at 9:36 PM Olli Rajala <olli.rajala@xxxxxxxx> wrote:

> I tried my luck and upgraded to 17.2.4 but unfortunately that didn't make
> any difference here either.
>
> I also looked again at all kinds of client op and request stats and
> whatnot, which only made me even more certain that this IO is not caused
> by any clients.
>
> What internal MDS operation or mechanism could cause such high idle write
> IO? I've tried to fiddle a bit with some of the MDS cache trim and memory
> settings but I haven't noticed any effect there. Any pointers appreciated.
>
> Cheers,
> ---------------------------
> Olli Rajala - Lead TD
> Anima Vitae Ltd.
> www.anima.fi
> ---------------------------
>
>
> On Mon, Oct 17, 2022 at 10:28 AM Olli Rajala <olli.rajala@xxxxxxxx> wrote:
>
>> Hi Patrick,
>>
>> With "objecter_ops" did you mean "ceph tell mds.pve-core-1 ops" and/or
>> "ceph tell mds.pve-core-1 objecter_requests"? Both of these show very few
>> requests/ops - many times just returning empty lists. I'm pretty sure
>> this I/O isn't generated by any clients - I earlier tried to isolate
>> this by shutting down all CephFS clients and it didn't have any
>> noticeable effect.
>>
>> I tried to watch what is going on with that "perf dump" but to be honest
>> all I can see is some numbers going up in the different sections :)
>> ...I don't have a clue what to focus on or how to interpret it.
>>
>> Here's a perf dump in case you or anyone else can make something out of
>> it: https://gist.github.com/olliRJL/43c10173aafd82be22c080a9cd28e673
>>
>> Tnx!
>> o.
>>
>> ---------------------------
>> Olli Rajala - Lead TD
>> Anima Vitae Ltd.
>> www.anima.fi
>> ---------------------------
>>
>>
>> On Fri, Oct 14, 2022 at 8:32 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>>
>>> Hello Olli,
>>>
>>> On Thu, Oct 13, 2022 at 5:01 AM Olli Rajala <olli.rajala@xxxxxxxx> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I'm seeing constant 25-50MB/s writes to the metadata pool even when
>>> > all clients and the cluster are idle and in a clean state. This
>>> > surely can't be normal?
>>> >
>>> > There are no apparent issues with the performance of the cluster, but
>>> > this write rate seems excessive and I don't know where to look for
>>> > the culprit.
>>> >
>>> > The setup is Ceph 16.2.9 running on a hyperconverged 3-node core
>>> > cluster and 6 HDD OSD nodes.
>>> >
>>> > Here's a typical status when pretty much all clients are idling. Most
>>> > of that write bandwidth and maybe a fifth of the write IOPS is
>>> > hitting the metadata pool.
>>> >
>>> > ---------------------------------------------------------------------------------------------------
>>> > root@pve-core-1:~# ceph -s
>>> >   cluster:
>>> >     id:     2088b4b1-8de1-44d4-956e-aa3d3afff77f
>>> >     health: HEALTH_OK
>>> >
>>> >   services:
>>> >     mon: 3 daemons, quorum pve-core-1,pve-core-2,pve-core-3 (age 2w)
>>> >     mgr: pve-core-1(active, since 4w), standbys: pve-core-2, pve-core-3
>>> >     mds: 1/1 daemons up, 2 standby
>>> >     osd: 48 osds: 48 up (since 5h), 48 in (since 4M)
>>> >
>>> >   data:
>>> >     volumes: 1/1 healthy
>>> >     pools:   10 pools, 625 pgs
>>> >     objects: 70.06M objects, 46 TiB
>>> >     usage:   95 TiB used, 182 TiB / 278 TiB avail
>>> >     pgs:     625 active+clean
>>> >
>>> >   io:
>>> >     client:   45 KiB/s rd, 38 MiB/s wr, 6 op/s rd, 287 op/s wr
>>> > ---------------------------------------------------------------------------------------------------
>>> >
>>> > Here's some daemonperf output:
>>> >
>>> > ---------------------------------------------------------------------------------------------------
>>> > root@pve-core-1:~# ceph daemonperf mds.`hostname -s`
>>> > ----------------------------------------mds----------------------------------------- --mds_cache--- ------mds_log------ -mds_mem- -------mds_server------- mds_ -----objecter------ purg
>>> >   req rlat fwd inos caps exi imi hifc crev cgra ctru cfsa cfa hcc hccd hccr prcr|stry recy recd|subm evts segs repl| ino   dn|hcr hcs hsr cre cat |sess|actv  rd  wr rdwr|purg|
>>> >    40    0   0 767k  78k   0   0    0    1    6    1    0   0   5    5    3    7|1.1k    0    0|  17 3.7k  134    0|767k 767k| 40   5   0   0   0 |110 |   4   2  21    0|   2
>>> >    57    2   0 767k  78k   0   0    0    3   16    3    0   0  11   11    0   17|1.1k    0    0|  45 3.7k  137    0|767k 767k| 57   8   0   0   0 |110 |   0   2  28    0|   4
>>> >    57    4   0 767k  78k   0   0    0    4   34    4    0   0  34   33    2   26|1.0k    0    0| 134 3.9k  139    0|767k 767k| 57  13   0   0   0 |110 |   0   2 112    0|  19
>>> >    67    3   0 767k  78k   0   0    0    6   32    6    0   0  22   22    0   32|1.1k    0    0|  78 3.9k  141    0|767k 768k| 67   4   0   0   0 |110 |   0   2  56    0|   2
>>> > ---------------------------------------------------------------------------------------------------
>>> >
>>> > Any ideas where to look?
>>>
>>> Check the perf dump output of the MDS:
>>>
>>>   ceph tell mds.<fs_name>:0 perf dump
>>>
>>> over a period of time to identify what's going on. You can also look
>>> at the objecter_ops (another tell command) for the MDS.
>>>
>>> --
>>> Patrick Donnelly, Ph.D.
>>> He / Him / His
>>> Principal Software Engineer
>>> Red Hat, Inc.
>>> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
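For anyone landing on this thread with the same symptom, here is a minimal
sketch of the checks and the fix discussed above: sample the MDS write
counters while the cluster is idle, drop the MDS cache, then sample again to
see whether the idle write rate falls. It assumes a single active MDS named
after the short hostname (as on this cluster) and that jq is installed; the
specific counters picked out here (objecter.op_w and mds_log.evadd from
"perf dump") and the 60-second window are illustrative choices, not the only
things worth watching.

---------------------------------------------------------------------------------------------------
#!/bin/bash
# Minimal sketch, assuming the active MDS is mds.`hostname -s` and jq is available.
MDS="mds.$(hostname -s)"

sample() {
    # objecter.op_w = write ops the MDS objecter has sent to the OSDs
    # mds_log.evadd = events added to the MDS journal
    ceph tell "$MDS" perf dump | jq '{op_w: .objecter.op_w, evadd: .mds_log.evadd}'
}

echo "before:";         sample
sleep 60
echo "after 60s idle:"; sample

# The step that resolved the issue in this thread, plus a quick cache check:
ceph tell "$MDS" cache drop
ceph tell "$MDS" cache status

# Re-run the two samples above afterwards to confirm the idle writes have dropped.
---------------------------------------------------------------------------------------------------

If op_w and evadd keep climbing at a steady rate while clients are idle, the
writes are coming from the MDS itself (journal/cache churn) rather than from
client traffic, which matches what was observed in this thread.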