Hi Venky,

I have indeed observed the output of the different sections of perf dump,
like so:

watch -n 1 ceph tell mds.`hostname` perf dump objecter
watch -n 1 ceph tell mds.`hostname` perf dump mds_cache
...etc...

...but without a proper understanding of what a normal rate is for any of
those numbers to go up, it's really difficult to make anything of it.

Btw - is there some convenient way to capture this kind of temporal output
for others to view? Sure, I could just dump once a second to a file or to
sequential files, but is there some tool or convention that makes it easy
to look at and analyze?
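To be concrete, something like the rough Python sketch below is the kind of
thing I mean - the mds name, the 5s interval and the output file naming are
just placeholders, nothing official. It polls "perf dump", writes each sample
to a timestamped JSON file and prints the counters that changed the most
since the previous sample:

---------------------------------------------------------------------------------------------------
#!/usr/bin/env python3
# Rough sketch: poll "ceph tell mds.<name> perf dump" periodically, keep every
# sample as a timestamped JSON file and print the counters that moved the most
# since the previous sample. Mds name, interval and file naming are arbitrary.
import json
import subprocess
import sys
import time
from datetime import datetime

MDS = sys.argv[1] if len(sys.argv) > 1 else "pve-core-1"  # which mds to poll
INTERVAL = 5   # seconds between samples
TOP = 15       # how many of the fastest-moving counters to print

def flatten(d, prefix=""):
    """Flatten the nested perf dump sections into dotted counter names."""
    flat = {}
    for key, value in d.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        elif isinstance(value, (int, float)):
            flat[name] = value
    return flat

prev = None
while True:
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
    raw = json.loads(subprocess.check_output(
        ["ceph", "tell", f"mds.{MDS}", "perf", "dump"]))
    with open(f"perf-dump-{MDS}-{stamp}.json", "w") as f:
        json.dump(raw, f)
    cur = flatten(raw)
    if prev is not None:
        deltas = {k: cur[k] - prev[k]
                  for k in cur if k in prev and cur[k] != prev[k]}
        print(f"--- {stamp}: top {TOP} movers over the last {INTERVAL}s ---")
        for name, delta in sorted(deltas.items(), key=lambda kv: -abs(kv[1]))[:TOP]:
            print(f"{delta:>15.1f}  {name}")
    prev = cur
    time.sleep(INTERVAL)
---------------------------------------------------------------------------------------------------

The JSON files could then just be tarred up for others to look at - but if
there's an established tool or convention for this I'd rather use that.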
Tnx,
---------------------------
Olli Rajala - Lead TD
Anima Vitae Ltd.
www.anima.fi
---------------------------


On Thu, Nov 10, 2022 at 8:18 AM Venky Shankar <vshankar@xxxxxxxxxx> wrote:

> Hi Olli,
>
> On Mon, Oct 17, 2022 at 1:08 PM Olli Rajala <olli.rajala@xxxxxxxx> wrote:
> >
> > Hi Patrick,
> >
> > With "objecter_ops" did you mean "ceph tell mds.pve-core-1 ops" and/or
> > "ceph tell mds.pve-core-1 objecter_requests"? Both of these show very few
> > requests/ops - many times just returning empty lists. I'm pretty sure that
> > this I/O isn't generated by any clients - I've earlier tried to isolate
> > this by shutting down all cephfs clients, and that didn't have any
> > noticeable effect.
> >
> > I tried to watch what is going on with that "perf dump", but to be honest
> > all I can see is some numbers going up in the different sections :)
> > ...I don't have a clue what to focus on or how to interpret it.
> >
> > Here's a perf dump if you or anyone could make something out of it:
> > https://gist.github.com/olliRJL/43c10173aafd82be22c080a9cd28e673
>
> You'd need to capture this over a period of time to see what ops might
> be going through and what the mds is doing.
>
> >
> > Tnx!
> > o.
> >
> > ---------------------------
> > Olli Rajala - Lead TD
> > Anima Vitae Ltd.
> > www.anima.fi
> > ---------------------------
> >
> >
> > On Fri, Oct 14, 2022 at 8:32 PM Patrick Donnelly <pdonnell@xxxxxxxxxx>
> > wrote:
> >
> > > Hello Olli,
> > >
> > > On Thu, Oct 13, 2022 at 5:01 AM Olli Rajala <olli.rajala@xxxxxxxx> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I'm seeing constant 25-50MB/s writes to the metadata pool even when all
> > > > clients and the cluster are idling and in a clean state. This surely
> > > > can't be normal?
> > > >
> > > > There are no apparent issues with the performance of the cluster, but
> > > > this write rate seems excessive and I don't know where to look for the
> > > > culprit.
> > > >
> > > > The setup is Ceph 16.2.9 running on a hyperconverged 3-node core cluster
> > > > and 6 hdd osd nodes.
> > > >
> > > > Here's a typical status when pretty much all clients are idling. Most of
> > > > that write bandwidth and maybe a fifth of the write iops is hitting the
> > > > metadata pool.
> > > >
> > > > ---------------------------------------------------------------------------------------------------
> > > > root@pve-core-1:~# ceph -s
> > > >   cluster:
> > > >     id:     2088b4b1-8de1-44d4-956e-aa3d3afff77f
> > > >     health: HEALTH_OK
> > > >
> > > >   services:
> > > >     mon: 3 daemons, quorum pve-core-1,pve-core-2,pve-core-3 (age 2w)
> > > >     mgr: pve-core-1(active, since 4w), standbys: pve-core-2, pve-core-3
> > > >     mds: 1/1 daemons up, 2 standby
> > > >     osd: 48 osds: 48 up (since 5h), 48 in (since 4M)
> > > >
> > > >   data:
> > > >     volumes: 1/1 healthy
> > > >     pools:   10 pools, 625 pgs
> > > >     objects: 70.06M objects, 46 TiB
> > > >     usage:   95 TiB used, 182 TiB / 278 TiB avail
> > > >     pgs:     625 active+clean
> > > >
> > > >   io:
> > > >     client: 45 KiB/s rd, 38 MiB/s wr, 6 op/s rd, 287 op/s wr
> > > > ---------------------------------------------------------------------------------------------------
> > > >
> > > > Here's some daemonperf output:
> > > >
> > > > ---------------------------------------------------------------------------------------------------
> > > > root@pve-core-1:~# ceph daemonperf mds.`hostname -s`
> > > > ----------------------------------------mds----------------------------------------- --mds_cache--- ------mds_log------ -mds_mem- -------mds_server------- mds_ -----objecter------ purg
> > > > req rlat fwd inos caps exi imi hifc crev cgra ctru cfsa cfa hcc hccd hccr prcr|stry recy recd|subm evts segs repl|ino dn |hcr hcs hsr cre cat |sess|actv rd wr rdwr|purg|
> > > >  40    0   0  767k  78k   0   0    0    1    6    1    0   0   5    5    3    7 |1.1k   0    0 | 17 3.7k 134    0 |767k 767k| 40   5    0   0   0 |110 |  4    2   21    0 |  2
> > > >  57    2   0  767k  78k   0   0    0    3   16    3    0   0  11   11    0   17 |1.1k   0    0 | 45 3.7k 137    0 |767k 767k| 57   8    0   0   0 |110 |  0    2   28    0 |  4
> > > >  57    4   0  767k  78k   0   0    0    4   34    4    0   0  34   33    2   26 |1.0k   0    0 |134 3.9k 139    0 |767k 767k| 57  13    0   0   0 |110 |  0    2  112    0 | 19
> > > >  67    3   0  767k  78k   0   0    0    6   32    6    0   0  22   22    0   32 |1.1k   0    0 | 78 3.9k 141    0 |767k 768k| 67   4    0   0   0 |110 |  0    2   56    0 |  2
> > > > ---------------------------------------------------------------------------------------------------
> > > >
> > > > Any ideas where to look?
> > >
> > > Check the perf dump output of the mds:
> > >
> > >     ceph tell mds.<fs_name>:0 perf dump
> > >
> > > over a period of time to identify what's going on. You can also look
> > > at the objecter_ops (another tell command) for the MDS.
> > >
> > > --
> > > Patrick Donnelly, Ph.D.
> > > He / Him / His
> > > Principal Software Engineer
> > > Red Hat, Inc.
> > > GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
> > >
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
>
> --
> Cheers,
> Venky
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx