Hello Olli,

On Thu, Oct 13, 2022 at 5:01 AM Olli Rajala <olli.rajala@xxxxxxxx> wrote:
>
> Hi,
>
> I'm seeing constant 25-50MB/s writes to the metadata pool even when all
> clients and the cluster are idling and in a clean state. This surely can't
> be normal?
>
> There are no apparent issues with the performance of the cluster, but this
> write rate seems excessive and I don't know where to look for the culprit.
>
> The setup is Ceph 16.2.9 running on a hyperconverged 3-node core cluster
> and 6 HDD OSD nodes.
>
> Here's a typical status when pretty much all clients are idling. Most of
> that write bandwidth and maybe a fifth of the write IOPS is hitting the
> metadata pool.
>
> ---------------------------------------------------------------------------------------------------
> root@pve-core-1:~# ceph -s
>   cluster:
>     id:     2088b4b1-8de1-44d4-956e-aa3d3afff77f
>     health: HEALTH_OK
>
>   services:
>     mon: 3 daemons, quorum pve-core-1,pve-core-2,pve-core-3 (age 2w)
>     mgr: pve-core-1(active, since 4w), standbys: pve-core-2, pve-core-3
>     mds: 1/1 daemons up, 2 standby
>     osd: 48 osds: 48 up (since 5h), 48 in (since 4M)
>
>   data:
>     volumes: 1/1 healthy
>     pools:   10 pools, 625 pgs
>     objects: 70.06M objects, 46 TiB
>     usage:   95 TiB used, 182 TiB / 278 TiB avail
>     pgs:     625 active+clean
>
>   io:
>     client:   45 KiB/s rd, 38 MiB/s wr, 6 op/s rd, 287 op/s wr
> ---------------------------------------------------------------------------------------------------
>
> Here's some daemonperf output:
>
> ---------------------------------------------------------------------------------------------------
> root@pve-core-1:~# ceph daemonperf mds.`hostname -s`
> ----------------------------------------mds----------------------------------------- --mds_cache--- ------mds_log------ -mds_mem- -------mds_server------- mds_ -----objecter------ purg
> req rlat fwd inos caps exi imi hifc crev cgra ctru cfsa cfa hcc hccd hccr prcr|stry recy recd|subm evts segs repl|ino  dn  |hcr hcs hsr cre cat |sess|actv rd  wr rdwr|purg|
>  40    0   0 767k  78k   0   0    0    1    6    1    0   0   5    5    3    7|1.1k    0    0|  17 3.7k  134    0|767k 767k| 40   5   0   0   0 |110 |   4   2  21    0|   2
>  57    2   0 767k  78k   0   0    0    3   16    3    0   0  11   11    0   17|1.1k    0    0|  45 3.7k  137    0|767k 767k| 57   8   0   0   0 |110 |   0   2  28    0|   4
>  57    4   0 767k  78k   0   0    0    4   34    4    0   0  34   33    2   26|1.0k    0    0| 134 3.9k  139    0|767k 767k| 57  13   0   0   0 |110 |   0   2 112    0|  19
>  67    3   0 767k  78k   0   0    0    6   32    6    0   0  22   22    0   32|1.1k    0    0|  78 3.9k  141    0|767k 768k| 67   4   0   0   0 |110 |   0   2  56    0|   2
> ---------------------------------------------------------------------------------------------------
>
> Any ideas where to look?

Check the perf dump output of the MDS:

    ceph tell mds.<fs_name>:0 perf dump

over a period of time to identify what's going on. You can also look at the
objecter_ops (another tell command) for the MDS.

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
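
A minimal sketch of how the comparison suggested above could be scripted. It assumes the filesystem is named "cephfs" (substitute the real name from `ceph fs status`), that rank 0 is the only active MDS as in the status output in the thread, and that jq is installed; the 60-second interval and the /tmp paths are arbitrary, and the objecter ops are dumped here via the objecter_requests admin command (an assumption on my part, not a command named in the thread).

---------------------------------------------------------------------------------------------------
#!/usr/bin/env bash
# Sketch: take two MDS perf dump snapshots 60 seconds apart and show which
# counters moved. Assumptions: fs name "cephfs", active rank 0, jq installed.
ceph tell mds.cephfs:0 perf dump > /tmp/mds_perf_a.json
sleep 60
ceph tell mds.cephfs:0 perf dump > /tmp/mds_perf_b.json

# Counters that changed between the two snapshots point at the busy subsystem
# (e.g. the mds_log, purge_queue, or objecter sections).
diff <(jq -S . /tmp/mds_perf_a.json) <(jq -S . /tmp/mds_perf_b.json)

# In-flight objecter operations from the MDS: shows what it is currently
# writing to the metadata pool (command name assumed, see note above).
ceph tell mds.cephfs:0 objecter_requests
---------------------------------------------------------------------------------------------------

Whichever section's counters keep climbing while the clients are idle is the place to dig further.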