A further observation: after restarting rank 6 due to its oversized cache, rank 6 is no longer shown in the task status list of ceph status below. Is a scrub instruction not sticky to its rank, or is the status output incorrect?

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: Friday, January 10, 2025 4:38 PM
To: ceph-users@xxxxxxx
Subject: MDSs report oversized cache during forward scrub

Hi all,

we started a forward scrub on our 5.x PB ceph file system and are observing massive ballooning of the MDS caches. Our status is:

# ceph status
  cluster:
    id:     xxx
    health: HEALTH_WARN
            1 MDSs report oversized cache
            (muted: MDS_CLIENT_LATE_RELEASE(12d) MDS_CLIENT_RECALL(12d) PG_NOT_DEEP_SCRUBBED(5d) PG_NOT_SCRUBBED(5d) POOL_NEAR_FULL(4w))

  services:
    mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 4w)
    mgr: ceph-03(active, since 3w), standbys: ceph-01, ceph-25, ceph-26, ceph-02
    mds: 8/8 daemons up, 4 standby
    osd: 1317 osds: 1316 up (since 5d), 1316 in (since 5w)

  task status:
    scrub status:
        mds.ceph-08: active paths [/]
        mds.ceph-11: active paths [/]
        mds.ceph-12: active paths [/]
        mds.ceph-14: active paths [/]
        mds.ceph-15: active paths [/]
        mds.ceph-17: active paths [/]
        mds.ceph-24: active paths [/]

  data:
    volumes: 1/1 healthy
    pools:   14 pools, 29161 pgs
    objects: 4.38G objects, 5.5 PiB
    usage:   7.2 PiB used, 5.9 PiB / 13 PiB avail
    pgs:     29129 active+clean
             28    active+clean+scrubbing+deep
             2     active+clean+snaptrim
             2     active+clean+scrubbing

  io:
    client:   484 MiB/s rd, 64 MiB/s wr, 4.64k op/s rd, 1.50k op/s wr

# ceph fs status
con-fs2 - 1554 clients
=======
RANK  STATE    MDS      ACTIVITY       DNS    INOS   DIRS   CAPS
 0    active  ceph-12  Reqs:    3 /s  13.8M  13.8M   478k  29.2k
 1    active  ceph-15  Reqs:   31 /s  10.5M  10.5M   140k   441k
 2    active  ceph-14  Reqs:    0 /s  11.5M  11.5M   504k  32.8k
 3    active  ceph-17  Reqs:    5 /s  12.4M  12.4M   487k  30.9k
 4    active  ceph-08  Reqs:    0 /s  15.3M  15.3M   247k  47.2k
 5    active  ceph-11  Reqs:    7 /s  4414k  4413k   262k  72.5k
 6    active  ceph-16  Reqs:  409 /s  1079k  1057k   7766  17.3k
 7    active  ceph-24  Reqs:   41 /s  4074k  4074k   448k   109k
        POOL            TYPE     USED  AVAIL
   con-fs2-meta1      metadata  4078G  6657G
   con-fs2-meta2        data       0   6657G
    con-fs2-data        data    1225T  2272T
con-fs2-data-ec-ssd     data     794G  20.8T
   con-fs2-data2        data    5745T  2066T
STANDBY MDS
  ceph-09
  ceph-10
  ceph-23
  ceph-13
MDS version: ceph version 16.2.15 (618f440892089921c3e944a991122ddc44e60516) pacific (stable)

Ranks 0-4 are massively oversized. Ranks 5 and 7 show usual values. Rank 6 is low because we already restarted it due to this warning.

It seems as if the MDSes don't trim their cache during a scrub; the sizes only increase. Is this ballooning normal or a bug? Is there a workaround apart from restarting MDSes all the time?

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
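
For anyone following the same trail, a minimal sketch of commands that could be used to inspect the situation before resorting to restarts. This assumes the standard Pacific ceph CLI; the daemon and rank names are taken from the output above and should be adjusted to your own cluster:

# Compare the configured cache limit with the MDS's own accounting of its cache
ceph config get mds mds_cache_memory_limit
ceph tell mds.ceph-12 cache status

# Per-rank scrub state (rank addressing via <fsname>:<rank>)
ceph tell mds.con-fs2:6 scrub status

# Ask an MDS to trim its cache without a full restart; note that 'cache drop'
# also releases client caps, so use with care and only if your release supports it
ceph tell mds.ceph-12 cache drop

Whether 'cache drop' actually reduces the cache while a forward scrub is still running is untested here; it is only mentioned as a gentler alternative to restarting the daemon.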