Hello Frank,

Unfortunately, the ballooning of memory consumed by the MDS during a forward scrub is a known issue. Please add a lot of swap space (as a rough estimate, 2 GB of swap per 1 million files stored) so that the scrub can complete, and ignore the warning. Yes, there have been cases on this list where even 2 TB of swap was insufficient. And yes, the scrub gets automatically canceled if the MDS restarts, crashes, or gets killed.

On Fri, Jan 10, 2025 at 11:42 PM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi all,
>
> We started a forward scrub on our 5.x PB Ceph file system and observe a massive ballooning of the MDS caches. Our status is:
>
> # ceph status
>   cluster:
>     id:     xxx
>     health: HEALTH_WARN
>             1 MDSs report oversized cache
>             (muted: MDS_CLIENT_LATE_RELEASE(12d) MDS_CLIENT_RECALL(12d) PG_NOT_DEEP_SCRUBBED(5d) PG_NOT_SCRUBBED(5d) POOL_NEAR_FULL(4w))
>
>   services:
>     mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 4w)
>     mgr: ceph-03(active, since 3w), standbys: ceph-01, ceph-25, ceph-26, ceph-02
>     mds: 8/8 daemons up, 4 standby
>     osd: 1317 osds: 1316 up (since 5d), 1316 in (since 5w)
>
>   task status:
>     scrub status:
>         mds.ceph-08: active paths [/]
>         mds.ceph-11: active paths [/]
>         mds.ceph-12: active paths [/]
>         mds.ceph-14: active paths [/]
>         mds.ceph-15: active paths [/]
>         mds.ceph-17: active paths [/]
>         mds.ceph-24: active paths [/]
>
>   data:
>     volumes: 1/1 healthy
>     pools:   14 pools, 29161 pgs
>     objects: 4.38G objects, 5.5 PiB
>     usage:   7.2 PiB used, 5.9 PiB / 13 PiB avail
>     pgs:     29129 active+clean
>              28    active+clean+scrubbing+deep
>              2     active+clean+snaptrim
>              2     active+clean+scrubbing
>
>   io:
>     client: 484 MiB/s rd, 64 MiB/s wr, 4.64k op/s rd, 1.50k op/s wr
>
> # ceph fs status
> con-fs2 - 1554 clients
> =======
> RANK  STATE    MDS      ACTIVITY      DNS    INOS   DIRS   CAPS
>  0    active  ceph-12  Reqs:   3 /s  13.8M  13.8M   478k  29.2k
>  1    active  ceph-15  Reqs:  31 /s  10.5M  10.5M   140k   441k
>  2    active  ceph-14  Reqs:   0 /s  11.5M  11.5M   504k  32.8k
>  3    active  ceph-17  Reqs:   5 /s  12.4M  12.4M   487k  30.9k
>  4    active  ceph-08  Reqs:   0 /s  15.3M  15.3M   247k  47.2k
>  5    active  ceph-11  Reqs:   7 /s  4414k  4413k   262k  72.5k
>  6    active  ceph-16  Reqs: 409 /s  1079k  1057k   7766  17.3k
>  7    active  ceph-24  Reqs:  41 /s  4074k  4074k   448k   109k
>         POOL            TYPE      USED   AVAIL
>    con-fs2-meta1       metadata   4078G  6657G
>    con-fs2-meta2         data        0   6657G
>    con-fs2-data          data     1225T  2272T
>  con-fs2-data-ec-ssd     data      794G  20.8T
>    con-fs2-data2         data     5745T  2066T
> STANDBY MDS
>   ceph-09
>   ceph-10
>   ceph-23
>   ceph-13
> MDS version: ceph version 16.2.15 (618f440892089921c3e944a991122ddc44e60516) pacific (stable)
>
> Ranks 0-4 are massively oversized. Ranks 5 and 7 show usual values. Rank 6 is low because we already restarted it due to this warning. It seems as if the MDSes don't trim their caches during a scrub; the sizes only increase.
>
> Is this ballooning normal, or a bug? Is there a workaround apart from restarting the MDSes all the time?
>
> Thanks and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14

-- 
Alexander Patrakov
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
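
A minimal sketch of the swap-file approach suggested above, assuming a local disk with enough free space on the MDS host and a hypothetical path /mds-swapfile; size it according to the rough estimate of 2 GB per 1 million files:

    # Create and enable a swap file on the MDS host (adjust the size to your file count).
    fallocate -l 512G /mds-swapfile    # if the file system does not support fallocate'd swap, create it with dd instead
    chmod 600 /mds-swapfile
    mkswap /mds-swapfile
    swapon /mds-swapfile
    swapon --show                      # verify that the new swap space is active

The running forward scrub can be monitored, paused, or aborted through the "ceph tell mds.<fsname>:<rank> scrub ..." interface; the commands below should apply to Pacific, but check the documentation for your release:

    # Progress of the scrub on rank 0 of the file system con-fs2.
    ceph tell mds.con-fs2:0 scrub status
    # Pause/resume, or abort if the cache growth becomes unmanageable.
    ceph tell mds.con-fs2:0 scrub pause
    ceph tell mds.con-fs2:0 scrub resume
    ceph tell mds.con-fs2:0 scrub abort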