Hi Marius,

Your mempools show quite high usage in osd_pglog and buffer_anon, which
reminds me of this issue:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/7IMIWCKIHXNULEBHVUIXQQGYUDJAO2SF/#Y2225AVEZYCBIJXXATJIJAXUWKNP4H3I

You can configure the pglog size to reduce the memory usage; I've put a
rough sketch of the relevant settings below the quoted thread. But I don't
recall whether anyone ever found the root cause of why the pglog size
sometimes explodes. (It didn't happen again for us.)

--
Dan

On Fri, Nov 12, 2021 at 1:37 PM Marius Leustean <marius.leus@xxxxxxxxx> wrote:
>
> Hi Josh,
>
> There is 1 OSD per host.
> There are 3 pools of 256, 128 and 32 PGs (total = 416 PGs across 8 OSDs).
>
> ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus
> (stable)
>
> I still have 1 OSD where docker reports 61GB RAM being consumed by the
> container (we have a containerized deployment).
> The dump_mempools output is this one: https://paste2.org/yfng2saG
> (it reports 38GB).
>
> All OSDs are currently up+in and the cluster is HEALTH_OK.
>
> Thanks!
>
> On Fri, Nov 12, 2021 at 2:22 PM Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx>
> wrote:
> >
> > Hi Marius,
> >
> > > We have an 8-host cluster with one 4TB NVMe drive per host for now.
> > > The pool size is 2 and it's hosting RBD images for VMs.
> > > Each host has 128GB RAM installed.
> >
> > How many OSDs/host? How many PGs/OSD? Which Ceph version?
> >
> > > What is really happening during recovery/backfill that takes this much
> > > memory for a single OSD?
> >
> > It would be helpful to see what "ceph daemon osd.XXX dump_mempools" says
> > for an OSD with high memory. One problem that has been seen is that the
> > pglogs start consuming quite a bit of memory during recovery scenarios
> > (or even occasionally during steady state). This issue has been
> > alleviated a bit in Octopus+, where there's a limit on the number of
> > pglog entries per OSD, but there are still gaps.
> >
> > > Why is the OSD process taking ~100GB RAM and needing a ~25 min start
> > > time even after the recovery process has ended? (Unless we wipe it and
> > > register it again.)
> >
> > This sounds like a pileup of osdmaps. Depending on your Ceph version,
> > all OSDs may need to be up+in in order to trim osdmaps effectively.
> >
> > Josh
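
As promised above, a rough sketch of the pglog settings. This is from
memory against Octopus, and the numbers are only examples, not
recommendations; check the defaults for your exact release before rolling
anything out cluster-wide:

  # Cap the per-PG log length. Shorter logs mean less osd_pglog memory,
  # but also more backfill (instead of log-based recovery) after outages.
  # Example values only.
  ceph config set osd osd_min_pg_log_entries 100
  ceph config set osd osd_max_pg_log_entries 500

  # Octopus additionally trims toward a per-OSD total across all PGs:
  ceph config set osd osd_target_pg_log_entries_per_osd 100000

Existing logs only get trimmed as PGs take new writes or re-peer, so it
can take a while (or an OSD restart) before the osd_pglog mempool actually
shrinks.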
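
If you want to rank which mempools are actually eating the RAM rather than
eyeballing the paste, something along these lines works against the admin
socket (the jq path matches the JSON layout as I remember it on Octopus,
so adjust it if your output nests differently):

  # Show the five largest mempools with their item counts and byte totals.
  ceph daemon osd.0 dump_mempools | \
    jq '.mempool.by_pool | to_entries | sort_by(.value.bytes) | reverse | .[0:5]'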
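
And for the osdmap pileup Josh mentioned: you can compare the oldest and
newest map epoch an OSD is still holding (field names as I remember them
from the OSD's admin-socket status output):

  # A gap of tens of thousands of epochs between these two fields means
  # old maps aren't being trimmed, which inflates both RSS and startup
  # time. They normally only trim once all OSDs are up+in.
  ceph daemon osd.0 status | grep -E 'oldest_map|newest_map'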