Hello all,

I'm maintaining a small Nautilus cluster with 12 OSDs (36 TB raw). My mon nodes have the mgr/mds collocated with the mon, and each node is allocated 10 GB of RAM. During a recent single-disk failure and the corresponding recovery, I noticed my mgrs/mons were getting OOM-killed and restarted every 5 hours or so, with the mgr using around 6.5 GB on all my nodes. My monitoring shows an interesting sawtooth pattern: network usage (100 MB/s at peak), disk storage usage, and disk I/O (up to 300 MB/s against SSDs at peak) all increase in parallel with memory usage.

I know the hardware recommendations in the docs say:

> Monitor and manager daemon memory usage generally scales with the size of the cluster. For small clusters, 1-2 GB is generally sufficient. For large clusters, you should provide more (5-10 GB).

I would like to think my cluster is on the small end of the spectrum, so I was hoping 10 GB would be enough for the mgr and mon (my OSD nodes are only allocated 32 GB of RAM), but that assumption appears to be false.

So I was wondering how mgrs (and to a lesser extent mons) are expected to scale in terms of memory. Is it the OSD count, the OSDs' size, the number of PGs, etc.? And is there a way to limit the amount of RAM used by the mgrs? (It seems the mon_osd_cache_size and rocksdb_cache_size settings are for mons, if I'm not mistaken.)

Regards,
Mark
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx