Hello all,

I'm maintaining a small Nautilus cluster with 12 OSDs (36 TB raw). My mon nodes have the mgr/mds collocated with the mon, and each node is allocated 10 GB of RAM. During a recent single-disk failure and the corresponding recovery, I noticed my mgrs/mons were getting OOM-killed and restarted every 5 hours or so, with the mgr using around 6.5 GB on all my nodes. My monitoring shows an interesting sawtooth pattern: network usage (100 MB/s at peak), disk storage usage, and disk I/O (up to 300 MB/s against SSDs at peak) all increase in parallel with memory usage.

I know the hardware recommendations in the docs say:

> Monitor and manager daemon memory usage generally scales with the size of the cluster. For small clusters, 1-2 GB is generally sufficient. For large clusters, you should provide more (5-10 GB).

I would like to think my cluster is on the small end of the spectrum, so I was hoping 10 GB would be enough for the mgr and mon (my OSD nodes are only allocated 32 GB of RAM), but that assumption appears to be false.

So I was wondering how mgrs (and to a lesser extent mons) are expected to scale in terms of memory. Is it the OSD count, the OSDs' size, the number of PGs, etc.? And is there a way to limit the amount of RAM used by the mgrs? (It seems the mon_osd_cache_size and rocksdb_cache_size settings are for mons, if I'm not mistaken.)

Regards,
Mark
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx