Hi all,

This morning some OSDs in our S3 cluster started going OOM. After restarting them I noticed that the osd_pglog mempool is using >1.5 GB per OSD. (This is on an OSD with osd_memory_target = 2 GB, hosting 112 PGs, all of them active+clean.) After reading through this list and trying a few things, I'd like to share the following observations for your feedback:

1. The pg log contains 3000 entries by default (on nautilus). These 3000 entries can legitimately consume gigabytes of RAM for some use-cases. (I haven't determined exactly which ops triggered this today.)

2. The pg log length is decided by the primary OSD -- setting osd_max_pg_log_entries/osd_min_pg_log_entries on one single OSD does not have a big effect, because most of the PGs are primaried somewhere else. You need to set it on all the OSDs for it to apply to all PGs.

3. We eventually set osd_max_pg_log_entries = 500 everywhere. This decreased the osd_pglog mempool from more than 1.5 GB on our largest OSDs to less than 500 MB.

4. The osd_pglog mempool is not accounted for in osd_memory_target (in nautilus).

5. I have opened a feature request to limit the pg_log length by memory size (https://tracker.ceph.com/issues/47775). This way we could allocate a fraction of memory to the pg log, and the pg log length (budget) would be shortened accordingly.

6. Would it be feasible to add an OSD option to 'trim pg log at boot'? This way we could avoid the cumbersome ceph-objectstore-tool trim-pg-log in cases of disaster (OSDs going OOM at boot).

For those who have had pglog memory usage incidents -- does this match your experience?

Thanks!

Dan
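
P.S. For the archives, a sketch of the commands involved, assuming a cluster with the centralized config database (Mimic or later). The data path and pgid in the last example are placeholders, and I haven't double-checked whether the cap change takes effect immediately or only as new ops trim the logs:

```shell
# Apply the cap cluster-wide -- it has to reach every OSD, since each
# primary decides the log length for its own PGs (point 2 above):
ceph config set osd osd_max_pg_log_entries 500

# Watch the osd_pglog mempool shrink on a given daemon:
ceph daemon osd.0 dump_mempools

# The cumbersome offline path from point 6: with the OSD stopped,
# trim one PG at a time (data path and pgid are placeholders):
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --op trim-pg-log --pgid 1.7f
```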
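
To make the idea in (5) concrete, here's a back-of-envelope sketch of the budgeting I have in mind, using today's numbers. The 10% fraction is made up for illustration -- the point is that the entry cap would fall out of the memory budget rather than being a fixed count:

```shell
# Hypothetical memory-based pg_log cap (https://tracker.ceph.com/issues/47775).
# Inputs are today's observations; the 10% fraction is an illustrative knob.
osd_memory_target=$((2 * 1024 * 1024 * 1024))   # 2 GiB
num_pgs=112
pglog_bytes=$((3 * 1024 * 1024 * 1024 / 2))     # ~1.5 GiB observed
entries_per_pg=3000                             # nautilus default

# Observed cost of one pg_log entry:
entry_bytes=$((pglog_bytes / (num_pgs * entries_per_pg)))

# Give pg logs 10% of osd_memory_target, then derive the per-PG cap:
budget=$((osd_memory_target / 10))
cap=$((budget / num_pgs / entry_bytes))
echo "~${entry_bytes} bytes/entry -> cap of ${cap} entries per PG"
```

With these inputs an entry costs roughly 4.7 KiB, and the derived cap lands near the 500-entry value we converged on by hand in (3).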