Hi all,

This morning some OSDs in our S3 cluster started going OOM. After restarting them I noticed that the osd_pglog mempool is using >1.5 GB per OSD. (This is on an OSD with osd_memory_target = 2 GB, hosting 112 PGs, all of them active+clean.) After reading through this list and trying a few things, I'd like to share the following observations for your feedback:

1. The pg log contains 3000 entries by default (on nautilus). These 3000 entries can legitimately consume gigabytes of RAM for some use-cases. (I haven't determined exactly which ops triggered this today.)

2. The pg log length is decided by the primary OSD -- setting osd_max_pg_log_entries/osd_min_pg_log_entries on one single OSD does not have a big effect, because most of the PGs are primaried somewhere else. You need to set it on all the OSDs for it to apply to all PGs.

3. We eventually set osd_max_pg_log_entries = 500 everywhere. This decreased the osd_pglog mempool from more than 1.5 GB on our largest OSDs to less than 500 MB.

4. The osd_pglog mempool is not accounted for in osd_memory_target (in nautilus).

5. I have opened a feature request to limit the pg_log length by memory size (https://tracker.ceph.com/issues/47775). This way we could allocate a fraction of memory to the pg log, and the pg log length (budget) would be shortened accordingly.

6. Would it be feasible to add an OSD option to 'trim pg log at boot'? This way we could avoid the cumbersome ceph-objectstore-tool trim-pg-log in cases of disaster (OSDs going OOM at boot).

For those who have had pglog memory usage incidents -- does this match your experience?

Thanks!

Dan
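
P.S. For the archives, a sketch of the commands involved, assuming a cluster with the centralized config database (Mimic or later). The data path and pgid in the last example are placeholders, and I haven't double-checked whether the cap change takes effect immediately or only as new ops trim the logs:

```shell
# Apply the cap cluster-wide -- it has to reach every OSD, since each
# primary decides the log length for its own PGs (point 2 above):
ceph config set osd osd_max_pg_log_entries 500

# Watch the osd_pglog mempool shrink on a given daemon:
ceph daemon osd.0 dump_mempools

# The cumbersome offline path from point 6: with the OSD stopped,
# trim one PG at a time (data path and pgid are placeholders):
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --op trim-pg-log --pgid 1.7f
```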
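
To make the idea in (5) concrete, here's a back-of-envelope sketch of the budgeting I have in mind, using today's numbers. The 10% fraction is made up for illustration -- the point is that the entry cap would fall out of the memory budget rather than being a fixed count:

```shell
# Hypothetical memory-based pg_log cap (https://tracker.ceph.com/issues/47775).
# Inputs are today's observations; the 10% fraction is an illustrative knob.
osd_memory_target=$((2 * 1024 * 1024 * 1024))   # 2 GiB
num_pgs=112
pglog_bytes=$((3 * 1024 * 1024 * 1024 / 2))     # ~1.5 GiB observed
entries_per_pg=3000                             # nautilus default

# Observed cost of one pg_log entry:
entry_bytes=$((pglog_bytes / (num_pgs * entries_per_pg)))

# Give pg logs 10% of osd_memory_target, then derive the per-PG cap:
budget=$((osd_memory_target / 10))
cap=$((budget / num_pgs / entry_bytes))
echo "~${entry_bytes} bytes/entry -> cap of ${cap} entries per PG"
```

With these inputs an entry costs roughly 4.7 KiB, and the derived cap lands near the 500-entry value we converged on by hand in (3).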