On Tue, Nov 29, 2022 at 1:18 PM Joshua Timmer <mrjoshuatimmer@xxxxxxxxx> wrote:
> I've got a cluster in a precarious state because several nodes have run
> out of memory due to extremely large pg logs on the osds. I came across
> the pglog_hardlimit flag which sounds like the solution to the issue,
> but I'm concerned that enabling it will immediately truncate the pg logs
> and possibly drop some information needed to recover the pgs. There are
> many in degraded and undersized states right now as nodes are down. Is
> it safe to enable the flag in this state? The cluster is running
> luminous 12.2.13 right now.

The hard limit will truncate the log, but all the data is written to the backing bluestore/filestore instance at the same time as the log entry. The pglogs are used for two things:

1) detecting replayed client operations and sending the same answer back on replays. Shorter logs mean a shorter detection window, but that shouldn't be an issue;
2) enabling log-based recovery of pgs, where OSDs with overlapping logs can identify exactly which objects have been modified and move only those.

So if you set the hard limit, it's possible you'll induce more backfill, since fewer logs will overlap. But no data will be lost.
-Greg

> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
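To illustrate Greg's two points, here is a toy Python model of a PG log. It is a simplified sketch under stated assumptions, not Ceph's actual implementation: the class and method names (`PGLog`, `trim_to`, `is_replay`, `objects_since`) are hypothetical, versions are plain integers, and the log is assumed to cover versions in (tail, head]. It shows that trimming keeps replay detection working only inside the retained window, and that a peer whose last update predates the new tail can no longer use log-based recovery and must be backfilled. No object data is touched by the trim.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class PGLog:
    """Hypothetical, simplified PG log covering versions in (tail, head]."""
    entries: List[Tuple[int, str, str]] = field(default_factory=list)  # (version, reqid, object)
    tail: int = 0  # version just before the oldest retained entry

    @property
    def head(self) -> int:
        return self.entries[-1][0] if self.entries else self.tail

    def append(self, version: int, reqid: str, obj: str) -> None:
        self.entries.append((version, reqid, obj))

    def trim_to(self, max_entries: int) -> None:
        # Hard limit: drop the oldest entries. In this model the object data
        # is assumed to already be in the object store, so only history is lost.
        excess = len(self.entries) - max_entries
        if excess > 0:
            self.tail = self.entries[excess - 1][0]
            self.entries = self.entries[excess:]

    def is_replay(self, reqid: str) -> bool:
        # Use 1: duplicate-op detection only works inside the retained window.
        return any(r == reqid for _, r, _ in self.entries)

    def objects_since(self, peer_last_update: int) -> Optional[Set[str]]:
        # Use 2: if the peer's last update falls inside our log, we know exactly
        # which objects changed since then. None means "no overlap: backfill".
        if peer_last_update < self.tail:
            return None
        return {obj for v, _, obj in self.entries if v > peer_last_update}


if __name__ == "__main__":
    log = PGLog()
    for v in range(1, 11):
        log.append(v, f"client.1:{v}", f"obj{v}")

    # A peer that went down after version 4 can recover from the log.
    print(log.objects_since(4))
    # After a hard trim to 3 entries, the logs no longer overlap.
    log.trim_to(3)
    print(log.tail, log.head, log.objects_since(4))
```

In this sketch, the trimmed log covers (7, 10], so a peer stopped at version 4 gets `None` (backfill), while a peer at version 8 can still recover only `obj9` and `obj10` from the log.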