Re: Implications of pglog_hardlimit

Gregory Farnum <gfarnum@xxxxxxxxxx> · Tue, 29 Nov 2022 13:25:54 -0800

On Tue, Nov 29, 2022 at 1:18 PM Joshua Timmer <mrjoshuatimmer@xxxxxxxxx>
wrote:

> I've got a cluster in a precarious state because several nodes have run
> out of memory due to extremely large pg logs on the osds. I came across
> the pglog_hardlimit flag which sounds like the solution to the issue,
> but I'm concerned that enabling it will immediately truncate the pg logs
> and possibly drop some information needed to recover the pgs. There are
> many in degraded and undersized states right now as nodes are down. Is
> it safe to enable the flag in this state? The cluster is running
> luminous 12.2.13 right now.

The hard limit will truncate the log, but all the data goes into the
backing bluestore/filestore instance at the same time it is logged, so
trimming the log does not discard object data. The pglogs are used for two
things:
1) detecting replayed client operations and sending the same answer back on
replays, so shorter logs mean a shorter time window of detection, but that
shouldn’t be an issue;
2) enabling log-based recovery of pgs, where OSDs with overlapping logs can
identify exactly which objects have been modified and move only those.

So if you set the hard limit, it’s possible you’ll induce more backfill as
fewer logs overlap. But no data will be lost.
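
To make the overlap point concrete, here is a toy model in Python. It is only
a sketch under simplifying assumptions, not how the OSD code actually works:
LogEntry, truncate and plan_recovery are made-up names, versions are plain
integers, and the "hard limit" is just a cap on the number of entries kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    version: int  # hypothetical per-PG version, increasing by 1 per write
    obj: str      # name of the object that write modified


def truncate(log, max_entries):
    """Keep only the newest max_entries entries, like a hard cap would."""
    return log[-max_entries:] if max_entries > 0 else []


def plan_recovery(primary_log, replica_last_version):
    """Decide how a stale replica catches up in this toy model.

    Returns ("log", objects) when the primary's log still covers every
    write the replica missed, otherwise ("backfill", None).
    """
    if not primary_log:
        return ("backfill", None)
    oldest = primary_log[0].version
    # The logs overlap if the primary still holds the entry that comes
    # right after the last write the replica applied.
    if replica_last_version >= oldest - 1:
        missed = (e.obj for e in primary_log if e.version > replica_last_version)
        return ("log", list(dict.fromkeys(missed)))  # dedupe, keep order
    return ("backfill", None)


# Ten writes, one object each; the replica went down after version 6.
log = [LogEntry(v, f"obj{v}") for v in range(1, 11)]
print(plan_recovery(log, 6))               # ('log', ['obj7', 'obj8', 'obj9', 'obj10'])
print(plan_recovery(truncate(log, 3), 6))  # ('backfill', None)
```

With the full log only the four modified objects need to move. Once the log
is cut to three entries it no longer reaches back to version 7, so the replica
has to be backfilled, which costs more I/O but copies the same data in the end.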
-Greg

> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx