Hi Harald,
Your cluster has a lot of objects per OSD/PG, so the pg logs grow
fast and large. The pg_logs will keep growing as long as your
cluster's PGs are not active+clean. This means you are now in a
loop: you cannot get stable running OSDs because the pg_logs take
too much memory, and therefore the OSDs cannot trim the pg_logs...
I suggest you lower the values for both osd_min_pg_log_entries and
osd_max_pg_log_entries. Lowering these values will cause Ceph to go
into backfilling much earlier, but the memory usage of the OSDs will go
down significantly, allowing them to run stably. The default is 3000 for
both of these values.
You can lower them to 500 by executing:
ceph config set osd osd_min_pg_log_entries 500
ceph config set osd osd_max_pg_log_entries 500
When you lower these values, you will get more backfilling instead of
recoveries, but I think it will help you get through this situation.
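To double-check that the new values are active, you can compare the
config database against what a running daemon reports (osd.0 below is
just an example id, substitute one of yours):

```shell
# Values stored in the cluster config database (osd.0 is an example id)
ceph config get osd.0 osd_min_pg_log_entries
ceph config get osd.0 osd_max_pg_log_entries

# What the running daemon is actually using right now
ceph daemon osd.0 config show | grep pg_log_entries
```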
kind regards,
Wout
42on
On 13-05-2020 07:27, Harald Staub wrote:
Hi Mark
Thank you for your feedback!
The maximum number of PGs per OSD is only 123, but we have PGs with a
lot of objects. For RGW, there is an 8+3 EC pool with 1024 PGs holding
900M objects; maybe this is the problematic part. There are 510 HDD
OSDs and 32 SSD OSDs.
I am not sure: do you suggest using something like
ceph-objectstore-tool --op trim-pg-log ?
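(If so, I assume the offline invocation would look roughly like the
following; the OSD id, data path, and pgid are made-up examples:)

```shell
# The OSD must be stopped first; trimming is an offline operation
systemctl stop ceph-osd@11

# Trim the log of a single PG (hypothetical pgid) down to the
# configured osd_max_pg_log_entries limit
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11 \
    --pgid 7.1as3 --op trim-pg-log

systemctl start ceph-osd@11
```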
When done correctly, would the risk be a lot of backfilling? Or also
data loss?
Also, getting the cluster up is one thing; keeping it running seems to
be the real challenge right now (OOM killer) ...
Cheers
Harry
On 13.05.20 07:10, Mark Nelson wrote:
Hi Harald,
Changing the bluestore cache settings will have no effect at all on
pglog memory consumption. You can try either reducing the number of
PGs (check how many PGs you have overall, and specifically how many
are on that OSD) or decreasing the number of pglog entries per PG.
Keep in mind that fewer PG log entries may
impact recovery. FWIW, 8.5GB of memory usage for pglog implies that
you have a lot of PGs per OSD, so that's probably the first place to
look.
Good luck!
Mark
On 5/12/20 5:10 PM, Harald Staub wrote:
Several OSDs of one of our clusters are currently down because RAM
usage has increased over the last few days. It is now more than we
can handle on some systems, and OSDs frequently get killed by the OOM
killer. Looking at "ceph daemon osd.$OSD_ID dump_mempools", it shows
that nearly all of it (about 8.5 GB) is taken by osd_pglog, e.g.
"osd_pglog": {
    "items": 461859,
    "bytes": 8445595868
},
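(For scale, a quick back-of-the-envelope calculation on those two
figures, nothing more than arithmetic on the numbers pasted above:)

```shell
# Total pg_log memory and average bytes per item,
# taken from the dump_mempools figures above
awk 'BEGIN {
    b = 8445595868; i = 461859
    printf "%.2f GB total, %d bytes per item\n", b / 1e9, b / i
}'
```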
We tried to reduce it, with "osd memory target" and even with
"bluestore cache autotune = false" (together with "bluestore cache
size hdd"), but there was no effect at all.
I remember the pglog_hardlimit parameter, but I read that it is
already set by default on Nautilus. (This cluster is on Nautilus,
14.2.8.)
Is there a way to limit this pglog memory?
Cheers
Harry
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx