Re: OSDs taking too much memory, for pglog

Hi Harald,


I was thinking of just changing the config setting for the pglog length.  Having said that, if you only have 123 PGs per OSD max and 8.5GB of pglog memory usage, that sounds like a bug to me.  Can you create a tracker ticket with the Ceph version and associated info?  One of the folks who works on the pglog code may want to look more closely at it.
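The options I mean are osd_min_pg_log_entries and osd_max_pg_log_entries. As a rough sketch (the values here are placeholders, not a recommendation):

    # shrink the per-PG log; lower values mean less memory, but
    # more recovery may have to fall back to backfill
    ceph config set osd osd_min_pg_log_entries 500
    ceph config set osd osd_max_pg_log_entries 500

If I remember right, the logs only get trimmed as new writes come in, so the effect is not immediate.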


Mark


On 5/13/20 12:27 AM, Harald Staub wrote:
Hi Mark

Thank you for your feedback!

The maximum number of PGs per OSD is only 123, but we have PGs with a lot of objects. For RGW, there is an 8+3 erasure-coded pool with 1024 PGs holding about 900M objects; maybe this is the problematic part. The cluster has 510 HDD OSDs and 32 SSD OSDs.

I am not sure: do you suggest using something like
ceph-objectstore-tool --op trim-pg-log ?

If done correctly, would the risk be a lot of backfilling, or even data loss?
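If I understand the tool correctly, that would be roughly the following, with the OSD stopped first ($ID and $PGID are placeholders):

    systemctl stop ceph-osd@$ID
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$ID \
        --pgid $PGID --op trim-pg-log
    systemctl start ceph-osd@$ID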

Also, getting the cluster up is one thing; keeping it running seems to be a real challenge right now (OOM killer) ...

Cheers
 Harry

On 13.05.20 07:10, Mark Nelson wrote:
Hi Harald,


Changing the bluestore cache settings will have no effect at all on pglog memory consumption.  You can try either reducing the number of PGs (you might want to check how many PGs you have overall, and specifically how many are on that OSD) or decreasing the number of pglog entries per PG.  Keep in mind that fewer PG log entries may impact recovery.  FWIW, 8.5GB of memory usage for pglog implies that you have a lot of PGs per OSD, so that's probably the first place to look.
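To check, something like this should do (the PGS column is the per-OSD PG count; the second command assumes the admin socket is available on the OSD host, $ID is a placeholder):

    ceph osd df tree                                    # PGS column = PGs per OSD
    ceph daemon osd.$ID config get osd_max_pg_log_entries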


Good luck!

Mark


On 5/12/20 5:10 PM, Harald Staub wrote:
Several OSDs in one of our clusters are currently down because RAM usage has increased over the last few days. It is now more than we can handle on some systems, and OSDs frequently get killed by the OOM killer. Looking at "ceph daemon osd.$OSD_ID dump_mempools", nearly all of it (about 8.5 GB) is taken by osd_pglog, e.g.

            "osd_pglog": {
                "items": 461859,
                "bytes": 8445595868
            },
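For what it's worth, this is roughly how we survey it across the OSDs on one host (assuming the numbers sit under mempool.by_pool in the JSON, as in the snippet above; needs jq; the OSD ids are examples):

    for ID in 0 1 2; do
        echo -n "osd.$ID osd_pglog bytes: "
        ceph daemon osd.$ID dump_mempools | \
            jq '.mempool.by_pool.osd_pglog.bytes'
    done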

We tried to reduce it with "osd memory target" and even with "bluestore cache autotune = false" (together with "bluestore cache size hdd"), but there was no effect at all.
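Concretely, what we tried was along these lines (the exact values are from memory and may not match what we used):

    ceph config set osd osd_memory_target 2147483648            # 2 GiB
    ceph config set osd bluestore_cache_autotune false
    ceph config set osd bluestore_cache_size_hdd 1073741824     # 1 GiB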

I remember the pglog_hardlimit flag, but I read that it is already set by default in Nautilus. (This cluster is on Nautilus 14.2.8.)
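At least the flag shows up in the osdmap, if I am checking it correctly:

    ceph osd dump | grep flags    # should list pglog_hardlimit among the flags
    # if it were missing, I believe it could be set with:
    # ceph osd set pglog_hardlimit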

Is there a way to limit this pglog memory?

Cheers
 Harry
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



