Re: Extremely high OSD memory utilization on Kraken 11.2.0 (with XFS -or- bluestore)

Bob,

I have only managed to get heap dumps from a single OSD. The memory spike doesn't happen until 10+ OSDs are online, and within moments of that the system becomes unresponsive and oom_killer swoops down, so I haven't been able to time it right to capture the heaps. Is there a configuration option to enable profiling at boot and dump a profile once a second or so? That would at least let me capture the data.
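
Something along these lines is what I'm after; a rough sketch of what I'd otherwise run by hand, assuming the tcmalloc-based "ceph tell ... heap" commands (osd.12 is a placeholder id):

#!/usr/bin/env python3
# Rough sketch: start the tcmalloc heap profiler on one OSD and dump once a
# second until the daemon dies. Assumes the ceph CLI is on PATH and the OSD
# is linked against tcmalloc; osd.12 is a placeholder id.
import subprocess
import time

OSD = "osd.12"  # substitute the OSD you expect to spike

subprocess.run(["ceph", "tell", OSD, "heap", "start_profiler"], check=True)

while True:
    # each dump writes a numbered .heap file into the daemon's log directory
    if subprocess.run(["ceph", "tell", OSD, "heap", "dump"]).returncode != 0:
        break  # the OSD stopped answering (likely oom_killer); stop polling
    time.sleep(1)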

Here's what I got: the 35th dump was taken just before oom_killer struck, but memory usage hadn't spiked much. Total allocation for the process was about 4.2GiB.

https://pastebin.com/nLQ8Jpwt
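
(For anyone reading along: a dump like that can be rendered to a text report with gperftools' pprof; a minimal sketch, with placeholder paths:)

# Rough sketch: render a tcmalloc heap dump as a text report of the top
# allocation sites, using gperftools. Both paths below are placeholders.
import subprocess

BINARY = "/usr/bin/ceph-osd"                       # the binary that was profiled
DUMP = "/var/log/ceph/osd.12.profile.0035.heap"    # hypothetical dump path

# --text prints live allocations ranked by bytes, with symbolized call sites
subprocess.run(["google-pprof", "--text", BINARY, DUMP], check=True)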

Thanks again for the insight!
-Aaron

On Sat, Apr 15, 2017 at 10:34 AM, Aaron Ten Clay <aarontc@xxxxxxxxxxx> wrote:
Thanks for the recommendation, Bob! I'll try to get this data later today and reply with it.

-Aaron

On Sat, Apr 15, 2017 at 9:46 AM, Bob R <bobr@xxxxxxxxxxxxxx> wrote:
I'd recommend running through these steps and posting the output as well.

Bob

On Sat, Apr 15, 2017 at 5:39 AM, Peter Maloney <peter.maloney@brockmann-consult.de> wrote:
How many PGs do you have? And did you change any config, like mds cache size? Show your ceph.conf.
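
(I ask because per-OSD memory scales roughly with the number of PG replicas each OSD hosts. A back-of-envelope check, with hypothetical pool numbers; substitute your actual pg_num and size values:)

# Back-of-envelope: PG replicas per OSD. The pool numbers below are
# hypothetical placeholders; plug in your own pg_num/size values.
pools = [
    {"pg_num": 1024, "size": 3},   # e.g. a replicated metadata pool
    {"pg_num": 2048, "size": 5},   # e.g. an EC data pool with k+m = 5
]
osd_count = 28                      # 24 HDD + 4 SSD OSDs from the report below

pg_replicas = sum(p["pg_num"] * p["size"] for p in pools)
print(pg_replicas / osd_count)      # common guidance targets ~100-200 per OSD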


On 04/15/17 07:34, Aaron Ten Clay wrote:
Hi all,

Our cluster is experiencing a very odd issue, and I'm hoping for guidance on troubleshooting steps and/or suggestions to mitigate it. tl;dr: individual ceph-osd processes try to allocate > 90GiB of RAM and are eventually nuked by oom_killer.

I'll try to explain the situation in detail:

We have twenty-four 4TB bluestore HDD OSDs and four 600GB SSD OSDs. The SSD OSDs are in a different CRUSH "root", used as a cache tier for the main storage pools, which are erasure coded and used for cephfs. The OSDs are spread across two identical machines with 128GiB of RAM each, and there are three monitor nodes on different hardware.
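
For context, that works out to a slim per-OSD memory budget:

# Back-of-envelope RAM budget per OSD node, using the numbers above.
osds_per_node = (24 + 4) // 2   # 28 OSDs spread across two identical machines
ram_gib = 128
print(ram_gib / osds_per_node)  # ~9.1 GiB per OSD, before kernel/page cache overhead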

Several times we've encountered crippling bugs with previous Ceph releases, when we were on RCs or betas or using non-recommended configurations, so in January we abandoned all previous Ceph usage, deployed LTS Ubuntu 16.04, and went with stable Kraken 11.2.0 in the configuration described above. Everything was fine until the end of March, when one day we found all but a couple of OSDs inexplicably "down". Investigation revealed that oom_killer had come along and nuked almost all the ceph-osd processes.

We've gone through many iterations of restarting the OSDs, bringing them up one at a time and all at once, and tried various configuration settings to reduce cache sizes, as suggested in this ticket: http://tracker.ceph.com/issues/18924...
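
For reference, runtime changes like that can be pushed to all OSDs with injectargs; a minimal sketch (the option and value shown are placeholders, not the exact settings from the ticket):

# Minimal sketch of pushing a cache-related option to every OSD at runtime.
# The option name/value below are placeholders; substitute whichever setting
# the tracker ticket suggests for your release.
import subprocess

OPTION = "--osd_map_cache_size 50"  # hypothetical example option

subprocess.run(["ceph", "tell", "osd.*", "injectargs", OPTION], check=True)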

I don't know whether that ticket really pertains to our situation; I have no experience with memory-allocation debugging. I'd be willing to try if someone can point me to a guide or walk me through the process.

Just to see whether the situation was transitory, I even tried adding over 300GiB of swap to both OSD machines. Within 5-10 minutes the OSD processes managed to allocate more than 300GiB of memory and once again became oom_killer victims.

No software or hardware changes took place around the time this problem started, and no significant data changes occurred either. We added about 40GiB of ~1GiB files a week or so before the problem started, and that's the last time data was written.

I can only assume we've found another crippling bug of some kind; this level of memory usage is entirely unprecedented. What can we do?

Thanks in advance for any suggestions.
-Aaron







--
Aaron Ten Clay
https://aarontc.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
