My cluster is down. Two OSDs on different hosts use all memory on boot and then crash.

Stefan <slissm@xxxxxxxxxxxxxx> · Mon, 13 Jun 2022 14:10:42 +0000

Hello,

I have been running Ceph for several years and everything has been rock solid until this weekend.
Due to some unfortunate events, my cluster at home is down.

I have two OSDs that don't boot, and the cause seems to be this issue: https://tracker.ceph.com/issues/53729

I'm currently running version 17.2.0, but when I hit the issue I was on 16.2.7. In an attempt to fix it, I upgraded first to 16.2.9 and then to 17.2.0, but that didn't help.
I also tried giving it a huge swap, but it ended up crashing anyway.

1. There seems to be a fix for the issue in a GitHub branch: https://github.com/NitzanMordhai/ceph/tree/wip-nitzan-pglog-dups-not-trimmed/ I don't have very advanced Ceph/Linux skills, and I'm not 100% sure that I understand exactly how I should use it.
Do I need to compile a complete Ceph installation and run that, or can I target ceph-objectstore-tool in some way so that I only compile and run that?
2. The fix seems to be targeted for release in 17.2.1. Is there any information on when that will be released?

Any advice would be very welcome, since I was running a lot of different VMs and didn't have them all backed up.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx