Hi,

as someone who went through this just last week: that sounds a lot like the symptoms of my cluster. In case you are comfortable with Docker (or any other container runtime), I have pushed an image [1] built from Quincy as of a few days ago, with the fix for the pglog dups included, and was able to successfully clean my OSD with the ceph-objectstore-tool in it. Something like

    CEPH_ARGS="--osd_pg_log_trim_max=50000 --osd_max_pg_log_entries=2000" ceph-objectstore-tool --data-path $osd_path --op trim-pg-log

should help (command mostly from memory, check it before executing it - as always).

Best of luck,
Mara

[1] littlefox/ceph-daemon-base:2, based on commit 5d47b8e21e77a57e51781f00021f77c7967ebbe2
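PS: spelled out, the whole container invocation could look roughly like the following. This is a sketch, not a tested recipe: /var/lib/ceph/osd/ceph-0 and the OSD id are example placeholders, the docker flags are assumptions about the image, and depending on the build --op trim-pg-log may also want a --pgid argument - check `ceph-objectstore-tool --help` inside the container before running anything.

    # stop the OSD first so the tool has exclusive access to the store
    systemctl stop ceph-osd@0

    # shell into the image, bind-mounting the OSD data directory
    # (also pass through the underlying block device if BlueStore sits on one)
    docker run --rm -it \
      -v /var/lib/ceph/osd/ceph-0:/var/lib/ceph/osd/ceph-0 \
      --entrypoint /bin/bash \
      littlefox/ceph-daemon-base:2

    # inside the container: trim the pg log with tight limits
    CEPH_ARGS="--osd_pg_log_trim_max=50000 --osd_max_pg_log_entries=2000" \
      ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op trim-pg-log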
On Mon, Jun 13, 2022 at 02:10:42PM +0000, Stefan wrote:

Hello,

I have been running Ceph for several years and everything has been rock solid until this weekend. Due to some unfortunate events, my cluster at home is down. I have two OSDs that don't boot, and the cause seems to be this issue: https://tracker.ceph.com/issues/53729

I'm currently running version 17.2.0, but I was on 16.2.7 when I hit the issue. In an attempt to fix it I upgraded first to 16.2.9 and then to 17.2.0, but that didn't help. I also tried giving it a huge swap, but it ended up crashing anyway.

1. There seems to be a fix for the issue in a GitHub branch: https://github.com/NitzanMordhai/ceph/tree/wip-nitzan-pglog-dups-not-trimmed/ I don't have very advanced Ceph/Linux skills and I'm not 100% sure I understand exactly how to use it. Do I need to compile and run a complete Ceph installation, or can I somehow compile and run only ceph-objectstore-tool?

2. The issue seems to be targeted for release in 17.2.1 - is there any information on when that will be released?

Any advice would be very welcome, since I was running a lot of different VMs and didn't have everything backed up.
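Regarding question 1: you don't have to install a complete Ceph build to get at the tool - the Ceph tree lets you build a single target. A rough sketch, assuming a Linux build host with plenty of disk and RAM (a Ceph source build is heavy); the helper scripts and the target name come from the Ceph source tree:

    # fetch the branch with the fix, including submodules
    git clone -b wip-nitzan-pglog-dups-not-trimmed https://github.com/NitzanMordhai/ceph.git
    cd ceph
    git submodule update --init --recursive

    # install build dependencies and configure the build
    ./install-deps.sh
    ./do_cmake.sh -DCMAKE_BUILD_TYPE=RelWithDebInfo

    # build only ceph-objectstore-tool (and the libraries it depends on)
    cd build
    ninja ceph-objectstore-tool

    # the resulting binary lands in build/bin/
    ./bin/ceph-objectstore-tool --help

Even a single target still pulls in most of Ceph's internal libraries, so expect a long build - the prebuilt container image above is the quicker route.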