Hi Gilles,
the PR you mentioned is present in Octopus, but it looks like it's
ineffective against this issue.
Hence I don't think there is much sense in upgrading to Pacific if
that's the only reason for you...
Actually, I've been trying to catch this bug for a long time, but
without success so far. Not every cluster is affected by the issue;
apparently some specific (tricky?) usage pattern triggers it.
I'm curious: how long does it take for your OSDs to hit the assertion
again after the repair?
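If it helps, one way to see when an OSD last hit it is to grep the OSD
log for the assert message, e.g. (just a sketch, assuming default log
locations and that the log line contains the same "no available blob
id" text; <osd_id> is a placeholder):

  grep 'no available blob id' /var/log/ceph/ceph-osd.<osd_id>.log | tail -n 1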
And could you please share some additional details about the cluster's
usage? The major use case is RBD, right? Replicated or EC pools? How
often are snapshots taken, if any?
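For instance, something like the following should show the pool types
and per-image snapshots (just a sketch; <pool> and <image> are
placeholders):

  ceph osd pool ls detail       # shows replicated vs erasure for each pool
  rbd snap ls <pool>/<image>    # lists snapshots of one RBD image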
Thanks,
Igor
On 3/3/2022 1:45 PM, Gilles Mocellin wrote:
Hello!
On our Octopus (v15.2.15) cluster, mainly used for OpenStack,
we have had several OSD crashes.
Some would not restart, failing with the "no available blob id" assertion.
We found several related bugs:
https://tracker.ceph.com/issues/48216
https://tracker.ceph.com/issues/38272
The workaround that works is to fsck / repair the stopped OSD:
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-<osd_id> --command repair
But it's not a long-term solution.
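(For reference, the full sequence on one OSD looks roughly like this,
assuming a systemd-managed, non-containerized deployment; <osd_id> is a
placeholder:

  systemctl stop ceph-osd@<osd_id>
  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-<osd_id> --command fsck
  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-<osd_id> --command repair
  systemctl start ceph-osd@<osd_id>
)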
I have seen a PR merged in 2019 here:
https://github.com/ceph/ceph/pull/28229
But I can't find whether it made it into Octopus, or whether it
completely resolves the problem.
I also wonder whether anyone has had this problem on Pacific, which
could motivate us to upgrade from Octopus.
--
Igor Fedotov
Ceph Lead Developer
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx