Re: Upgrade to 16.2.6 and osd+mds crash after bluestore_fsck_quick_fix_on_mount true

Hi Thilo,

Theoretically this is a recoverable case: due to the bug, the new prefix was inserted at the beginning of every OMAP key instead of replacing the old one. So one just has to remove the old prefix to fix that (the to-be-removed part starts after the first '.' char and ends with the second one, inclusive). E.g. the following key:

p %00%00%00%00%00%00%00%03%00%00%00%00%00%00%00%00%00%00%04t.%00%00%00%00%00%00%04t._infover

to be converted to

p %00%00%00%00%00%00%00%03%00%00%00%00%00%00%00%00%00%00%04t._infover
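Just to illustrate the rule, here is a minimal Python sketch which operates on the escaped key text as shown above (the helper name is mine, it is nothing that exists in Ceph, and it assumes the key is indeed one of the doubled ones):

def strip_stale_prefix(key):
    # The bug left two prefixes in every affected OMAP key; the stale one
    # sits between the first '.' and the second '.' (second '.' inclusive),
    # so cut that part out and keep the rest untouched.
    first = key.find('.')
    second = key.find('.', first + 1)
    if first < 0 or second < 0:
        return key          # doesn't look like a doubled key, leave it alone
    return key[:first + 1] + key[second + 1:]

# the key from above:
# strip_stale_prefix("%00%00%00%00%00%00%00%03%00%00%00%00%00%00%00%00%00%00%04t.%00%00%00%00%00%00%04t._infover")
# -> "%00%00%00%00%00%00%00%03%00%00%00%00%00%00%00%00%00%00%04t._infover"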

One can use ceph-kvstore-tool's list command against the 'p' prefix to view all the omap keys in the DB.
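For example (the OSD has to be stopped first, and the data path below is just for illustration - adjust it to your deployment):

ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 list p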


Unfortunately we currently don't have any means to perform such a conversion in bulk. There are single-key retrieval/update operations in ceph-kvstore-tool, but using them per key would be terribly inefficient for tons of records due to the tool's startup/teardown overhead.
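To at least get an offline inventory of what would need to change, one could post-process a saved dump, e.g. with the rough Python sketch below. It reuses the strip_stale_prefix() helper from above, assumes the dump lines look like the 'p %00...' line earlier in this mail (prefix, then the escaped key, separated by whitespace), and it only reports - nothing is written back to the DB:

import sys

# dump produced beforehand with something like:
#   ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 list p > omap_dump.txt
for line in open(sys.argv[1]):
    parts = line.rstrip('\n').split(None, 1)   # assumption: "<prefix> <escaped key>"
    if len(parts) != 2 or parts[0] != 'p':
        continue
    key = parts[1]
    fixed = strip_stale_prefix(key)
    if fixed != key:
        print(key, '->', fixed)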

Potentially such a bulk recovery could be added to ceph-kvstore-tool, but given the release cycle procedure and the related timings I doubt that's what you'd like to wait for at the moment. So I could probably make a source patch with the fix, but one would need to build it for their own environment. Not to mention all the risks of using an urgent modification which bypasses the QA/review procedure...

Would that work for you?


Thanks,

Igor


On 10/30/2021 11:59 AM, Thilo Molitor wrote:
I have the exact same problem: I upgraded to 16.2.6 and set
bluestore_fsck_quick_fix_on_mount to true; after a rolling restart of my osds
only 2 of 5 came back (one of them was only recently added and holds very
little data, so in essence there is only 1 osd really running).
All other osds crashed with:
./src/osd/PG.cc: In function 'static int PG::peek_map_epoch(ObjectStore*,
spg_t, epoch_t*)' thread 7f68d999bd00 time 2021-10-30T08:59:11.782259+0200
./src/osd/PG.cc: 1009: FAILED ceph_assert(values.size() == 2)
ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific
(stable)

My cluster does not come up anymore and I cannot access my data.
Any advice on how to recover here?

-tmolitor




--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



