Re: OSD Crash in recovery: SST file contains data beyond the point of corruption.

Igor Fedotov <igor.fedotov@xxxxxxxx> · Mon, 19 Sep 2022 11:51:34 +0300

Hi Benjamin,

good to know, thanks for the feedback.

Curious which exact mode (kSkipAnyCorruptedRecords or 
kTolerateCorruptedTailRecords) did the trick?

Regards,

Igor

On 9/17/2022 4:57 PM, Benjamin Naber wrote:
Hey Igor,

I just wanted to thank you for the help!
With the flag you suggested, I was able to bring up the OSD container.
I then marked it out and let Ceph rebalance the PGs and objects to
other OSDs.
Yesterday all PGs became available again, so I could wipe the failed OSD
drive and create a new one. Now everything works without the RocksDB
flag and seems to be fine!
THANK YOU SO MUCH!

Regards

Ben

On Tuesday, September 13, 2022 11:26 CEST, Igor Fedotov
<igor.fedotov@xxxxxxxx> wrote:

Hi Benjamin,

sorry for the confusion, this should be kSkipAnyCorruptedRecords, not
kSkipAnyCorruptedRecord.
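
The "No mapping for enum" failure quoted further down came from a one-letter typo in the mode name. A minimal Python sketch of a pre-check, assuming the four value names of RocksDB's WALRecoveryMode enum; the helper name build_annex_command is hypothetical and not part of Ceph, and the script only prints the ceph config set command rather than running it:

```python
import shlex

# Value names of RocksDB's WALRecoveryMode enum.
VALID_WAL_RECOVERY_MODES = (
    "kTolerateCorruptedTailRecords",
    "kAbsoluteConsistency",
    "kPointInTimeRecovery",
    "kSkipAnyCorruptedRecords",
)


def build_annex_command(osd_id, mode):
    """Return the 'ceph config set' command line for a validated mode name.

    Hypothetical helper: it only builds the string, it does not call ceph.
    """
    if mode not in VALID_WAL_RECOVERY_MODES:
        raise ValueError(
            f"unknown wal_recovery_mode {mode!r}; expected one of "
            + ", ".join(VALID_WAL_RECOVERY_MODES)
        )
    args = [
        "ceph", "config", "set", f"osd.{osd_id}",
        "bluestore_rocksdb_options_annex",
        f"wal_recovery_mode={mode}",
    ]
    return shlex.join(args)


if __name__ == "__main__":
    print(build_annex_command(4, "kSkipAnyCorruptedRecords"))
```

Passing the misspelled kSkipAnyCorruptedRecord raises a ValueError here, before the OSD ever has to reject it at startup.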

Thanks,

Igor

On 9/12/2022 11:26 PM, Benjamin Naber wrote:
Hi Igor,

It looks like the setting doesn't work: the container now fails with a
different error message saying that the setting is an invalid argument.
Did I do something wrong by setting the following?

ceph config set osd.4 bluestore_rocksdb_options_annex "wal_recovery_mode=kSkipAnyCorruptedRecord"

debug 2022-09-12T20:20:45.044+0000 ffff8714e040  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-4/block size 2.7 TiB
debug 2022-09-12T20:20:45.044+0000 ffff8714e040  1 bluefs mount
debug 2022-09-12T20:20:45.044+0000 ffff8714e040  1 bluefs _init_alloc shared, id 1, capacity 0x2baa1000000, block size 0x10000
debug 2022-09-12T20:20:45.608+0000 ffff8714e040  1 bluefs mount shared_bdev_used = 0
debug 2022-09-12T20:20:45.608+0000 ffff8714e040  1 bluestore(/var/lib/ceph/osd/ceph-4) _prepare_db_environment set db_paths to db,2850558889164 db.slow,2850558889164
debug 2022-09-12T20:20:45.608+0000 ffff8714e040 -1 rocksdb: Invalid argument: No mapping for enum : wal_recovery_mode
debug 2022-09-12T20:20:45.608+0000 ffff8714e040 -1 rocksdb: Invalid argument: No mapping for enum : wal_recovery_mode
debug 2022-09-12T20:20:45.608+0000 ffff8714e040  1 rocksdb: do_open load rocksdb options failed
debug 2022-09-12T20:20:45.608+0000 ffff8714e040 -1 bluestore(/var/lib/ceph/osd/ceph-4) _open_db erroring opening db:
debug 2022-09-12T20:20:45.608+0000 ffff8714e040  1 bluefs umount
debug 2022-09-12T20:20:45.608+0000 ffff8714e040  1 bdev(0xaaaaec8e3c00 /var/lib/ceph/osd/ceph-4/block) close
debug 2022-09-12T20:20:45.836+0000 ffff8714e040  1 bdev(0xaaaaec8e2400 /var/lib/ceph/osd/ceph-4/block) close
debug 2022-09-12T20:20:46.088+0000 ffff8714e040 -1 osd.4 0 OSD:init: unable to mount object store
debug 2022-09-12T20:20:46.088+0000 ffff8714e040 -1  ** ERROR: osd init failed: (5) Input/output error

Regards and many thanks for the help!

Ben
On Monday, September 12, 2022 21:14 CEST, Igor Fedotov
<igor.fedotov@xxxxxxxx> wrote:
Hi Benjamin,

honestly the following advice is unlikely to help but you may want to
try to set bluestore_rocksdb_options_annex to one of the following 
options:

- wal_recovery_mode=kTolerateCorruptedTailRecords

- wal_recovery_mode=kSkipAnyCorruptedRecord

The indication that the setting is in effect would be the respective
value at the end of the following log line:

debug 2022-09-12T17:37:05.574+0000 ffffa8316040 4 rocksdb: Options.wal_recovery_mode: 2

It should be 0 and 3, respectively.
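
As a rough illustration of the check described above, the following Python sketch pulls the numeric value out of an Options.wal_recovery_mode log line and maps it to the mode name. The numbering follows RocksDB's WALRecoveryMode enum (0 = kTolerateCorruptedTailRecords, 1 = kAbsoluteConsistency, 2 = kPointInTimeRecovery, 3 = kSkipAnyCorruptedRecords); the function name and the regular expression are illustrative assumptions, not part of Ceph.

```python
import re

# Numeric values of RocksDB's WALRecoveryMode enum.
WAL_RECOVERY_MODES = {
    0: "kTolerateCorruptedTailRecords",
    1: "kAbsoluteConsistency",
    2: "kPointInTimeRecovery",
    3: "kSkipAnyCorruptedRecords",
}

# Matches e.g. "rocksdb: Options.wal_recovery_mode: 2" (whitespace may vary).
_MODE_RE = re.compile(r"Options\.wal_recovery_mode:\s*(\d+)")


def wal_recovery_mode_from_log(line):
    """Return (value, name) for a wal_recovery_mode log line, or None.

    Hypothetical helper for reading OSD startup logs.
    """
    match = _MODE_RE.search(line)
    if match is None:
        return None
    value = int(match.group(1))
    return value, WAL_RECOVERY_MODES.get(value, "unknown")


if __name__ == "__main__":
    sample = ("debug 2022-09-12T17:37:05.574+0000 ffffa8316040 4 rocksdb: "
              "Options.wal_recovery_mode: 2")
    print(wal_recovery_mode_from_log(sample))
```

For the sample line quoted above, the value 2 corresponds to kPointInTimeRecovery, meaning the annex setting had not yet taken effect at that point.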

Hope this helps,

Igor

On 9/12/2022 9:09 PM, Benjamin Naber wrote:
> Hi everybody,
>
> I have been struggling for a couple of days now with a degraded Ceph
> cluster.
> It's a simple 3-node cluster with 6 OSDs: 3 SSD-based and 3 HDD-based.
> A couple of days ago one of the nodes crashed because of a hard disk
> failure. I replaced the hard disk and the recovery process started
> without any issues.
> While the node was still recovering, the newly replaced OSD drive
> switched to backfillfull, and this is where the pain started. I added
> another node, bought a hard drive, and wiped the replacement OSD.
> The cluster was then a 4-node cluster with 3 OSDs for the SSD pool
> and 4 OSDs for the HDD pool.
> Then I restarted the recovery process from the beginning. At this
> point Ceph also started reassigning misplaced objects.
> Then one of the remaining nodes suffered a power failure, and now I'm
> stuck with a degraded cluster with 49 PGs inactive and 3 PGs
> incomplete.
> The OSD container on the node that lost power no longer comes up
> because of a RocksDB error. Any advice on how to recover the corrupt
> RocksDB?
> Container log and RocksDB error:
>
> https://pastebin.com/gvGJdubx
>
> Regards and thanks for your help!
>
> Ben
>
>
> --
> ___________________________________________________
> This e-mail, including any attached files, contains confidential
> and/or legally protected information. If you are not the intended
> recipient and have received this e-mail in error, you may not use its
> contents, open any attached files, make copies, or pass on or
> distribute the contents. Please notify the sender and delete this
> e-mail and any attached files immediately.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx

--
___________________________________________________
Benjamin Naber • Holzstraße 7 • D-73650 Winterbach
Mobil: +49 (0) 152.34087809
E-Mail: benjamin.naber@xxxxxxxxxxxxxx
___________________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx