Hi Francois,
Could you please share the OSD startup log with debug-bluestore (and
debug-bluefs) set to 20?
Also please run ceph-bluestore-tool's bluefs-bdev-sizes command and
share the output.
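
For illustration, the two requests above could be carried out roughly as
follows. This is only a sketch under stated assumptions: the OSD id (3) is
taken from the log excerpt quoted below, the systemd unit name and the
--setuser/--setgroup arguments assume a standard packaged install, and the
debug levels are passed as command-line overrides rather than set in
ceph.conf. The script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Assumption: osd.3 is used as the example because it appears in the quoted log.
OSD_ID="${OSD_ID:-3}"
OSD_PATH="/var/lib/ceph/osd/ceph-${OSD_ID}"
# Dry run by default: print each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

collect_diagnostics() {
    # The OSD must not be running while ceph-bluestore-tool inspects it.
    run systemctl stop "ceph-osd@${OSD_ID}"
    # Report the BlueFS device sizes.
    run ceph-bluestore-tool bluefs-bdev-sizes --path "$OSD_PATH"
    # Start the OSD in the foreground with verbose BlueStore/BlueFS logging.
    run ceph-osd -f -i "$OSD_ID" --setuser ceph --setgroup ceph \
        --debug-bluestore 20 --debug-bluefs 20
}

collect_diagnostics
```

Running it with DRY_RUN=0 executes the commands for real. The startup log
is then normally found in /var/log/ceph/ceph-osd.<id>.log, assuming the
default log location has not been changed.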
Thanks,
Igor
On 4/28/2020 12:55 AM, Francois Legrand wrote:
Hi all,
*** Short version ***
Is there a way to repair a RocksDB that fails with the errors
"Encountered error while reading data from compression dictionary block
Corruption: block checksum mismatch" and "_open_db erroring opening db"?
*** Long version ***
We operate a Nautilus Ceph cluster (with 100 disks of 8TB in 6 servers
+ 4 mons/mgr + 3 mds).
We recently (Monday 20) upgraded from 14.2.7 to 14.2.8. This triggered
a rebalancing of some data.
Two days later (Wednesday 22) we had a very short power outage. Only
one of the OSD servers went down (and unfortunately died).
This triggered a reconstruction of the lost OSDs. Operations went
fine until Saturday 25, when some OSDs in the 5 remaining servers
started to crash for no apparent reason.
We tried to restart them, but they crashed again. We ended up with 18
OSDs down (+ 16 in the dead server, so 34 OSDs down out of 100).
Looking at the logs, we found the following for all the crashed OSDs:
-237> 2020-04-25 16:32:51.835 7f1f45527a80 3 rocksdb:
[table/block_based_table_reader.cc:1117] Encountered error while
reading data from compression dictionary block Corruption: block
checksum mismatch: expected 0, got 2729370997 in db/181355.sst offset
18446744073709551615 size 18446744073709551615
and
2020-04-25 16:05:47.251 7fcbd1e46a80 -1
bluestore(/var/lib/ceph/osd/ceph-3) _open_db erroring opening db:
We also noticed that the "Encountered error while reading data from
compression dictionary block Corruption: block checksum mismatch" error
was already present a few days before the crash.
Some OSDs also show this error but are still up.
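
As an aside, a minimal sketch of how the OSDs whose logs contain this
checksum error could be listed, including the ones that are still up. The
log directory (/var/log/ceph) and the ceph-osd.<id>.log file naming are
assumptions based on the usual defaults, and the function name is
hypothetical. Rotated, compressed logs are not searched.

```shell
#!/bin/sh
# Print the ids of OSDs whose current log mentions the RocksDB checksum error.
# Assumption: logs are named ceph-osd.<id>.log under the given directory.
osds_with_checksum_errors() {
    log_dir="${1:-/var/log/ceph}"
    # -l lists only the names of files that contain a match.
    grep -l "block checksum mismatch" "$log_dir"/ceph-osd.*.log 2>/dev/null \
        | sed -E 's/.*ceph-osd\.([0-9]+)\.log$/\1/' \
        | sort -n
}

osds_with_checksum_errors /var/log/ceph
```

Comparing this list with the OSDs that are currently down shows which
running OSDs are also affected.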
We tried to repair with:
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-3 repair
But without success (it ends with _open_db erroring opening db).
Does anybody have an idea how to fix this, or at least know whether it
is possible to repair the "Encountered error while reading data from
compression dictionary block Corruption: block checksum mismatch" and
"_open_db erroring opening db" errors?
Thanks for your help (we are desperate because we are going to lose data
and are fighting to save something)!
F.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx