does objectstore-tool still work? If yes: export all the PGs on the OSD with objectstore-tool and import them into a new OSD.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, 29 Nov 2018 at 13:06, Igor Fedotov <ifedotov@xxxxxxx> wrote:
>
> 'ceph-bluestore-tool repair' checks and repairs BlueStore metadata consistency, not RocksDB's.
>
> It looks like you're observing a CRC mismatch during DB compaction, which is probably not triggered during the repair.
>
> The good news is that BlueStore's metadata look consistent, so data recovery should still be possible in principle; I can't build a working procedure with the existing tools, though.
>
> Let me check whether one can disable DB compaction using rocksdb settings.
>
>
> On 11/29/2018 1:42 PM, Mario Giammarco wrote:
>
> The only strange thing is that ceph-bluestore-tool says the repair was done, no errors were found, and everything is OK.
> I ask myself what that tool really does.
> Mario
>
> On Thu, 29 Nov 2018 at 11:03, Wido den Hollander <wido@xxxxxxxx> wrote:
>>
>> On 11/29/18 10:45 AM, Mario Giammarco wrote:
>> > I have only that copy; it is a showroom system, but someone put a
>> > production VM on it.
>>
>> I have a feeling this won't be easy to fix, or actually fixable at all:
>>
>> - Compaction error: Corruption: block checksum mismatch
>> - submit_transaction error: Corruption: block checksum mismatch
>>
>> RocksDB got corrupted on that OSD and won't be able to start now.
>>
>> I wouldn't know where to start with this OSD.
>>
>> Wido
>>
>> > On Thu, 29 Nov 2018 at 10:43, Wido den Hollander <wido@xxxxxxxx> wrote:
>> >
>> > On 11/29/18 10:28 AM, Mario Giammarco wrote:
>> > > Hello,
>> > > I have a Ceph installation in a Proxmox cluster.
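The export/import path Paul suggests above can be sketched roughly as below. This is a sketch only: the OSD ids (0 as the failing OSD, 7 as the fresh target), the data paths, the PG id 3.26, and the backup location are all illustrative assumptions, and both OSDs must be stopped while the tool runs.

```shell
# Stop the failed OSD; ceph-objectstore-tool needs exclusive access.
systemctl stop ceph-osd@0

# Enumerate the PGs held by this OSD (path and id are assumptions).
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op list-pgs

# Export one PG to a file; repeat for every PG the listing returns.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --pgid 3.26 --op export --file /backup/pg.3.26.export

# Import the PG into a fresh, also-stopped OSD (osd.7 is hypothetical).
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
    --pgid 3.26 --op import --file /backup/pg.3.26.export
```

Note that this only helps if the tool can still read the BlueStore data on the failed OSD; the same RocksDB corruption could abort the export as well.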
>> > > Due to a temporary hardware glitch, I now get this error on OSD startup:
>> > >
>> > > -6> 2018-11-26 18:02:33.179327 7fa1d784be00 0 osd.0 1033 crush map has features 1009089991638532096, adjusting msgr requires for osds
>> > > -5> 2018-11-26 18:02:34.143084 7fa1c33f9700 3 rocksdb: [/build/ceph-12.2.9/src/rocksdb/db/db_impl_compaction_flush.cc:1591] Compaction error: Corruption: block checksum mismatch
>> > > -4> 2018-11-26 18:02:34.143123 7fa1c33f9700 4 rocksdb: (Original Log Time 2018/11/26-18:02:34.143021) [/build/ceph-12.2.9/src/rocksdb/db/compaction_job.cc:621] [default] compacted to: base level 1 max bytes base268435456 files[17$
>> > > -3> 2018-11-26 18:02:34.143126 7fa1c33f9700 4 rocksdb: (Original Log Time 2018/11/26-18:02:34.143068) EVENT_LOG_v1 {"time_micros": 1543251754143044, "job": 3, "event": "compaction_finished", "compaction_time_micros": 1997048, "out$
>> > > -2> 2018-11-26 18:02:34.143152 7fa1c33f9700 2 rocksdb: [/build/ceph-12.2.9/src/rocksdb/db/db_impl_compaction_flush.cc:1275] Waiting after background compaction error: Corruption: block checksum mismatch, Accumulated background err$
>> > > -1> 2018-11-26 18:02:34.674171 7fa1c4bfc700 -1 rocksdb: submit_transaction error: Corruption: block checksum mismatch code = 2 Rocksdb transaction:
>> > > Delete( Prefix = O key = 0x7f7ffffffffffffffb64000000217363'rub_3.26!='0xfffffffffffffffeffffffffffffffff'o')
>> > > Put( Prefix = S key = 'nid_max' Value size = 8)
>> > > Put( Prefix = S key = 'blobid_max' Value size = 8)
>> > > 0> 2018-11-26 18:02:34.675641 7fa1c4bfc700 -1 /build/ceph-12.2.9/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7fa1c4bfc700 time 2018-11-26 18:02:34.674193
>> > > /build/ceph-12.2.9/src/os/bluestore/BlueStore.cc: 8717: FAILED assert(r == 0)
>> > >
>> > > ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)
>> > > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55ec83876092]
>> > > 2: (BlueStore::_kv_sync_thread()+0x24b5) [0x55ec836ffb55]
>> > > 3: (BlueStore::KVSyncThread::entry()+0xd) [0x55ec8374040d]
>> > > 4: (()+0x7494) [0x7fa1d5027494]
>> > > 5: (clone()+0x3f) [0x7fa1d4098acf]
>> > >
>> > >
>> > > I have tried to recover it using ceph-bluestore-tool fsck and repair
>> > > in deep mode, but they say everything is OK.
>> > > I see that rocksdb's ldb tool needs .db files to recover, not a
>> > > partition, so I cannot use it.
>> > > I do not understand why I cannot start the OSD when ceph-bluestore-tool
>> > > tells me I have lost no data.
>> > > Can you help me?
>> >
>> > Why would you try to recover an individual OSD? If all your Placement
>> > Groups are active(+clean), just wipe the OSD and re-deploy it.
>> >
>> > What's the status of your PGs?
>> >
>> > It says there is a checksum error (probably due to the hardware glitch),
>> > so it refuses to start.
>> >
>> > Don't try to outsmart Ceph; let backfill/recovery handle this. Trying to
>> > fix this manually will only make things worse.
>> >
>> > Wido
>> >
>> > > Thanks,
>> > > Mario
>> > >
>> > > _______________________________________________
>> > > ceph-users mailing list
>> > > ceph-users@xxxxxxxxxxxxxx
>> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
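For reference, the wipe-and-redeploy route Wido recommends could look roughly like the following on a Luminous (12.2.x) cluster, assuming every PG reports active+clean from replicas on other OSDs. The OSD id (0) and backing device (/dev/sdb) are illustrative assumptions; adjust them to the actual deployment.

```shell
# Confirm the cluster can rebuild the data before destroying the OSD.
ceph health detail
ceph pg stat          # all PGs should be active+clean on other OSDs

# Take the broken OSD out and remove it from the cluster.
systemctl stop ceph-osd@0
ceph osd out 0
ceph osd purge 0 --yes-i-really-mean-it

# Wipe the backing device and create a fresh OSD on it.
ceph-volume lvm zap /dev/sdb --destroy
ceph-volume lvm create --data /dev/sdb
```

Once the new OSD comes up, backfill repopulates it automatically; no manual data recovery is needed as long as the surviving copies are healthy.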