'ceph-bluestore-tool repair' checks and repairs BlueStore metadata
consistency, not RocksDB's.
It looks like you're observing a CRC mismatch during DB compaction,
which is probably not triggered during the repair.
The good news is that BlueStore's metadata appear to be consistent,
hence data recovery is still potentially possible - though I can't
build up a working procedure using the existing tools yet.
Let me check whether one can disable DB compaction using rocksdb
settings.
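One untested possibility along those lines: Ceph exposes a `bluestore_rocksdb_options` option that carries the RocksDB option string, and `disable_auto_compactions` is a standard RocksDB setting. A sketch only - whether this actually lets a corrupted store open long enough to export data is an assumption, not a verified procedure:

```ini
# ceph.conf on the affected node -- a sketch, not a verified recovery step.
# Note: bluestore_rocksdb_options REPLACES the default RocksDB option
# string, so in practice the existing defaults for this Ceph release
# would need to be kept alongside the added setting.
[osd.0]
bluestore_rocksdb_options = disable_auto_compactions=true
```

Even if the OSD then starts, it should be treated as read-only salvage, not as a fixed OSD.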
On 11/29/2018 1:42 PM, Mario Giammarco wrote:
The only strange thing is that ceph-bluestore-tool says the repair
was done, no errors were found, and everything is OK.
I ask myself what that tool really does.
Mario
On Thu, 29 Nov 2018 at 11:03, Wido den Hollander <wido@xxxxxxxx> wrote:
On 11/29/18 10:45 AM, Mario Giammarco wrote:
> I have only that copy; it is a showroom system, but someone put a
> production VM on it.
>
I have a feeling this won't be easy to fix, or even fixable at all:
- Compaction error: Corruption: block checksum mismatch
- submit_transaction error: Corruption: block checksum mismatch
RocksDB got corrupted on that OSD and it won't be able to start now.
I wouldn't know where to start with this OSD.
Wido
> On Thu, 29 Nov 2018 at 10:43, Wido den Hollander <wido@xxxxxxxx> wrote:
>
>
>
> On 11/29/18 10:28 AM, Mario Giammarco wrote:
> > Hello,
> > I have a Ceph installation in a Proxmox cluster.
> > Due to a temporary hardware glitch, I now get this error on OSD startup:
> >
> > -6> 2018-11-26 18:02:33.179327 7fa1d784be00  0 osd.0 1033 crush map has features 1009089991638532096, adjusting msgr requires for osds
> > -5> 2018-11-26 18:02:34.143084 7fa1c33f9700  3 rocksdb: [/build/ceph-12.2.9/src/rocksdb/db/db_impl_compaction_flush.cc:1591] Compaction error: Corruption: block checksum mismatch
> > -4> 2018-11-26 18:02:34.143123 7fa1c33f9700  4 rocksdb: (Original Log Time 2018/11/26-18:02:34.143021) [/build/ceph-12.2.9/src/rocksdb/db/compaction_job.cc:621] [default] compacted to: base level 1 max bytes base 268435456 files[17$
> > -3> 2018-11-26 18:02:34.143126 7fa1c33f9700  4 rocksdb: (Original Log Time 2018/11/26-18:02:34.143068) EVENT_LOG_v1 {"time_micros": 1543251754143044, "job": 3, "event": "compaction_finished", "compaction_time_micros": 1997048, "out$
> > -2> 2018-11-26 18:02:34.143152 7fa1c33f9700  2 rocksdb: [/build/ceph-12.2.9/src/rocksdb/db/db_impl_compaction_flush.cc:1275] Waiting after background compaction error: Corruption: block checksum mismatch, Accumulated background err$
> > -1> 2018-11-26 18:02:34.674171 7fa1c4bfc700 -1 rocksdb: submit_transaction error: Corruption: block checksum mismatch code = 2 Rocksdb transaction:
> > Delete( Prefix = O key = 0x7f7ffffffffffffffb64000000217363'rub_3.26!='0xfffffffffffffffeffffffffffffffff'o')
> > Put( Prefix = S key = 'nid_max' Value size = 8)
> > Put( Prefix = S key = 'blobid_max' Value size = 8)
> > 0> 2018-11-26 18:02:34.675641 7fa1c4bfc700 -1 /build/ceph-12.2.9/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7fa1c4bfc700 time 2018-11-26 18:02:34.674193
> > /build/ceph-12.2.9/src/os/bluestore/BlueStore.cc: 8717: FAILED assert(r == 0)
> >
> > ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)
> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55ec83876092]
> > 2: (BlueStore::_kv_sync_thread()+0x24b5) [0x55ec836ffb55]
> > 3: (BlueStore::KVSyncThread::entry()+0xd) [0x55ec8374040d]
> > 4: (()+0x7494) [0x7fa1d5027494]
> > 5: (clone()+0x3f) [0x7fa1d4098acf]
> >
> >
> > I have tried to recover it using ceph-bluestore-tool fsck and
> > repair deep, but it says everything is OK.
> > I see that the rocksdb ldb tool needs .db files to recover, not a
> > partition, so I cannot use it.
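On the ldb point: BlueStore keeps RocksDB embedded inside the block device rather than as loose .db files, which is why plain `ldb` cannot open it. Ceph does ship `ceph-kvstore-tool` with a `bluestore-kv` backend that opens the embedded DB directly. A hedged sketch - the subcommands available depend on the Ceph release, and on a corrupted store these may abort with the same checksum error the OSD hits:

```shell
# Sketch only: ceph-kvstore-tool's bluestore-kv backend mounts the OSD's
# embedded RocksDB. Run with the OSD stopped; the data path shown is the
# default location and may differ on this cluster.
systemctl stop ceph-osd@0

# List keys to see whether the DB can be opened and read at all:
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 list

# Force a compaction (expected to reproduce the checksum mismatch):
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 compact
```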
> > I do not understand why I cannot start the OSD if
> > ceph-bluestore-tool tells me I have lost no data.
> > Can you help me?
>
> Why would you try to recover an individual OSD? If all your Placement
> Groups are active(+clean), just wipe the OSD and re-deploy it.
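Assuming the rest of the cluster holds healthy copies of every PG, the wipe-and-redeploy path above can be sketched with standard commands. `osd.0` and `/dev/sdX` are placeholders for this cluster's actual OSD id and disk:

```shell
# Sketch of the redeploy path, assuming all PGs are active+clean
# without this OSD (verify first -- do NOT wipe the only copy of data).
ceph -s        # overall cluster health
ceph pg stat   # quick PG summary; look for active+clean

# Remove the broken OSD and let recovery re-replicate from other copies:
ceph osd out osd.0
systemctl stop ceph-osd@0
ceph osd purge osd.0 --yes-i-really-mean-it   # available since Luminous

# Wipe the disk and re-create the OSD (/dev/sdX is a placeholder):
ceph-volume lvm zap /dev/sdX --destroy
ceph-volume lvm create --data /dev/sdX
```

The caveat in this thread is that the pool reportedly has only one copy, so redeploying would discard the data; the sketch applies only once the PG status confirms other replicas exist.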
>
> What's the status of your PGs?
>
> It says there is a checksum error (probably due to the hardware
> glitch), so it refuses to start.
>
> Don't try to outsmart Ceph; let backfill/recovery handle this. Trying
> to fix this manually will only make things worse.
>
> Wido
>
> > Thanks,
> > Mario
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>