On Thu, 13 Jun 2019, Simon Leinen wrote:
> Sage Weil writes:
> >> 2019-06-12 23:40:43.555 7f724b27f0c0 1 rocksdb: do_open column families: [default]
> >> Unrecognized command: stats
> >> ceph-kvstore-tool: /build/ceph-14.2.1/src/rocksdb/db/version_set.cc:356: rocksdb::Version::~Version(): Assertion `path_id < cfd_->ioptions()->cf_paths.size()' failed.
> >> *** Caught signal (Aborted) **
> >
> > Ah, this looks promising.. it looks like it got it open and has some
> > problem with the error/teardown path.
> >
> > Try 'compact' instead of 'stats'?
>
> That ran for a while and then crashed, also in the destructor for
> rocksdb::Version, but with an otherwise different backtrace. I'm
> attaching the log again.

Hmm, I'm pretty sure this is a shutdown problem, but not certain.

If you do

  ceph-kvstore-tool rocksdb /mnt/ceph/db list > keys

is the keys file huge? Can you send the head and tail of it so we can
make sure it looks complete?

One last thing to check:

  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-NNN list > keys

and see if that behaves similarly, or crashes the way it did before when
the OSD was starting. (Both checks are sketched at the end of this
message.)

If the exported version looks intact, I have a workaround that will make
the osd use that external rocksdb instead of the embedded one.
Basically:

- symlink the db, db.wal, and db.slow files from the osd dir
  (/var/lib/ceph/osd/ceph-NNN/db -> ... etc)
- ceph-bluestore-tool --dev /var/lib/ceph/osd/ceph-NNN/block set-label-key -k bluefs -v 0
- start the osd

(These steps are also sketched at the end of this message.) But be
warned that this is fragile: there isn't a bluefs import function, so
this OSD will be permanently in that weird state. The goal is to get it
up and the PG/cluster behaving, and then eventually let rados recover
elsewhere and reprovision this osd.

But first, let's make sure the external rocksdb has a complete set of
keys!

sage
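
A minimal sketch of the two checks above, assuming the export from
earlier in this thread lives at /mnt/ceph/db and substituting the real
OSD id for NNN; the wc/head/tail commands are just one way to eyeball
completeness, not part of any ceph tooling:

  # dump the keys from the external (exported) rocksdb
  ceph-kvstore-tool rocksdb /mnt/ceph/db list > keys

  # dump the keys through bluestore's embedded db, into a separate file
  # so the two can be compared (the message above reuses the name "keys")
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-NNN list > keys.embedded

  # sanity-check the external dump: key count, plus both ends of the file
  wc -l keys
  head -20 keys
  tail -20 keys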
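
And a sketch of the workaround itself, under the same assumptions
(export at /mnt/ceph, OSD id NNN). The set-label-key invocation is
verbatim from the message; the symlink targets and the systemctl line
are guesses at the intended layout, so treat this as a sketch rather
than a tested recipe:

  OSD=/var/lib/ceph/osd/ceph-NNN

  # point the osd dir at the external rocksdb; this assumes the export
  # produced db, db.wal, and db.slow next to each other under /mnt/ceph
  ln -s /mnt/ceph/db      "$OSD/db"
  ln -s /mnt/ceph/db.wal  "$OSD/db.wal"
  ln -s /mnt/ceph/db.slow "$OSD/db.slow"

  # tell bluestore to stop using the embedded bluefs
  ceph-bluestore-tool --dev "$OSD/block" set-label-key -k bluefs -v 0

  # bring the OSD back up (assumes a systemd-managed deployment)
  systemctl start ceph-osd@NNN

Remember the caveat above: with no bluefs import path back, the OSD
stays in this state until it is drained and reprovisioned.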