On Wed, 12 Jun 2019, Simon Leinen wrote:
> Sage Weil writes:
> > What happens if you do
>
> >   ceph-kvstore-tool rocksdb /mnt/ceph/db stats
>
> (I'm afraid that our ceph-kvstore-tool doesn't know about a "stats"
> command; but it still tries to open the database.)
>
> That aborts after complaining about many missing files in /mnt/ceph/db.
>
> When I ( cd /mnt/ceph/db && sudo ln -s ../db.slow/* . ) and re-run,
> it still aborts, just without complaining about missing files.

Ah, yes--I forgot that part :)

> I'm attaching the output (stdout+stderr combined), in case that helps.
>
> > or, if that works,
>
> >   ceph-kvstore-tool rocksdb /mnt/ceph/db compact
>
> > It looks like bluefs is happy (in that it can read the whole set
> > of rocksdb files), so the question is if rocksdb can open them, or
> > if there's some corruption or problem at the rocksdb level.
>
> > The original crash is actually here:
>
> > ...
> > 9: (tc_new()+0x283) [0x7fbdbed8e943]
> > 10: (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_mutate(unsigned long, unsigned long, char const*, unsigned long)+0x69) [0x5600b1268109]
> > 11: (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)+0x63) [0x5600b12f5b43]
> > 12: (rocksdb::BlockBuilder::Add(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const*)+0x10b) [0x5600b1eaca9b]
> > ...
>
> > where tc_new is (I think) tcmalloc. Which looks to me like rocksdb
> > is probably trying to allocate something very big. The question is will
> > that happen with the exported files or only on bluefs...
>
> Yes, that's what I was thinking as well. The server seems to have about
> 50GB of free RAM though, so maybe it was more like infinitely big :-)
>
> Also, your ceph-kvstore-tool command seems to have crashed somewhere
> else (the destructor of a rocksdb::Version object?)
>
> 2019-06-12 23:40:43.555 7f724b27f0c0 1 rocksdb: do_open column families: [default]
> Unrecognized command: stats
> ceph-kvstore-tool: /build/ceph-14.2.1/src/rocksdb/db/version_set.cc:356: rocksdb::Version::~Version(): Assertion `path_id < cfd_->ioptions()->cf_paths.size()' failed.
> *** Caught signal (Aborted) **

Ah, this looks promising.. it looks like it got it open and has some
problem with the error/teardown path.

Try 'compact' instead of 'stats'?
sage

> in thread 7f724b27f0c0 thread_name:ceph-kvstore-to
> ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
> 1: (()+0x12890) [0x7f7240c6f890]
> 2: (gsignal()+0xc7) [0x7f723fb5fe97]
> 3: (abort()+0x141) [0x7f723fb61801]
> 4: (()+0x3039a) [0x7f723fb5139a]
> 5: (()+0x30412) [0x7f723fb51412]
> 6: (rocksdb::Version::~Version()+0x224) [0x559749529fe4]
> 7: (rocksdb::Version::Unref()+0x35) [0x55974952a065]
> 8: (rocksdb::SuperVersion::Cleanup()+0x68) [0x55974960f328]
> 9: (rocksdb::ColumnFamilyData::~ColumnFamilyData()+0xf4) [0x5597496123d4]
> 10: (rocksdb::ColumnFamilySet::~ColumnFamilySet()+0xb8) [0x559749612ba8]
> 11: (rocksdb::VersionSet::~VersionSet()+0x4d) [0x55974951da5d]
> 12: (rocksdb::DBImpl::CloseHelper()+0x6a8) [0x55974944a868]
> 13: (rocksdb::DBImpl::~DBImpl()+0x65b) [0x559749455deb]
> 14: (rocksdb::DBImpl::~DBImpl()+0x11) [0x559749455e21]
> 15: (RocksDBStore::~RocksDBStore()+0xe9) [0x559749265349]
> 16: (RocksDBStore::~RocksDBStore()+0x9) [0x559749265599]
> 17: (main()+0x307) [0x5597490b5fb7]
> 18: (__libc_start_main()+0xe7) [0x7f723fb42b97]
> 19: (_start()+0x2a) [0x55974918e03a]
> 2019-06-12 23:40:51.363 7f724b27f0c0 -1 *** Caught signal (Aborted) **
> in thread 7f724b27f0c0 thread_name:ceph-kvstore-to
>
> > Thanks!
>
> Thanks so much for looking into this!
>
> We hope that we can get some access to S3 bucket indexes back, possibly
> by somehow dropping and re-creating those indexes.
> --
> Simon.
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
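For reference, a rough sketch of the sequence being tried in this thread, assuming
the RocksDB files were first exported from the OSD's BlueFS into /mnt/ceph; the
bluefs-export invocation and the OSD id below are placeholders and assumptions,
not taken from the messages above:

  # Export the BlueFS contents (db/, db.slow/, db.wal/) from the stopped OSD.
  # The OSD id and output directory are placeholders.
  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-<ID> --out-dir /mnt/ceph bluefs-export

  # Make the SST files from db.slow visible inside the exported db directory,
  # as Simon did above, so rocksdb can find them when opening /mnt/ceph/db.
  ( cd /mnt/ceph/db && sudo ln -s ../db.slow/* . )

  # Then try the compaction Sage suggests, instead of 'stats':
  ceph-kvstore-tool rocksdb /mnt/ceph/db compact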