Sage Weil writes:
> What happens if you do

>  ceph-kvstore-tool rocksdb /mnt/ceph/db stats

(I'm afraid that our ceph-kvstore-tool doesn't know about a "stats"
command; but it still tries to open the database.)

That aborts after complaining about many missing files in /mnt/ceph/db.

When I ( cd /mnt/ceph/db && sudo ln -s ../db.slow/* . ) and re-run,
it still aborts, just without complaining about missing files.

I'm attaching the output (stdout+stderr combined), in case that helps.

> or, if that works,

>  ceph-kvstore-tool rocksdb /mnt/ceph/db compact

> It looks like bluefs is happy (in that it can read the whole set
> of rocksdb files), so the question is if rocksdb can open them, or
> if there's some corruption or problem at the rocksdb level.

> The original crash is actually here:

>  ...
>  9: (tc_new()+0x283) [0x7fbdbed8e943]
>  10: (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_mutate(unsigned long, unsigned long, char const*, unsigned long)+0x69) [0x5600b1268109]
>  11: (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)+0x63) [0x5600b12f5b43]
>  12: (rocksdb::BlockBuilder::Add(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const*)+0x10b) [0x5600b1eaca9b]
>  ...

> where tc_new is (I think) tcmalloc.  Which looks to me like rocksdb
> is probably trying to allocate something very big.  The question is will
> that happen with the exported files or only on bluefs...

Yes, that's what I was thinking as well.  The server seems to have
about 50GB of free RAM though, so maybe it was more like <UNDEFINED>ly
big :-)

Also, your ceph-kvstore-tool command seems to have crashed somewhere
else (the destructor of a rocksdb::Version object?)

2019-06-12 23:40:43.555 7f724b27f0c0  1 rocksdb: do_open column families: [default]
Unrecognized command: stats
ceph-kvstore-tool: /build/ceph-14.2.1/src/rocksdb/db/version_set.cc:356: rocksdb::Version::~Version(): Assertion `path_id < cfd_->ioptions()->cf_paths.size()' failed.
*** Caught signal (Aborted) **
 in thread 7f724b27f0c0 thread_name:ceph-kvstore-to
 ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
 1: (()+0x12890) [0x7f7240c6f890]
 2: (gsignal()+0xc7) [0x7f723fb5fe97]
 3: (abort()+0x141) [0x7f723fb61801]
 4: (()+0x3039a) [0x7f723fb5139a]
 5: (()+0x30412) [0x7f723fb51412]
 6: (rocksdb::Version::~Version()+0x224) [0x559749529fe4]
 7: (rocksdb::Version::Unref()+0x35) [0x55974952a065]
 8: (rocksdb::SuperVersion::Cleanup()+0x68) [0x55974960f328]
 9: (rocksdb::ColumnFamilyData::~ColumnFamilyData()+0xf4) [0x5597496123d4]
 10: (rocksdb::ColumnFamilySet::~ColumnFamilySet()+0xb8) [0x559749612ba8]
 11: (rocksdb::VersionSet::~VersionSet()+0x4d) [0x55974951da5d]
 12: (rocksdb::DBImpl::CloseHelper()+0x6a8) [0x55974944a868]
 13: (rocksdb::DBImpl::~DBImpl()+0x65b) [0x559749455deb]
 14: (rocksdb::DBImpl::~DBImpl()+0x11) [0x559749455e21]
 15: (RocksDBStore::~RocksDBStore()+0xe9) [0x559749265349]
 16: (RocksDBStore::~RocksDBStore()+0x9) [0x559749265599]
 17: (main()+0x307) [0x5597490b5fb7]
 18: (__libc_start_main()+0xe7) [0x7f723fb42b97]
 19: (_start()+0x2a) [0x55974918e03a]
2019-06-12 23:40:51.363 7f724b27f0c0 -1 *** Caught signal (Aborted) **
 in thread 7f724b27f0c0 thread_name:ceph-kvstore-to

> Thanks!

Thanks so much for looking into this!

We hope that we can get some access to S3 bucket indexes back,
possibly by somehow dropping and re-creating those indexes.
--
Simon.

2019-06-12 23:40:43.555 7f724b27f0c0  1 rocksdb: do_open column families: [default]
Unrecognized command: stats
ceph-kvstore-tool: /build/ceph-14.2.1/src/rocksdb/db/version_set.cc:356: rocksdb::Version::~Version(): Assertion `path_id < cfd_->ioptions()->cf_paths.size()' failed.
*** Caught signal (Aborted) **
 in thread 7f724b27f0c0 thread_name:ceph-kvstore-to
 ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
 1: (()+0x12890) [0x7f7240c6f890]
 2: (gsignal()+0xc7) [0x7f723fb5fe97]
 3: (abort()+0x141) [0x7f723fb61801]
 4: (()+0x3039a) [0x7f723fb5139a]
 5: (()+0x30412) [0x7f723fb51412]
 6: (rocksdb::Version::~Version()+0x224) [0x559749529fe4]
 7: (rocksdb::Version::Unref()+0x35) [0x55974952a065]
 8: (rocksdb::SuperVersion::Cleanup()+0x68) [0x55974960f328]
 9: (rocksdb::ColumnFamilyData::~ColumnFamilyData()+0xf4) [0x5597496123d4]
 10: (rocksdb::ColumnFamilySet::~ColumnFamilySet()+0xb8) [0x559749612ba8]
 11: (rocksdb::VersionSet::~VersionSet()+0x4d) [0x55974951da5d]
 12: (rocksdb::DBImpl::CloseHelper()+0x6a8) [0x55974944a868]
 13: (rocksdb::DBImpl::~DBImpl()+0x65b) [0x559749455deb]
 14: (rocksdb::DBImpl::~DBImpl()+0x11) [0x559749455e21]
 15: (RocksDBStore::~RocksDBStore()+0xe9) [0x559749265349]
 16: (RocksDBStore::~RocksDBStore()+0x9) [0x559749265599]
 17: (main()+0x307) [0x5597490b5fb7]
 18: (__libc_start_main()+0xe7) [0x7f723fb42b97]
 19: (_start()+0x2a) [0x55974918e03a]
2019-06-12 23:40:51.363 7f724b27f0c0 -1 *** Caught signal (Aborted) **
 in thread 7f724b27f0c0 thread_name:ceph-kvstore-to

 ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
 1: (()+0x12890) [0x7f7240c6f890]
 2: (gsignal()+0xc7) [0x7f723fb5fe97]
 3: (abort()+0x141) [0x7f723fb61801]
 4: (()+0x3039a) [0x7f723fb5139a]
 5: (()+0x30412) [0x7f723fb51412]
 6: (rocksdb::Version::~Version()+0x224) [0x559749529fe4]
 7: (rocksdb::Version::Unref()+0x35) [0x55974952a065]
 8: (rocksdb::SuperVersion::Cleanup()+0x68) [0x55974960f328]
 9: (rocksdb::ColumnFamilyData::~ColumnFamilyData()+0xf4) [0x5597496123d4]
 10: (rocksdb::ColumnFamilySet::~ColumnFamilySet()+0xb8) [0x559749612ba8]
 11: (rocksdb::VersionSet::~VersionSet()+0x4d) [0x55974951da5d]
 12: (rocksdb::DBImpl::CloseHelper()+0x6a8) [0x55974944a868]
 13: (rocksdb::DBImpl::~DBImpl()+0x65b) [0x559749455deb]
 14: (rocksdb::DBImpl::~DBImpl()+0x11) [0x559749455e21]
 15: (RocksDBStore::~RocksDBStore()+0xe9) [0x559749265349]
 16: (RocksDBStore::~RocksDBStore()+0x9) [0x559749265599]
 17: (main()+0x307) [0x5597490b5fb7]
 18: (__libc_start_main()+0xe7) [0x7f723fb42b97]
 19: (_start()+0x2a) [0x55974918e03a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -23> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command assert hook 0x55974ac02130
   -22> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command abort hook 0x55974ac02130
   -21> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perfcounters_dump hook 0x55974ac02130
   -20> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command 1 hook 0x55974ac02130
   -19> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perf dump hook 0x55974ac02130
   -18> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perfcounters_schema hook 0x55974ac02130
   -17> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perf histogram dump hook 0x55974ac02130
   -16> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command 2 hook 0x55974ac02130
   -15> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perf schema hook 0x55974ac02130
   -14> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perf histogram schema hook 0x55974ac02130
   -13> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perf reset hook 0x55974ac02130
   -12> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config show hook 0x55974ac02130
   -11> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config help hook 0x55974ac02130
   -10> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config set hook 0x55974ac02130
    -9> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config unset hook 0x55974ac02130
    -8> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config get hook 0x55974ac02130
    -7> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config diff hook 0x55974ac02130
    -6> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config diff get hook 0x55974ac02130
    -5> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command log flush hook 0x55974ac02130
    -4> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command log dump hook 0x55974ac02130
    -3> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command log reopen hook 0x55974ac02130
    -2> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command dump_mempools hook 0x55974ba7c068
    -1> 2019-06-12 23:40:43.555 7f724b27f0c0  1 rocksdb: do_open column families: [default]
     0> 2019-06-12 23:40:51.363 7f724b27f0c0 -1 *** Caught signal (Aborted) **
 in thread 7f724b27f0c0 thread_name:ceph-kvstore-to

 ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
 1: (()+0x12890) [0x7f7240c6f890]
 2: (gsignal()+0xc7) [0x7f723fb5fe97]
 3: (abort()+0x141) [0x7f723fb61801]
 4: (()+0x3039a) [0x7f723fb5139a]
 5: (()+0x30412) [0x7f723fb51412]
 6: (rocksdb::Version::~Version()+0x224) [0x559749529fe4]
 7: (rocksdb::Version::Unref()+0x35) [0x55974952a065]
 8: (rocksdb::SuperVersion::Cleanup()+0x68) [0x55974960f328]
 9: (rocksdb::ColumnFamilyData::~ColumnFamilyData()+0xf4) [0x5597496123d4]
 10: (rocksdb::ColumnFamilySet::~ColumnFamilySet()+0xb8) [0x559749612ba8]
 11: (rocksdb::VersionSet::~VersionSet()+0x4d) [0x55974951da5d]
 12: (rocksdb::DBImpl::CloseHelper()+0x6a8) [0x55974944a868]
 13: (rocksdb::DBImpl::~DBImpl()+0x65b) [0x559749455deb]
 14: (rocksdb::DBImpl::~DBImpl()+0x11) [0x559749455e21]
 15: (RocksDBStore::~RocksDBStore()+0xe9) [0x559749265349]
 16: (RocksDBStore::~RocksDBStore()+0x9) [0x559749265599]
 17: (main()+0x307) [0x5597490b5fb7]
 18: (__libc_start_main()+0xe7) [0x7f723fb42b97]
 19: (_start()+0x2a) [0x55974918e03a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   2/ 2 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent       500
  max_new         1000
  log_file
--- end dump of recent events ---
--- begin dump of recent events ---
   -23> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command assert hook 0x55974ac02130
   -22> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command abort hook 0x55974ac02130
   -21> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perfcounters_dump hook 0x55974ac02130
   -20> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command 1 hook 0x55974ac02130
   -19> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perf dump hook 0x55974ac02130
   -18> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perfcounters_schema hook 0x55974ac02130
   -17> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perf histogram dump hook 0x55974ac02130
   -16> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command 2 hook 0x55974ac02130
   -15> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perf schema hook 0x55974ac02130
   -14> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perf histogram schema hook 0x55974ac02130
   -13> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perf reset hook 0x55974ac02130
   -12> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config show hook 0x55974ac02130
   -11> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config help hook 0x55974ac02130
   -10> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config set hook 0x55974ac02130
    -9> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config unset hook 0x55974ac02130
    -8> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config get hook 0x55974ac02130
    -7> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config diff hook 0x55974ac02130
    -6> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config diff get hook 0x55974ac02130
    -5> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command log flush hook 0x55974ac02130
    -4> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command log dump hook 0x55974ac02130
    -3> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command log reopen hook 0x55974ac02130
    -2> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command dump_mempools hook 0x55974ba7c068
    -1> 2019-06-12 23:40:43.555 7f724b27f0c0  1 rocksdb: do_open column families: [default]
     0> 2019-06-12 23:40:51.363 7f724b27f0c0 -1 *** Caught signal (Aborted) **
 in thread 7f724b27f0c0 thread_name:ceph-kvstore-to

 ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
 1: (()+0x12890) [0x7f7240c6f890]
 2: (gsignal()+0xc7) [0x7f723fb5fe97]
 3: (abort()+0x141) [0x7f723fb61801]
 4: (()+0x3039a) [0x7f723fb5139a]
 5: (()+0x30412) [0x7f723fb51412]
 6: (rocksdb::Version::~Version()+0x224) [0x559749529fe4]
 7: (rocksdb::Version::Unref()+0x35) [0x55974952a065]
 8: (rocksdb::SuperVersion::Cleanup()+0x68) [0x55974960f328]
 9: (rocksdb::ColumnFamilyData::~ColumnFamilyData()+0xf4) [0x5597496123d4]
 10: (rocksdb::ColumnFamilySet::~ColumnFamilySet()+0xb8) [0x559749612ba8]
 11: (rocksdb::VersionSet::~VersionSet()+0x4d) [0x55974951da5d]
 12: (rocksdb::DBImpl::CloseHelper()+0x6a8) [0x55974944a868]
 13: (rocksdb::DBImpl::~DBImpl()+0x65b) [0x559749455deb]
 14: (rocksdb::DBImpl::~DBImpl()+0x11) [0x559749455e21]
 15: (RocksDBStore::~RocksDBStore()+0xe9) [0x559749265349]
 16: (RocksDBStore::~RocksDBStore()+0x9) [0x559749265599]
 17: (main()+0x307) [0x5597490b5fb7]
 18: (__libc_start_main()+0xe7) [0x7f723fb42b97]
 19: (_start()+0x2a) [0x55974918e03a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   2/ 2 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent       500
  max_new         1000
  log_file /var/lib/ceph/crash/2019-06-12_21:40:51.369265Z_0eea9b49-ec97-4654-aee5-89d9207df79a/log
--- end dump of recent events ---
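
For anyone wanting to repeat the symlink step from the message (linking the exported db.slow files into db so that the tool finds them all in one directory), here is a rough Python sketch. It is only an illustration under stated assumptions: the helper name link_slow_files is hypothetical, the directory names db and db.slow are the ones from the export under /mnt/ceph, and unlike the shell one-liner it silently skips names that already exist in db instead of reporting an error for them.

```python
import os


def link_slow_files(db_dir, slow_dir):
    """Symlink each entry of slow_dir into db_dir using a relative link.

    Roughly mirrors `cd db_dir && ln -s ../db.slow/* .`, with two
    deliberate differences (assumptions of this sketch): names already
    present in db_dir are skipped, and the list of newly linked names
    is returned so the caller can see what was added.
    """
    linked = []
    for name in sorted(os.listdir(slow_dir)):
        if name.startswith("."):
            # The shell glob `*` does not match dotfiles either.
            continue
        target = os.path.join(db_dir, name)
        if os.path.lexists(target):
            # Never overwrite a file or link that is already in db_dir.
            continue
        # Relative link target, e.g. ../db.slow/NAME
        rel = os.path.relpath(os.path.join(slow_dir, name), db_dir)
        os.symlink(rel, target)
        linked.append(name)
    return linked


if __name__ == "__main__":
    import sys

    # Usage: python link_slow.py /mnt/ceph/db /mnt/ceph/db.slow
    if len(sys.argv) == 3:
        for name in link_slow_files(sys.argv[1], sys.argv[2]):
            print("linked", name)
```

Note that this only makes the files visible in one directory; as reported above, ceph-kvstore-tool still aborted after the equivalent shell step, so it is not a fix for the assertion failure.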