Re: rocksdb corruption, stale pg, rebuild bucket index

Sage Weil writes:
> What happens if you do

>  ceph-kvstore-tool rocksdb /mnt/ceph/db stats

(I'm afraid that our ceph-kvstore-tool doesn't know about a "stats"
command; but it still tries to open the database.)

That aborts after complaining about many missing files in /mnt/ceph/db.

When I ( cd /mnt/ceph/db && sudo ln -s ../db.slow/* . ) and re-run,
it still aborts, just without complaining about missing files.
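
A minimal sketch of a less invasive variant of that symlink step, assuming one would rather leave the exported db directory unmodified: it builds a separate scratch directory of symlinks to the files from both directories. The function name `merge_db_dirs` and the output path in the usage note are hypothetical, not part of Ceph, and nothing here establishes that the tool would stop aborting when pointed at the merged directory.

```sh
# Hypothetical helper (not a Ceph tool): populate a scratch directory with
# symlinks to every entry of two exported directories, leaving both
# originals untouched.
merge_db_dirs() {
    db_dir=$1
    slow_dir=$2
    out_dir=$3
    mkdir -p "$out_dir" || return 1
    for f in "$db_dir"/* "$slow_dir"/*; do
        # Skip the literal pattern if a glob matched nothing.
        [ -e "$f" ] || continue
        name=$(basename "$f")
        # Never overwrite: report name clashes instead of replacing a link.
        if [ -e "$out_dir/$name" ] || [ -L "$out_dir/$name" ]; then
            echo "skipping duplicate name: $name" >&2
            continue
        fi
        # Link to the absolute path so the link stays valid from anywhere.
        ln -s "$(cd "$(dirname "$f")" && pwd)/$name" "$out_dir/$name"
    done
}
```

For example, `merge_db_dirs /mnt/ceph/db /mnt/ceph/db.slow /mnt/ceph/db.merged` would leave /mnt/ceph/db exactly as exported, and the scratch directory can simply be deleted afterwards.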

I'm attaching the output (stdout+stderr combined), in case that helps.

> or, if that works,

>  ceph-kvstore-tool rocksdb /mnt/ceph/db compact

> It looks like bluefs is happy (in that it can read the whole set 
> of rocksdb files), so the question is if rocksdb can open them, or 
> if there's some corruption or problem at the rocksdb level.

> The original crash is actually here:

>  ...
>  9: (tc_new()+0x283) [0x7fbdbed8e943]
>  10: (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_mutate(unsigned long, unsigned long, char const*, unsigned long)+0x69) [0x5600b1268109]
>  11: (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)+0x63) [0x5600b12f5b43]
>  12: (rocksdb::BlockBuilder::Add(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const*)+0x10b) [0x5600b1eaca9b]
>  ...

> where tc_new is (I think) tcmalloc.  Which looks to me like rocksdb 
> is probably trying to allocate something very big.  The question is will 
> that happen with the exported files or only on bluefs...

Yes, that's what I was thinking as well.  The server seems to have about
50GB of free RAM though, so maybe it was more like <UNDEFINED>ly big :-)

Also, your ceph-kvstore-tool command seems to have crashed somewhere
else (the destructor of a rocksdb::Version object?)

  2019-06-12 23:40:43.555 7f724b27f0c0  1 rocksdb: do_open column families: [default]
  Unrecognized command: stats
  ceph-kvstore-tool: /build/ceph-14.2.1/src/rocksdb/db/version_set.cc:356: rocksdb::Version::~Version(): Assertion `path_id < cfd_->ioptions()->cf_paths.size()' failed.
  *** Caught signal (Aborted) **
   in thread 7f724b27f0c0 thread_name:ceph-kvstore-to
   ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
   1: (()+0x12890) [0x7f7240c6f890]
   2: (gsignal()+0xc7) [0x7f723fb5fe97]
   3: (abort()+0x141) [0x7f723fb61801]
   4: (()+0x3039a) [0x7f723fb5139a]
   5: (()+0x30412) [0x7f723fb51412]
   6: (rocksdb::Version::~Version()+0x224) [0x559749529fe4]
   7: (rocksdb::Version::Unref()+0x35) [0x55974952a065]
   8: (rocksdb::SuperVersion::Cleanup()+0x68) [0x55974960f328]
   9: (rocksdb::ColumnFamilyData::~ColumnFamilyData()+0xf4) [0x5597496123d4]
   10: (rocksdb::ColumnFamilySet::~ColumnFamilySet()+0xb8) [0x559749612ba8]
   11: (rocksdb::VersionSet::~VersionSet()+0x4d) [0x55974951da5d]
   12: (rocksdb::DBImpl::CloseHelper()+0x6a8) [0x55974944a868]
   13: (rocksdb::DBImpl::~DBImpl()+0x65b) [0x559749455deb]
   14: (rocksdb::DBImpl::~DBImpl()+0x11) [0x559749455e21]
   15: (RocksDBStore::~RocksDBStore()+0xe9) [0x559749265349]
   16: (RocksDBStore::~RocksDBStore()+0x9) [0x559749265599]
   17: (main()+0x307) [0x5597490b5fb7]
   18: (__libc_start_main()+0xe7) [0x7f723fb42b97]
   19: (_start()+0x2a) [0x55974918e03a]
  2019-06-12 23:40:51.363 7f724b27f0c0 -1 *** Caught signal (Aborted) **
   in thread 7f724b27f0c0 thread_name:ceph-kvstore-to

> Thanks!

Thanks so much for looking into this!

We hope that we can get some access to S3 bucket indexes back, possibly
by somehow dropping and re-creating those indexes.
-- 
Simon.

2019-06-12 23:40:43.555 7f724b27f0c0  1 rocksdb: do_open column families: [default]
Unrecognized command: stats
ceph-kvstore-tool: /build/ceph-14.2.1/src/rocksdb/db/version_set.cc:356: rocksdb::Version::~Version(): Assertion `path_id < cfd_->ioptions()->cf_paths.size()' failed.
*** Caught signal (Aborted) **
 in thread 7f724b27f0c0 thread_name:ceph-kvstore-to
 ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
 1: (()+0x12890) [0x7f7240c6f890]
 2: (gsignal()+0xc7) [0x7f723fb5fe97]
 3: (abort()+0x141) [0x7f723fb61801]
 4: (()+0x3039a) [0x7f723fb5139a]
 5: (()+0x30412) [0x7f723fb51412]
 6: (rocksdb::Version::~Version()+0x224) [0x559749529fe4]
 7: (rocksdb::Version::Unref()+0x35) [0x55974952a065]
 8: (rocksdb::SuperVersion::Cleanup()+0x68) [0x55974960f328]
 9: (rocksdb::ColumnFamilyData::~ColumnFamilyData()+0xf4) [0x5597496123d4]
 10: (rocksdb::ColumnFamilySet::~ColumnFamilySet()+0xb8) [0x559749612ba8]
 11: (rocksdb::VersionSet::~VersionSet()+0x4d) [0x55974951da5d]
 12: (rocksdb::DBImpl::CloseHelper()+0x6a8) [0x55974944a868]
 13: (rocksdb::DBImpl::~DBImpl()+0x65b) [0x559749455deb]
 14: (rocksdb::DBImpl::~DBImpl()+0x11) [0x559749455e21]
 15: (RocksDBStore::~RocksDBStore()+0xe9) [0x559749265349]
 16: (RocksDBStore::~RocksDBStore()+0x9) [0x559749265599]
 17: (main()+0x307) [0x5597490b5fb7]
 18: (__libc_start_main()+0xe7) [0x7f723fb42b97]
 19: (_start()+0x2a) [0x55974918e03a]
2019-06-12 23:40:51.363 7f724b27f0c0 -1 *** Caught signal (Aborted) **
 in thread 7f724b27f0c0 thread_name:ceph-kvstore-to

 ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
 1: (()+0x12890) [0x7f7240c6f890]
 2: (gsignal()+0xc7) [0x7f723fb5fe97]
 3: (abort()+0x141) [0x7f723fb61801]
 4: (()+0x3039a) [0x7f723fb5139a]
 5: (()+0x30412) [0x7f723fb51412]
 6: (rocksdb::Version::~Version()+0x224) [0x559749529fe4]
 7: (rocksdb::Version::Unref()+0x35) [0x55974952a065]
 8: (rocksdb::SuperVersion::Cleanup()+0x68) [0x55974960f328]
 9: (rocksdb::ColumnFamilyData::~ColumnFamilyData()+0xf4) [0x5597496123d4]
 10: (rocksdb::ColumnFamilySet::~ColumnFamilySet()+0xb8) [0x559749612ba8]
 11: (rocksdb::VersionSet::~VersionSet()+0x4d) [0x55974951da5d]
 12: (rocksdb::DBImpl::CloseHelper()+0x6a8) [0x55974944a868]
 13: (rocksdb::DBImpl::~DBImpl()+0x65b) [0x559749455deb]
 14: (rocksdb::DBImpl::~DBImpl()+0x11) [0x559749455e21]
 15: (RocksDBStore::~RocksDBStore()+0xe9) [0x559749265349]
 16: (RocksDBStore::~RocksDBStore()+0x9) [0x559749265599]
 17: (main()+0x307) [0x5597490b5fb7]
 18: (__libc_start_main()+0xe7) [0x7f723fb42b97]
 19: (_start()+0x2a) [0x55974918e03a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -23> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command assert hook 0x55974ac02130
   -22> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command abort hook 0x55974ac02130
   -21> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perfcounters_dump hook 0x55974ac02130
   -20> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command 1 hook 0x55974ac02130
   -19> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perf dump hook 0x55974ac02130
   -18> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perfcounters_schema hook 0x55974ac02130
   -17> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perf histogram dump hook 0x55974ac02130
   -16> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command 2 hook 0x55974ac02130
   -15> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perf schema hook 0x55974ac02130
   -14> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perf histogram schema hook 0x55974ac02130
   -13> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command perf reset hook 0x55974ac02130
   -12> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config show hook 0x55974ac02130
   -11> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config help hook 0x55974ac02130
   -10> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config set hook 0x55974ac02130
    -9> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config unset hook 0x55974ac02130
    -8> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config get hook 0x55974ac02130
    -7> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config diff hook 0x55974ac02130
    -6> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command config diff get hook 0x55974ac02130
    -5> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command log flush hook 0x55974ac02130
    -4> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command log dump hook 0x55974ac02130
    -3> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command log reopen hook 0x55974ac02130
    -2> 2019-06-12 23:40:43.531 7f724b27f0c0  5 asok(0x55974af78000) register_command dump_mempools hook 0x55974ba7c068
    -1> 2019-06-12 23:40:43.555 7f724b27f0c0  1 rocksdb: do_open column families: [default]
     0> 2019-06-12 23:40:51.363 7f724b27f0c0 -1 *** Caught signal (Aborted) **
 in thread 7f724b27f0c0 thread_name:ceph-kvstore-to

 ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
 1: (()+0x12890) [0x7f7240c6f890]
 2: (gsignal()+0xc7) [0x7f723fb5fe97]
 3: (abort()+0x141) [0x7f723fb61801]
 4: (()+0x3039a) [0x7f723fb5139a]
 5: (()+0x30412) [0x7f723fb51412]
 6: (rocksdb::Version::~Version()+0x224) [0x559749529fe4]
 7: (rocksdb::Version::Unref()+0x35) [0x55974952a065]
 8: (rocksdb::SuperVersion::Cleanup()+0x68) [0x55974960f328]
 9: (rocksdb::ColumnFamilyData::~ColumnFamilyData()+0xf4) [0x5597496123d4]
 10: (rocksdb::ColumnFamilySet::~ColumnFamilySet()+0xb8) [0x559749612ba8]
 11: (rocksdb::VersionSet::~VersionSet()+0x4d) [0x55974951da5d]
 12: (rocksdb::DBImpl::CloseHelper()+0x6a8) [0x55974944a868]
 13: (rocksdb::DBImpl::~DBImpl()+0x65b) [0x559749455deb]
 14: (rocksdb::DBImpl::~DBImpl()+0x11) [0x559749455e21]
 15: (RocksDBStore::~RocksDBStore()+0xe9) [0x559749265349]
 16: (RocksDBStore::~RocksDBStore()+0x9) [0x559749265599]
 17: (main()+0x307) [0x5597490b5fb7]
 18: (__libc_start_main()+0xe7) [0x7f723fb42b97]
 19: (_start()+0x2a) [0x55974918e03a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   2/ 2 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent       500
  max_new         1000
  log_file /var/lib/ceph/crash/2019-06-12_21:40:51.369265Z_0eea9b49-ec97-4654-aee5-89d9207df79a/log
--- end dump of recent events ---
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com