Re: rocksdb corruption, stale pg, rebuild bucket index

On Wed, 12 Jun 2019, Simon Leinen wrote:
> Sage Weil writes:
> > What happens if you do
> 
> >  ceph-kvstore-tool rocksdb /mnt/ceph/db stats
> 
> (I'm afraid that our ceph-kvstore-tool doesn't know about a "stats"
> command; but it still tries to open the database.)
> 
> That aborts after complaining about many missing files in /mnt/ceph/db.
> 
> When I ( cd /mnt/ceph/db && sudo ln -s ../db.slow/* . ) and re-run,
> it still aborts, just without complaining about missing files.

Ah, yes--I forgot that part :)
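For reference, the symlink step above can be written a little more defensively. This is a minimal sketch that assumes the layout seen here: a BlueFS export root containing db/ and db.slow/. link_slow_files is a hypothetical helper name, not a Ceph tool. It links each file from db.slow/ into db/ with a relative symlink. If a name already exists in db/, it skips that file instead of failing part-way through the way a bare "ln -s ../db.slow/* ." would.

```shell
#!/bin/sh
# Sketch: expose the files of a BlueFS export's db.slow/ directory inside db/
# via relative symlinks, so a tool that opens db/ alone can find them.
# link_slow_files is a hypothetical helper name (not part of Ceph).
link_slow_files() {
    export_dir=$1                      # export root, e.g. /mnt/ceph
    db="$export_dir/db"
    slow="$export_dir/db.slow"
    [ -d "$db" ] && [ -d "$slow" ] || { echo "missing $db or $slow" >&2; return 1; }
    for f in "$slow"/*; do
        [ -f "$f" ] || continue        # skip non-files and an unexpanded glob
        name=$(basename "$f")
        if [ -e "$db/$name" ] || [ -L "$db/$name" ]; then
            # Never overwrite something that is already in db/.
            echo "skipping $name: already present in db/" >&2
            continue
        fi
        ln -s "../db.slow/$name" "$db/$name"
    done
    return 0
}
```

With the paths from this thread, the call would be "link_slow_files /mnt/ceph", run as a user that can write to /mnt/ceph/db.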

> I'm attaching the output (stdout+stderr combined), in case that helps.
> 
> > or, if that works,
> 
> >  ceph-kvstore-tool rocksdb /mnt/ceph/db compact
> 
> > It looks like bluefs is happy (in that it can read the whole set 
> > of rocksdb files), so the question is whether rocksdb can open them, or 
> > if there's some corruption or problem at the rocksdb level.
> 
> > The original crash is actually here:
> 
> >  ...
> >  9: (tc_new()+0x283) [0x7fbdbed8e943]
> >  10: (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_mutate(unsigned long, unsigned long, char const*, unsigned long)+0x69) [0x5600b1268109]
> >  11: (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)+0x63) [0x5600b12f5b43]
> >  12: (rocksdb::BlockBuilder::Add(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const*)+0x10b) [0x5600b1eaca9b]
> >  ...
> 
> > where tc_new is (I think) tcmalloc.  Which looks to me like rocksdb 
> > is probably trying to allocate something very big.  The question is whether 
> > that will happen with the exported files or only on bluefs...
> 
> Yes, that's what I was thinking as well.  The server seems to have about
> 50GB of free RAM though, so maybe it was more like <UNDEFINED>ly big :-)
> 
> Also, your ceph-kvstore-tool command seems to have crashed somewhere
> else (the destructor of a rocksdb::Version object?)
> 
>   2019-06-12 23:40:43.555 7f724b27f0c0  1 rocksdb: do_open column families: [default]
>   Unrecognized command: stats
>   ceph-kvstore-tool: /build/ceph-14.2.1/src/rocksdb/db/version_set.cc:356: rocksdb::Version::~Version(): Assertion `path_id < cfd_->ioptions()->cf_paths.size()' failed.
>   *** Caught signal (Aborted) **

Ah, this looks promising. It looks like it got it open and has some 
problem with the error/teardown path.

Try 'compact' instead of 'stats'?
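The suggested 'compact' run can be wrapped so that its output and exit status are kept for reporting back to the list. This is a minimal sketch. try_compact and the KVTOOL override are hypothetical names used for illustration. The only real invocation is "ceph-kvstore-tool rocksdb <dir> compact", as given in this thread. The 'stats' attempt above aborted during teardown after the database had been opened, so a non-zero status here does not by itself prove that compaction never ran. Check the saved output as well.

```shell
#!/bin/sh
# Sketch: run an offline RocksDB compaction on an exported db directory and
# keep the combined output. try_compact and KVTOOL are hypothetical names;
# KVTOOL only lets you point the wrapper at a different binary.
KVTOOL=${KVTOOL:-ceph-kvstore-tool}

try_compact() {
    dbdir=$1
    log=$(mktemp) || return 1
    "$KVTOOL" rocksdb "$dbdir" compact >"$log" 2>&1
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "compact: OK"
        rm -f "$log"
    else
        # A process killed by SIGABRT is usually reported by the shell as
        # 134 (128 + 6), as with the assertion failure seen with 'stats'.
        echo "compact: exit status $rc; full output kept in $log" >&2
    fi
    return "$rc"
}
```

For this case the call would be "try_compact /mnt/ceph/db", made after the db.slow files have been linked into db/.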

sage


>    in thread 7f724b27f0c0 thread_name:ceph-kvstore-to
>    ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
>    1: (()+0x12890) [0x7f7240c6f890]
>    2: (gsignal()+0xc7) [0x7f723fb5fe97]
>    3: (abort()+0x141) [0x7f723fb61801]
>    4: (()+0x3039a) [0x7f723fb5139a]
>    5: (()+0x30412) [0x7f723fb51412]
>    6: (rocksdb::Version::~Version()+0x224) [0x559749529fe4]
>    7: (rocksdb::Version::Unref()+0x35) [0x55974952a065]
>    8: (rocksdb::SuperVersion::Cleanup()+0x68) [0x55974960f328]
>    9: (rocksdb::ColumnFamilyData::~ColumnFamilyData()+0xf4) [0x5597496123d4]
>    10: (rocksdb::ColumnFamilySet::~ColumnFamilySet()+0xb8) [0x559749612ba8]
>    11: (rocksdb::VersionSet::~VersionSet()+0x4d) [0x55974951da5d]
>    12: (rocksdb::DBImpl::CloseHelper()+0x6a8) [0x55974944a868]
>    13: (rocksdb::DBImpl::~DBImpl()+0x65b) [0x559749455deb]
>    14: (rocksdb::DBImpl::~DBImpl()+0x11) [0x559749455e21]
>    15: (RocksDBStore::~RocksDBStore()+0xe9) [0x559749265349]
>    16: (RocksDBStore::~RocksDBStore()+0x9) [0x559749265599]
>    17: (main()+0x307) [0x5597490b5fb7]
>    18: (__libc_start_main()+0xe7) [0x7f723fb42b97]
>    19: (_start()+0x2a) [0x55974918e03a]
>   2019-06-12 23:40:51.363 7f724b27f0c0 -1 *** Caught signal (Aborted) **
>    in thread 7f724b27f0c0 thread_name:ceph-kvstore-to
> 
> > Thanks!
> 
> Thanks so much for looking into this!
> 
> We hope that we can get some access to S3 bucket indexes back, possibly
> by somehow dropping and re-creating those indexes.
> -- 
> Simon.
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


