Re: OSDs failing to boot

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Paul,

could you please set both "debug bluestore" and "debug bluefs" to 20, run again and share the resulting log.

Thanks,

Igor

On 5/9/2019 2:34 AM, Rawson, Paul L. wrote:
Hi Folks,

I'm having trouble getting some of my OSDs to boot. At some point, these
disks got very full. I fixed the rule that was causing that, and they
are on average ~30% full now.

I'm getting the following in my logs:

      -1> 2019-05-08 16:05:18.956 7fdc7adbbf00 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/include/interval_set.h:
In function 'void interval_set<T, Map>::insert(T, T, T*, T*) [with T =
long unsigned int; Map = std::map<long unsigned int, long unsigned int,
std::less<long unsigned int>, std::allocator<std::pair<const long
unsigned int, long unsigned int> > >]' thread 7fdc7adbbf00 time
2019-05-08 16:05:18.953372
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/include/interval_set.h:
490: FAILED ceph_assert(p->first > start+len)

   ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972)
nautilus (stable)
   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x14a) [0x7fdc70daa676]
   2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char
const*, char const*, ...)+0) [0x7fdc70daa844]
   3: (interval_set<unsigned long, std::map<unsigned long, unsigned long,
std::less<unsigned long>, std::allocator<std::pair<unsigned long const,
unsigned long> > > >::insert(unsigned long, unsigned long, unsigned
long*, unsigned long*)+0x45f) [0x55b8960e03df]
   4: (BlueStore::allocate_bluefs_freespace(unsigned long, unsigned long,
std::vector<bluestore_pextent_t,
mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t>
  >*)+0x74e) [0x55b89611d13e]
   5: (BlueFS::_expand_slow_device(unsigned long,
std::vector<bluestore_pextent_t,
mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t>
  >&)+0x111) [0x55b8960c8211]
   6: (BlueFS::_allocate(unsigned char, unsigned long,
bluefs_fnode_t*)+0x68b) [0x55b8960c8f7b]
   7: (BlueFS::_allocate(unsigned char, unsigned long,
bluefs_fnode_t*)+0x362) [0x55b8960c8c52]
   8: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned
long)+0xe5) [0x55b8960c95d5]
   9: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x55b8960cb43b]
   10: (BlueRocksWritableFile::Flush()+0x3d) [0x55b8962bdfcd]
   11: (rocksdb::WritableFileWriter::Flush()+0x19e) [0x55b896531a4e]
   12: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x55b896531d2e]
   13: (rocksdb::BuildTable(std::string const&, rocksdb::Env*,
rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&,
rocksdb::EnvOptions const&, rocksdb::TableCache*,
rocksdb::InternalIteratorBase<rocksdb::Slice>*,
std::unique_ptr<rocksdb::InternalIteratorBase<rocksdb::Slice>,
std::default_delete<rocksdb::InternalIteratorBase<rocksdb::Slice> > >,
rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&,
std::vector<std::unique_ptr<rocksdb::IntTblPropCollectorFactory,
std::default_delete<rocksdb::IntTblPropCollectorFactory> >,
std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory,
std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > const*,
unsigned int, std::string const&, std::vector<unsigned long,
std::allocator<unsigned long> >, unsigned long,
rocksdb::SnapshotChecker*, rocksdb::CompressionType,
rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*,
rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int,
rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long,
unsigned long, rocksdb::Env::WriteLifeTimeHint)+0x2368) [0x55b89655fb68]
   14: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int,
rocksdb::ColumnFamilyData*, rocksdb::MemTable*,
rocksdb::VersionEdit*)+0xc66) [0x55b8963d48c6]
   15: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long,
std::allocator<unsigned long> > const&, unsigned long*, bool)+0x1dce)
[0x55b8963d6f1e]
   16:
(rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor,
std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool,
bool)+0x809) [0x55b8963d7db9]
   17: (rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::string
const&, std::vector<rocksdb::ColumnFamilyDescriptor,
std::allocator<rocksdb::ColumnFamilyDescriptor> > const&,
std::vector<rocksdb::ColumnFamilyHandle*,
std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool,
bool)+0x658) [0x55b8963d8bc8]
   18: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::string const&,
std::vector<rocksdb::ColumnFamilyDescriptor,
std::allocator<rocksdb::ColumnFamilyDescriptor> > const&,
std::vector<rocksdb::ColumnFamilyHandle*,
std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0x24)
[0x55b8963da3a4]
   19: (RocksDBStore::do_open(std::ostream&, bool, bool,
std::vector<KeyValueDB::ColumnFamily,
std::allocator<KeyValueDB::ColumnFamily> > const*)+0x1660) [0x55b8961c2a80]
   20: (BlueStore::_open_db(bool, bool, bool)+0xf8e) [0x55b89611b37e]
   21: (BlueStore::_open_db_and_around(bool)+0x165) [0x55b8961388b5]
   22: (BlueStore::_fsck(bool, bool)+0xe5c) [0x55b8961692dc]
   23: (main()+0x107e) [0x55b895fc682e]
   24: (__libc_start_main()+0xf5) [0x7fdc6da4e3d5]
   25: (()+0x2718cf) [0x55b8960ac8cf]

       0> 2019-05-08 16:05:18.960 7fdc7adbbf00 -1 *** Caught signal
(Aborted) **
   in thread 7fdc7adbbf00 thread_name:ceph-bluestore-

   ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972)
nautilus (stable)
   1: (()+0xf5d0) [0x7fdc6f2905d0]
   2: (gsignal()+0x37) [0x7fdc6da62207]
   3: (abort()+0x148) [0x7fdc6da638f8]
   4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x199) [0x7fdc70daa6c5]
   5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char
const*, char const*, ...)+0) [0x7fdc70daa844]
   6: (interval_set<unsigned long, std::map<unsigned long, unsigned long,
std::less<unsigned long>, std::allocator<std::pair<unsigned long const,
unsigned long> > > >::insert(unsigned long, unsigned long, unsigned
long*, unsigned long*)+0x45f) [0x55b8960e03df]
   7: (BlueStore::allocate_bluefs_freespace(unsigned long, unsigned long,
std::vector<bluestore_pextent_t,
mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t>
  >*)+0x74e) [0x55b89611d13e]
   8: (BlueFS::_expand_slow_device(unsigned long,
std::vector<bluestore_pextent_t,
mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t>
  >&)+0x111) [0x55b8960c8211]
   9: (BlueFS::_allocate(unsigned char, unsigned long,
bluefs_fnode_t*)+0x68b) [0x55b8960c8f7b]
   10: (BlueFS::_allocate(unsigned char, unsigned long,
bluefs_fnode_t*)+0x362) [0x55b8960c8c52]
   11: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned
long)+0xe5) [0x55b8960c95d5]
   12: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x55b8960cb43b]
   13: (BlueRocksWritableFile::Flush()+0x3d) [0x55b8962bdfcd]
   14: (rocksdb::WritableFileWriter::Flush()+0x19e) [0x55b896531a4e]
   15: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x55b896531d2e]
   16: (rocksdb::BuildTable(std::string const&, rocksdb::Env*,
rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&,
rocksdb::EnvOptions const&, rocksdb::TableCache*,
rocksdb::InternalIteratorBase<rocksdb::Slice>*,
std::unique_ptr<rocksdb::InternalIteratorBase<rocksdb::Slice>,
std::default_delete<rocksdb::InternalIteratorBase<rocksdb::Slice> > >,
rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&,
std::vector<std::unique_ptr<rocksdb::IntTblPropCollectorFactory,
std::default_delete<rocksdb::IntTblPropCollectorFactory> >,
std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory,
std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > const*,
unsigned int, std::string const&, std::vector<unsigned long,
std::allocator<unsigned long> >, unsigned long,
rocksdb::SnapshotChecker*, rocksdb::CompressionType,
rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*,
rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int,
rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long,
unsigned long, rocksdb::Env::WriteLifeTimeHint)+0x2368) [0x55b89655fb68]
   17: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int,
rocksdb::ColumnFamilyData*, rocksdb::MemTable*,
rocksdb::VersionEdit*)+0xc66) [0x55b8963d48c6]
   18: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long,
std::allocator<unsigned long> > const&, unsigned long*, bool)+0x1dce)
[0x55b8963d6f1e]
   19:
(rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor,
std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool,
bool)+0x809) [0x55b8963d7db9]
   20: (rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::string
const&, std::vector<rocksdb::ColumnFamilyDescriptor,
std::allocator<rocksdb::ColumnFamilyDescriptor> > const&,
std::vector<rocksdb::ColumnFamilyHandle*,
std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool,
bool)+0x658) [0x55b8963d8bc8]
   21: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::string const&,
std::vector<rocksdb::ColumnFamilyDescriptor,
std::allocator<rocksdb::ColumnFamilyDescriptor> > const&,
std::vector<rocksdb::ColumnFamilyHandle*,
std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0x24)
[0x55b8963da3a4]
   22: (RocksDBStore::do_open(std::ostream&, bool, bool,
std::vector<KeyValueDB::ColumnFamily,
std::allocator<KeyValueDB::ColumnFamily> > const*)+0x1660) [0x55b8961c2a80]
   23: (BlueStore::_open_db(bool, bool, bool)+0xf8e) [0x55b89611b37e]
   24: (BlueStore::_open_db_and_around(bool)+0x165) [0x55b8961388b5]
   25: (BlueStore::_fsck(bool, bool)+0xe5c) [0x55b8961692dc]
   26: (main()+0x107e) [0x55b895fc682e]
   27: (__libc_start_main()+0xf5) [0x7fdc6da4e3d5]
   28: (()+0x2718cf) [0x55b8960ac8cf]
   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

The osds were provisioned using

ceph-volume prepare --dmcrypt --data /dev/sdx

It looks to me like one of the db/wal/main claimed all space while the
osd was full and hasn't released it -- leaving none left for the others.

I've tried extracting pgs with ceph-objectstore tool, but get the same
stack trace.

Any help would be greatly appreciated!

-Paul

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux