Re: Is it a bug that OSD crashed when it's full?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



You could copy the whole block device to another larger disk using dd and then you'd use parted to grow the partition, then lvm to resize pv (pvextend) then lvm volume containing the OSD to make it larger (lvextend).  Once the LVM volume is larger you should be able to resize the osd using the bluestore util:

https://serverfault.com/questions/1066144/how-can-i-extend-or-resize-lvm-from-disk-space

I'd recommend doing the dd copy the smaller device to a larger device using a separate server if you can.  Easier to isolate the work from the production box.  

You should have your various backfill/nearfull settings adjusted to prevent your osds from hitting 100% full.   This gives you a chance to fix things before getting things wedged up..

On Mon, Oct 31, 2022, 8:22 PM Tony Liu <tonyliu0592@xxxxxxxxxxx> wrote:
Hi Zizon,

I know I ran out of space. I thought that full ratio would prevent me from being here.
I tried a few ceph-*-tool, they crash the same way. I guess they need rockdb to start?
Any recommendations how I can restore it or copy data out, or copy the volume to
another bigger disk?


Thanks!
Tony
________________________________________
From: Zizon Qiu <zzdtsv@xxxxxxxxx>
Sent: October 31, 2022 08:13 PM
To: Tony Liu
Cc: ceph-users@xxxxxxx; dev@xxxxxxx
Subject: Re: Is it a bug that OSD crashed when it's full?


 15: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int, rocksdb::ColumnFamilyData*, rocksdb::MemTable*, rocksdb::VersionEdit*)+0xcf5) [0x55858e3f0ea5]
 16: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long*, bool, bool*)+0x1c2e) [0x55858e3f35de]
 17: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool, unsigned long*)+0xae8) [0x55858e3f4938]
 18: (rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool, bool)+0x59d) [0x55858e3ee65d]

I think you should manage to make room for at least this to finish.
The rocksdb can not even startup with enough disk space.

On Tue, Nov 1, 2022 at 8:49 AM Tony Liu <tonyliu0592@xxxxxxxxxxx<mailto:tonyliu0592@xxxxxxxxxxx>> wrote:
Hi,

Based on doc, Ceph prevents you from writing to a full OSD so that you don’t lose data.
In my case, with v16.2.10, OSD crashed when it's full. Is this expected or some bug?
I'd expect write failure instead of OSD crash. It keeps crashing when tried to bring it up.
Is there any way to bring it back?

    -7> 2022-10-31T22:52:57.426+0000 7fe37fd94200  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1667256777427646, "job": 1, "event": "recovery_started", "log_files": [23300]}
    -6> 2022-10-31T22:52:57.426+0000 7fe37fd94200  4 rocksdb: [db_impl/db_impl_open.cc:760] Recovering log #23300 mode 2
    -5> 2022-10-31T22:52:57.529+0000 7fe37fd94200  3 rocksdb: [le/block_based/filter_policy.cc:584] Using legacy Bloom filter with high (20) bits/key. Dramatic filter space and/or accuracy improvement is available with format_version>=5.
    -4> 2022-10-31T22:52:57.592+0000 7fe37fd94200  1 bluefs _allocate unable to allocate 0x90000 on bdev 1, allocator name block, allocator type hybrid, capacity 0x6fc8400000, block size 0x1000, free 0x57acbc000, fragmentation 0.359784, allocated 0x0
    -3> 2022-10-31T22:52:57.592+0000 7fe37fd94200 -1 bluefs _allocate allocation failed, needed 0x8064a
    -2> 2022-10-31T22:52:57.592+0000 7fe37fd94200 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0x8064a
    -1> 2022-10-31T22:52:57.604+0000 7fe37fd94200 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7fe37fd94200 time 2022-10-31T22:52:57.593873+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/os/bluestore/BlueFS.cc: 2768: ceph_abort_msg("bluefs enospc")

 ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe5) [0x55858d7e2e7c]
 2: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1131) [0x55858dee8cc1]
 3: (BlueFS::_flush(BlueFS::FileWriter*, bool, bool*)+0x90) [0x55858dee8fa0]
 4: (BlueFS::_flush(BlueFS::FileWriter*, bool, std::unique_lock<std::mutex>&)+0x32) [0x55858defa0b2]
 5: (BlueRocksWritableFile::Append(rocksdb::Slice const&)+0x11b) [0x55858df129eb]
 6: (rocksdb::LegacyWritableFileWrapper::Append(rocksdb::Slice const&, rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x1f) [0x55858e3ae55f]
 7: (rocksdb::WritableFileWriter::WriteBuffered(char const*, unsigned long)+0x58a) [0x55858e4c02aa]
 8: (rocksdb::WritableFileWriter::Append(rocksdb::Slice const&)+0x2d0) [0x55858e4c1700]
 9: (rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice const&, rocksdb::CompressionType, rocksdb::BlockHandle*, bool)+0xb6) [0x55858e5dce86]
 10: (rocksdb::BlockBasedTableBuilder::WriteBlock(rocksdb::Slice const&, rocksdb::BlockHandle*, bool)+0x26c) [0x55858e5dd7cc]
 11: (rocksdb::BlockBasedTableBuilder::WriteBlock(rocksdb::BlockBuilder*, rocksdb::BlockHandle*, bool)+0x3c) [0x55858e5ddecc]
 12: (rocksdb::BlockBasedTableBuilder::Flush()+0x6d) [0x55858e5ddf5d]
 13: (rocksdb::BlockBasedTableBuilder::Add(rocksdb::Slice const&, rocksdb::Slice const&)+0x2b8) [0x55858e5e13c8]
 14: (rocksdb::BuildTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::Env*, rocksdb::FileSystem*, rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, rocksdb::FileOptions const&, rocksdb::TableCache*, rocksdb::InternalIteratorBase<rocksdb::Slice>*, std::vector<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> >, std::allocator<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> > > >, rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, std::vector<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> >, std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long, rocksdb::SnapshotChecker*, rocksdb::CompressionType, unsigned long, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long, unsigned long, rocksdb::Env::WriteLifeTimeHint, unsigned long)+0xa45) [0x55858e58be45]




 19: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0x15) [0x55858e3ef9f5]
 20: (RocksDBStore::do_open(std::ostream&, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x10c1) [0x55858e367601]
 21: (BlueStore::_open_db(bool, bool, bool)+0x8c7) [0x55858ddde857]
 22: (BlueStore::_open_db_and_around(bool, bool)+0x2f7) [0x55858de4c8f7]
 23: (BlueStore::_mount()+0x204) [0x55858de4f7b4]
 24: (OSD::init()+0x380) [0x55858d91d1d0]
 25: main()
 26: __libc_start_main()
 27: _start()

     0> 2022-10-31T22:52:57.617+0000 7fe37fd94200 -1 *** Caught signal (Aborted) **
 in thread 7fe37fd94200 thread_name:ceph-osd

 ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12cf0) [0x7fe37dd33cf0]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b6) [0x55858d7e2f4d]
 5: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1131) [0x55858dee8cc1]
 6: (BlueFS::_flush(BlueFS::FileWriter*, bool, bool*)+0x90) [0x55858dee8fa0]
 7: (BlueFS::_flush(BlueFS::FileWriter*, bool, std::unique_lock<std::mutex>&)+0x32) [0x55858defa0b2]
 8: (BlueRocksWritableFile::Append(rocksdb::Slice const&)+0x11b) [0x55858df129eb]
 9: (rocksdb::LegacyWritableFileWrapper::Append(rocksdb::Slice const&, rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x1f) [0x55858e3ae55f]
 10: (rocksdb::WritableFileWriter::WriteBuffered(char const*, unsigned long)+0x58a) [0x55858e4c02aa]
 11: (rocksdb::WritableFileWriter::Append(rocksdb::Slice const&)+0x2d0) [0x55858e4c1700]
 12: (rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice const&, rocksdb::CompressionType, rocksdb::BlockHandle*, bool)+0xb6) [0x55858e5dce86]
 13: (rocksdb::BlockBasedTableBuilder::WriteBlock(rocksdb::Slice const&, rocksdb::BlockHandle*, bool)+0x26c) [0x55858e5dd7cc]
 14: (rocksdb::BlockBasedTableBuilder::WriteBlock(rocksdb::BlockBuilder*, rocksdb::BlockHandle*, bool)+0x3c) [0x55858e5ddecc]
 15: (rocksdb::BlockBasedTableBuilder::Flush()+0x6d) [0x55858e5ddf5d]
 16: (rocksdb::BlockBasedTableBuilder::Add(rocksdb::Slice const&, rocksdb::Slice const&)+0x2b8) [0x55858e5e13c8]
 17: (rocksdb::BuildTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::Env*, rocksdb::FileSystem*, rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, rocksdb::FileOptions const&, rocksdb::TableCache*, rocksdb::InternalIteratorBase<rocksdb::Slice>*, std::vector<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> >, std::allocator<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> > > >, rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, std::vector<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> >, std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long, rocksdb::SnapshotChecker*, rocksdb::CompressionType, unsigned long, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long, unsigned long, rocksdb::Env::WriteLifeTimeHint, unsigned long)+0xa45) [0x55858e58be45]
 18: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int, rocksdb::ColumnFamilyData*, rocksdb::MemTable*, rocksdb::VersionEdit*)+0xcf5) [0x55858e3f0ea5]
 19: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long*, bool, bool*)+0x1c2e) [0x55858e3f35de]
 20: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool, unsigned long*)+0xae8) [0x55858e3f4938]
 21: (rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool, bool)+0x59d) [0x55858e3ee65d]
 22: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0x15) [0x55858e3ef9f5]
 23: (RocksDBStore::do_open(std::ostream&, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x10c1) [0x55858e367601]
 24: (BlueStore::_open_db(bool, bool, bool)+0x8c7) [0x55858ddde857]
 25: (BlueStore::_open_db_and_around(bool, bool)+0x2f7) [0x55858de4c8f7]
 26: (BlueStore::_mount()+0x204) [0x55858de4f7b4]
 27: (OSD::init()+0x380) [0x55858d91d1d0]
 28: main()
 29: __libc_start_main()
 30: _start()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


Thanks!
Tony
_______________________________________________
Dev mailing list -- dev@xxxxxxx<mailto:dev@xxxxxxx>
To unsubscribe send an email to dev-leave@xxxxxxx<mailto:dev-leave@xxxxxxx>
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx

[Index of Archives]     [CEPH Users]     [Ceph Devel]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux