Ceph Pacific bluefs enospc bug with newly created OSDs

Hi all,

we are experiencing the “bluefs enospc bug” again after redeploying all OSDs of our Pacific cluster.
I know that our cluster is somewhat over-utilized at the moment with 87.26 % raw usage, but as far as I know this still should not happen.
We never had this problem with previous Ceph versions, and right now I am rather out of ideas on how to tackle these crashes.
Compacting the database did not help in the past either (the usual compaction commands are listed below for reference).
Redeploying does not seem to help in the long run either. For documentation, these are the commands I used to redeploy the OSDs:

# stop the OSD and mark it destroyed so its ID can be reused
systemctl stop ceph-osd@${OSDNUM}
ceph osd destroy --yes-i-really-mean-it ${OSDNUM}
# wipe the backing device and remove the stale device-mapper mapping
blkdiscard ${DEVICE}
sgdisk -Z ${DEVICE}
dmsetup remove ${DMDEVICE}
# recreate the OSD with the same ID on the wiped device
ceph-volume lvm create --osd-id ${OSDNUM} --data ${DEVICE}
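
For reference, these are the usual compaction commands (a sketch using the default OSD data directory; our exact invocations may have differed slightly):

# online compaction of a running OSD's RocksDB
ceph tell osd.${OSDNUM} compact
# offline compaction with the OSD stopped
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-${OSDNUM} compact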

Does anyone have ideas or possible solutions for this? I am not yet ready to upgrade our clusters to Quincy, and I presume this bug is still present in Quincy as well.
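
If more details would help, I can also pull the BlueFS device usage from a stopped OSD and the per-OSD utilization, e.g. (a sketch, assuming the default OSD data directory):

# BlueFS device sizes/usage of a stopped OSD
ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-${OSDNUM}
# per-OSD raw utilization across the cluster
ceph osd df tree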

Our cluster information follows:

Crash Info:
ceph crash info 2023-06-19T21:23:51.285180Z_ac4105d7-cb09-45c8-a6e3-8a6bb6727b25
{
    "assert_condition": "abort",
    "assert_file": "/build/ceph/src/os/bluestore/BlueFS.cc",
    "assert_func": "int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)",
    "assert_line": 2810,
    "assert_msg": "/build/ceph/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7fd561810100 time 2023-06-19T23:23:51.261617+0200\n/build/ceph/src/os/bluestore/BlueFS.cc: 2810: ceph_abort_msg(\"bluefs enospc\")\n",
    "assert_thread_name": "ceph-osd",
    "backtrace": [
        "/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7fd56225f730]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1a7) [0x557bb3c65762]",
        "(BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1175) [0x557bb42e7945]",
        "(BlueFS::_flush(BlueFS::FileWriter*, bool, bool*)+0xa1) [0x557bb42e7ad1]",
        "(BlueFS::_flush(BlueFS::FileWriter*, bool, std::unique_lock<std::mutex>&)+0x2e) [0x557bb42f803e]",
        "(BlueRocksWritableFile::Append(rocksdb::Slice const&)+0x11b) [0x557bb431134b]",
        "(rocksdb::LegacyWritableFileWrapper::Append(rocksdb::Slice const&, rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x44) [0x557bb478e602]",
        "(rocksdb::WritableFileWriter::WriteBuffered(char const*, unsigned long)+0x333) [0x557bb4956feb]",
        "(rocksdb::WritableFileWriter::Append(rocksdb::Slice const&)+0x5d1) [0x557bb4955569]",
        "(rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice const&, rocksdb::CompressionType, rocksdb::BlockHandle*, bool)+0x11d) [0x557bb4b142e1]",
        "(rocksdb::BlockBasedTableBuilder::WriteBlock(rocksdb::Slice const&, rocksdb::BlockHandle*, bool)+0x7d6) [0x557bb4b140ca]",
        "(rocksdb::BlockBasedTableBuilder::WriteBlock(rocksdb::BlockBuilder*, rocksdb::BlockHandle*, bool)+0x48) [0x557bb4b138e0]",
        "(rocksdb::BlockBasedTableBuilder::Flush()+0x9a) [0x557bb4b13890]",
        "(rocksdb::BlockBasedTableBuilder::Add(rocksdb::Slice const&, rocksdb::Slice const&)+0x192) [0x557bb4b133c8]",
        "(rocksdb::BuildTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::Env*, rocksdb::FileSystem*, rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, rocksdb::FileOptions const&, rocksdb::TableCache*, rocksdb::InternalIteratorBase<rocksdb::Slice>*, std::vector<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> >, std::allocator<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> > > >, rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, std::vector<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> >, std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long, rocksdb::SnapshotChecker*, rocksdb::CompressionType, unsigned long, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long, unsigned long, rocksdb::Env::WriteLifeTimeHint, unsigned long)+0x773) [0x557bb4a9aa7d]",
       "(rocksdb::DBImpl::WriteLevel0TableForRecovery(int, rocksdb::ColumnFamilyData*, rocksdb::MemTable*, rocksdb::VersionEdit*)+0x5de) [0x557bb4824676]",
        "(rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long*, bool, bool*)+0x1aa0) [0x557bb48232d0]",
       "(rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool, unsigned long*)+0x158a) [0x557bb4820846]",
        "(rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool, bool)+0x679) [0x557bb4825b25]",
        "(rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0x52) [0x557bb4824efa]",
        "(RocksDBStore::do_open(std::ostream&, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xdaf) [0x557bb473b85f]",
        "(BlueStore::_open_db(bool, bool, bool)+0x44b) [0x557bb41ec20b]",
        "(BlueStore::_open_db_and_around(bool, bool)+0x2ef) [0x557bb425288f]",
        "(BlueStore::_mount()+0x9c) [0x557bb42551ec]",
        "(OSD::init()+0x38a) [0x557bb3d568da]",
        "main()",
        "__libc_start_main()",
        "_start()"
    ],
    "ceph_version": "16.2.11",
    "crash_id": "2023-06-19T21:23:51.285180Z_ac4105d7-cb09-45c8-a6e3-8a6bb6727b25",
    "entity_name": "osd.39",
    "os_id": "10",
    "os_name": "Debian GNU/Linux 10 (buster)",
    "os_version": "10 (buster)",
    "os_version_id": "10",
    "process_name": "ceph-osd",
    "stack_sig": "23f90145bebe39074210d4a79260e8977aec6b1c4d963740d1a04c3ddd4756a4",
    "timestamp": "2023-06-19T21:23:51.285180Z",
    "utsname_hostname": "cloud5-1567",
    "utsname_machine": "x86_64",
    "utsname_release": "5.10.144+1-ph",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Mon Sep 26 07:02:56 UTC 2022"
}

Utilization:
ceph df
--- RAW STORAGE ---
CLASS     SIZE   AVAIL     USED  RAW USED  %RAW USED
ssd    168 TiB  21 TiB  146 TiB   146 TiB      87.26
TOTAL  168 TiB  21 TiB  146 TiB   146 TiB      87.26

--- POOLS ---
POOL                       ID   PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
device_health_metrics       1     1  4.7 MiB       48   14 MiB      0    2.1 TiB
cephstor5                   2  2048   52 TiB   14.27M  146 TiB  95.89    2.1 TiB
cephfs_cephstor5_data       3    32   95 MiB  118.52k  1.4 GiB   0.02    2.1 TiB
cephfs_cephstor5_metadata   4    16  352 MiB      166  1.0 GiB   0.02    2.1 TiB

Versions:
ceph versions
{
    "mon": {
        "ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)": 3
    },
    "osd": {
        "ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)": 48
    },
    "mds": {
        "ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)": 3
    },
    "overall": {
        "ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)": 57
    }
}



Kind regards
Carsten Grommel

-------------------------------
Profihost GmbH
Expo Plaza 1
30539 Hannover
Germany

Tel.: +49 (511) 5151 8181 | Fax.: +49 (511) 5151 8282
URL: http://www.profihost.com | E-Mail: info@xxxxxxxxxxxxx

Registered office: Hannover, VAT ID DE249338561
Register court: Amtsgericht Hannover, registration no. HRB 222926
Managing directors: Marc Zocher, Dr. Claus Boyens, Daniel Hagemeier



