Hi Igor,

thank you for your answer!

>first of all Quincy does have a fix for the issue, see
>https://tracker.ceph.com/issues/53466 (and its Quincy counterpart
>https://tracker.ceph.com/issues/58588)

Thank you, I somehow missed that release, good to know!

>SSD or HDD? Standalone or shared DB volume? I presume the latter... What
>is disk size and current utilization?
>
>Please share ceph-bluestore-tool's bluefs-bdev-sizes command output if
>possible

We use 4 TB NVMe SSDs with a shared DB volume, yes; mainly Micron with some Dell and Samsung in this cluster:

Micron_7400_MTFDKCB3T8TDZ_214733D291B1 cloud5-1561:nvme5n1 osd.5

All disks are at ~88% utilization. I noticed that our disks tend to run into this bug at around 92%.

Here are some bluefs-bdev-sizes outputs from different OSDs on different hosts in this cluster:

ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-36/
inferring bluefs devices from bluestore path
1 : device size 0x37e3ec00000 : using 0x2e1b3900000(2.9 TiB)

ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-24/
inferring bluefs devices from bluestore path
1 : device size 0x37e3ec00000 : using 0x2d4e318d000(2.8 TiB)

ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-5/
inferring bluefs devices from bluestore path
1 : device size 0x37e3ec00000 : using 0x2f2da93d000(2.9 TiB)

>Generally, given my assumption that DB volume is currently collocated
>and you still want to stay on Pacific, you might want to consider
>redeploying OSDs with a standalone DB volume setup.
>
>Just create large enough (2x of the current DB size seems to be pretty
>conservative estimation for that volume's size) additional LV on top of
>the same physical disk. And put DB there...
>
>Separating DB from main disk would result in much less fragmentation at
>DB volume and hence work around the problem. The cost would be having
>some extra spare space at DB volume unavailable for user data.

I guess that makes sense, so the suggestion would be to redeploy each OSD with data and DB on the same NVMe but on separate logical volumes, or to update to Quincy.
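If I got that right, the redeploy would then look roughly like this — the device, VG/LV names, OSD id and DB LV size below are only placeholders (the DB LV sized at about 2x the current DB size, as you suggested), not actual values from our cluster:

# wipe the NVMe as in the redeploy steps quoted below, then create a VG with two LVs on it
pvcreate /dev/nvme5n1
vgcreate ceph-nvme5 /dev/nvme5n1
# dedicated DB LV, roughly 2x the current DB size (placeholder size)
lvcreate -n osd5-db -L 150G ceph-nvme5
# the rest of the disk becomes the data LV
lvcreate -n osd5-data -l 100%FREE ceph-nvme5
# recreate the OSD with a separate block.db on the same physical NVMe
ceph-volume lvm create --osd-id 5 --data ceph-nvme5/osd5-data --block.db ceph-nvme5/osd5-db

Is that roughly what you had in mind? I would assume bluefs-bdev-sizes then lists the DB LV as its own BlueFS device afterwards.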
Thank you!
Carsten

From: Igor Fedotov <igor.fedotov@xxxxxxxx>
Date: Tuesday, 20 June 2023, 12:48
To: Carsten Grommel <c.grommel@xxxxxxxxxxxx>, ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Re: Ceph Pacific bluefs enospc bug with newly created OSDs

Hi Carsten,

first of all Quincy does have a fix for the issue, see
https://tracker.ceph.com/issues/53466 (and its Quincy counterpart
https://tracker.ceph.com/issues/58588)

Could you please share a bit more info on OSD disk layout? SSD or HDD?
Standalone or shared DB volume? I presume the latter... What is disk
size and current utilization?

Please share ceph-bluestore-tool's bluefs-bdev-sizes command output if
possible.

Generally, given my assumption that DB volume is currently collocated
and you still want to stay on Pacific, you might want to consider
redeploying OSDs with a standalone DB volume setup.

Just create large enough (2x of the current DB size seems to be pretty
conservative estimation for that volume's size) additional LV on top of
the same physical disk. And put DB there...

Separating DB from main disk would result in much less fragmentation at
DB volume and hence work around the problem. The cost would be having
some extra spare space at DB volume unavailable for user data.

Hope this helps,

Igor

On 20/06/2023 10:29, Carsten Grommel wrote:
> Hi all,
>
> we are experiencing the “bluefs enospc bug” again after redeploying all OSDs of our Pacific cluster.
> I know that our cluster is a bit too utilized at the moment with 87.26% raw usage, but this still should not happen afaik.
> We never had this problem with previous Ceph versions, and right now I am kind of out of ideas on how to tackle these crashes.
> Compacting the database did not help in the past either.
> Redeploying does not seem to help in the long run either. For documentation, I used these commands to redeploy the OSDs:
>
> systemctl stop ceph-osd@${OSDNUM}
> ceph osd destroy --yes-i-really-mean-it ${OSDNUM}
> blkdiscard ${DEVICE}
> sgdisk -Z ${DEVICE}
> dmsetup remove ${DMDEVICE}
> ceph-volume lvm create --osd-id ${OSDNUM} --data ${DEVICE}
>
> Any ideas or possible solutions for this? I am not yet ready to upgrade our clusters to Quincy, and I presume that this bug is still present in Quincy as well?
>
> Our cluster information follows:
>
> Crash info:
> ceph crash info 2023-06-19T21:23:51.285180Z_ac4105d7-cb09-45c8-a6e3-8a6bb6727b25
> {
>     "assert_condition": "abort",
>     "assert_file": "/build/ceph/src/os/bluestore/BlueFS.cc",
>     "assert_func": "int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)",
>     "assert_line": 2810,
>     "assert_msg": "/build/ceph/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7fd561810100 time 2023-06-19T23:23:51.261617+0200\n/build/ceph/src/os/bluestore/BlueFS.cc: 2810: ceph_abort_msg(\"bluefs enospc\")\n",
>     "assert_thread_name": "ceph-osd",
>     "backtrace": [
>         "/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7fd56225f730]",
>         "gsignal()",
>         "abort()",
>         "(ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1a7) [0x557bb3c65762]",
>         "(BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1175) [0x557bb42e7945]",
>         "(BlueFS::_flush(BlueFS::FileWriter*, bool, bool*)+0xa1) [0x557bb42e7ad1]",
>         "(BlueFS::_flush(BlueFS::FileWriter*, bool, std::unique_lock<std::mutex>&)+0x2e) [0x557bb42f803e]",
>         "(BlueRocksWritableFile::Append(rocksdb::Slice const&)+0x11b) [0x557bb431134b]",
>         "(rocksdb::LegacyWritableFileWrapper::Append(rocksdb::Slice const&, rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x44) [0x557bb478e602]",
>         "(rocksdb::WritableFileWriter::WriteBuffered(char const*, unsigned long)+0x333) [0x557bb4956feb]",
>         "(rocksdb::WritableFileWriter::Append(rocksdb::Slice const&)+0x5d1) [0x557bb4955569]",
>         "(rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice const&, rocksdb::CompressionType, rocksdb::BlockHandle*, bool)+0x11d) [0x557bb4b142e1]",
>         "(rocksdb::BlockBasedTableBuilder::WriteBlock(rocksdb::Slice const&, rocksdb::BlockHandle*, bool)+0x7d6) [0x557bb4b140ca]",
>         "(rocksdb::BlockBasedTableBuilder::WriteBlock(rocksdb::BlockBuilder*, rocksdb::BlockHandle*, bool)+0x48) [0x557bb4b138e0]",
>         "(rocksdb::BlockBasedTableBuilder::Flush()+0x9a) [0x557bb4b13890]",
>         "(rocksdb::BlockBasedTableBuilder::Add(rocksdb::Slice const&, rocksdb::Slice const&)+0x192) [0x557bb4b133c8]",
>         "(rocksdb::BuildTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::Env*, rocksdb::FileSystem*, rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, rocksdb::FileOptions const&, rocksdb::TableCache*, rocksdb::InternalIteratorBase<rocksdb::Slice>*, std::vector<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> >, std::allocator<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> > > >, rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, std::vector<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> >, std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long, rocksdb::SnapshotChecker*, rocksdb::CompressionType, unsigned long, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long, unsigned long, rocksdb::Env::WriteLifeTimeHint, unsigned long)+0x773) [0x557bb4a9aa7d]",
>         "(rocksdb::DBImpl::WriteLevel0TableForRecovery(int, rocksdb::ColumnFamilyData*, rocksdb::MemTable*, rocksdb::VersionEdit*)+0x5de) [0x557bb4824676]",
>         "(rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long*, bool, bool*)+0x1aa0) [0x557bb48232d0]",
>         "(rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool, unsigned long*)+0x158a) [0x557bb4820846]",
>         "(rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool, bool)+0x679) [0x557bb4825b25]",
>         "(rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0x52) [0x557bb4824efa]",
>         "(RocksDBStore::do_open(std::ostream&, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xdaf) [0x557bb473b85f]",
>         "(BlueStore::_open_db(bool, bool, bool)+0x44b) [0x557bb41ec20b]",
>         "(BlueStore::_open_db_and_around(bool, bool)+0x2ef) [0x557bb425288f]",
>         "(BlueStore::_mount()+0x9c) [0x557bb42551ec]",
>         "(OSD::init()+0x38a) [0x557bb3d568da]",
>         "main()",
>         "__libc_start_main()",
>         "_start()"
>     ],
>     "ceph_version": "16.2.11",
>     "crash_id": "2023-06-19T21:23:51.285180Z_ac4105d7-cb09-45c8-a6e3-8a6bb6727b25",
>     "entity_name": "osd.39",
>     "os_id": "10",
>     "os_name": "Debian GNU/Linux 10 (buster)",
>     "os_version": "10 (buster)",
>     "os_version_id": "10",
>     "process_name": "ceph-osd",
>     "stack_sig": "23f90145bebe39074210d4a79260e8977aec6b1c4d963740d1a04c3ddd4756a4",
>     "timestamp": "2023-06-19T21:23:51.285180Z",
>     "utsname_hostname": "cloud5-1567",
>     "utsname_machine": "x86_64",
>     "utsname_release": "5.10.144+1-ph",
>     "utsname_sysname": "Linux",
>     "utsname_version": "#1 SMP Mon Sep 26 07:02:56 UTC 2022"
> }
>
> Utilization:
> ceph df
> --- RAW STORAGE ---
> CLASS  SIZE     AVAIL   USED     RAW USED  %RAW USED
> ssd    168 TiB  21 TiB  146 TiB   146 TiB      87.26
> TOTAL  168 TiB  21 TiB  146 TiB   146 TiB      87.26
>
> --- POOLS ---
> POOL                       ID   PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
> device_health_metrics       1     1  4.7 MiB       48  14 MiB       0    2.1 TiB
> cephstor5                   2  2048   52 TiB   14.27M  146 TiB  95.89    2.1 TiB
> cephfs_cephstor5_data       3    32   95 MiB  118.52k  1.4 GiB   0.02    2.1 TiB
> cephfs_cephstor5_metadata   4    16  352 MiB      166  1.0 GiB   0.02    2.1 TiB
>
> Versions:
> ceph versions
> {
>     "mon": {
>         "ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)": 3
>     },
>     "mgr": {
>         "ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)": 3
>     },
>     "osd": {
>         "ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)": 48
>     },
>     "mds": {
>         "ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)": 3
>     },
>     "overall": {
>         "ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)": 57
>     }
> }
>
> Kind regards
> Carsten Grommel
>
> -------------------------------
> Profihost GmbH
> Expo Plaza 1
> 30539 Hannover
> Deutschland
>
> Tel.: +49 (511) 5151 8181 | Fax.: +49 (511) 5151 8282
> URL: http://www.profihost.com | E-Mail: info@xxxxxxxxxxxxx
>
> Sitz der Gesellschaft: Hannover, USt-IdNr. DE249338561
> Registergericht: Amtsgericht Hannover, Register-Nr.: HRB 222926
> Geschäftsführer: Marc Zocher, Dr. Claus Boyens, Daniel Hagemeier
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx