I have a self-compiled Ceph cluster based on v14.2.9. As a test I wrote to a pool until it was full; after that the OSDs started to panic and can no longer be restarted.

Relevant full-ratio settings:

    "mon_osd_nearfull_ratio": "0.850000",
    "mon_osd_full_ratio": "0.950000",
    "osd_failsafe_full_ratio": "0.970000",

Although osd_failsafe_full_ratio is set to 0.97, the %USE of some OSDs actually reached up to 99.7%.

I tried enlarging the LVM volume backing each OSD and then ran "ceph-bluestore-tool bluefs-bdev-expand" to notify BlueStore of the new size. Some OSDs started successfully after that, but others failed with the log shown below.
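For completeness, the expansion steps I ran were roughly the following; the VG/LV names and the size increment here are just placeholders, not the exact values from my cluster:

    # grow the LV that backs the OSD's block device (placeholder names and size)
    lvextend -L +20G /dev/ceph-block-vg/osd-block-lv

    # let BlueStore/BlueFS pick up the enlarged device
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-28/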
Log output:

[root@oss-smaug-node4 ~]# ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-28/
inferring bluefs devices from bluestore path
2020-09-02 15:47:39.594 7fb33d836c00 -1 bluestore(/var/lib/ceph/osd/ceph-28) allocate_bluefs_freespace failed to allocate on 0x2ad0000 min_size 0x2ad0000 > allocated total 0x720000 bluefs_shared_alloc_size 0x10000 allocated 0x720000 available 0x b000
2020-09-02 15:47:39.594 7fb33d836c00 -1 bluefs _allocate failed to expand slow device to fit +0x2ac85c3
2020-09-02 15:47:39.594 7fb33d836c00 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0x2ac85c3
/root/rpmbuild/BUILD/ceph-1.0.0-5-g7a08a9e/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7fb33d836c00 time 2020-09-02 15:47:39.596122
/root/rpmbuild/BUILD/ceph-1.0.0-5-g7a08a9e/src/os/bluestore/BlueFS.cc: 2269: ceph_abort_msg("bluefs enospc")
 ceph version 1.0.0-5-g7a08a9e (7a08a9eaefdc4e8786cfbfcf4eb387a6e603c13c) nautilus (stable)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0xdd) [0x7fb3339f3ef4]
 2: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1cd7) [0x564bab3dcaf7]
 3: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x564bab3dcceb]
 4: (BlueRocksWritableFile::Flush()+0x3d) [0x564bab599afd]
 5: (rocksdb::WritableFileWriter::Flush()+0x22c) [0x564bab7c0fdc]
 6: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x564bab7c119e]
 7: (rocksdb::BuildTable(std::string const&, rocksdb::Env*, rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, rocksdb::EnvOptions const&, rocksdb::TableCache*, rocksdb::InternalIteratorBase<rocksdb::Slice>*, std::vector<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> >, std::allocator<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> > > >, rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, std::vector<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> >, std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > const*, unsigned int, std::string const&, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long, rocksdb::SnapshotChecker*, rocksdb::CompressionType, unsigned long, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long, unsigned long, rocksdb::Env::WriteLifeTimeHint)+0x2392) [0x564bab7ecb82]
 8: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int, rocksdb::ColumnFamilyData*, rocksdb::MemTable*, rocksdb::VersionEdit*)+0xb5f) [0x564bab686cdf]
 9: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long*, bool)+0x1cc7) [0x564bab6890d7]
 10: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0xb23) [0x564bab689f93]
 11: (rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::string const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool, bool)+0x9f7) [0x564bab684be7]
 12: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::string const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0x24) [0x564bab685eb4]
 13: (RocksDBStore::do_open(std::ostream&, bool, bool, std::vector<KeyValueDB::ColumnFamily, std::allocator<KeyValueDB::ColumnFamily> > const*)+0xcf9) [0x564bab614689]
 14: (BlueStore::_open_db(bool, bool, bool)+0xa41) [0x564bab429c61]
 15: (BlueStore::_open_db_and_around(bool)+0x17e) [0x564bab43f6be]
 16: (BlueStore::_mount(bool, bool)+0x5c2) [0x564bab482722]
 17: (BlueStore::expand_devices(std::ostream&)+0x36) [0x564bab482bd6]
 18: (main()+0x24a3) [0x564bab398903]
 19: (__libc_start_main()+0xf5) [0x7fb3306923d5]
 20: (()+0x1f59ef) [0x564bab3be9ef]

Obviously _mount fails before the expansion can take effect. Is there any other way to rescue these OSDs? I think this kind of accident could easily happen in production as well.

Any suggestions would be greatly appreciated.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx