I am running a Mimic cluster, fully on BlueStore, with a pure RGW
workload. We use the AWS i3 instance family for the OSD machines.
Each instance has one NVMe disk, which is split into four partitions,
and each of those partitions serves as a BlueStore block device.
Each OSD uses exactly one device (its partition), so everything is
managed by BlueStore internally.
The problem is that under write-heavy load the DB space grows
quickly, and at some point BlueFS stops getting more space,
which kills the OSD. There is no recovery from this error:
once BlueFS runs out of space for RocksDB, the OSD dies
and cannot be restarted.
On this particular OSD there is plenty of free space, but
the log shows that it cannot allocate more space at what looks
like a weird address: '_balance_bluefs_freespace no allocate on
0x80000000'.
2018-08-13 18:15:10.960 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
2018-08-13 18:15:11.330 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
2018-08-13 18:15:11.752 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
2018-08-13 18:15:11.785 7f6a5b882700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 41] Generated table #14590: 304401 keys, 68804532 bytes
2018-08-13 18:15:11.785 7f6a5b882700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1534184111786253, "cf_name": "default", "job": 41, "event": "table_file_creation", "file_number": 14590, "file_size": 68804532, "table_properties": {"data_size": 67112437, "index_size": 777792, "filter_size": 913252, "raw_key_size": 13383306, "raw_average_key_size": 43, "raw_value_size": 58673606, "raw_average_value_size": 192, "num_data_blocks": 17090, "num_entries": 304401, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "0", "kMergeOperands": "0"}}
2018-08-13 18:15:12.245 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
2018-08-13 18:15:12.664 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
2018-08-13 18:15:12.743 7f6a5b882700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 41] Generated table #14591: 313351 keys, 68830515 bytes
2018-08-13 18:15:12.743 7f6a5b882700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1534184112744129, "cf_name": "default", "job": 41, "event": "table_file_creation", "file_number": 14591, "file_size": 68830515, "table_properties": {"data_size": 67109446, "index_size": 785852, "filter_size": 934166, "raw_key_size": 13762246, "raw_average_key_size": 43, "raw_value_size": 58469928, "raw_average_value_size": 186, "num_data_blocks": 17124, "num_entries": 313351, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "0", "kMergeOperands": "0"}}
2018-08-13 18:15:13.025 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
2018-08-13 18:15:13.405 7f6a5b882700 1 bluefs _allocate failed to allocate 0x4200000 on bdev 1, free 0x3500000; fallback to bdev 2
2018-08-13 18:15:13.405 7f6a5b882700 -1 bluefs _allocate failed to allocate 0x4200000 on bdev 2, dne
2018-08-13 18:15:13.405 7f6a5b882700 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0x419db1f
2018-08-13 18:15:13.405 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
2018-08-13 18:15:13.409 7f6a5b882700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7f6a5b882700 time 2018-08-13 18:15:13.406645
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/os/bluestore/BlueFS.cc: 1663: FAILED assert(0 == "bluefs enospc")
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7f6a6b660e1f]
2: (()+0x284fe7) [0x7f6a6b660fe7]
3: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1ac6) [0x55f6c6db9146]
4: (BlueRocksWritableFile::Flush()+0x3d) [0x55f6c6dcf0cd]
5: (rocksdb::WritableFileWriter::Flush()+0x196) [0x55f6c6faf7c6]
6: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x55f6c6fafa8e]
7: (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status const&, rocksdb::CompactionJob::SubcompactionState*, rocksdb::RangeDelAggregator*, CompactionIterationStats*, rocksdb::Slice const*)+0x73b) [0x55f6c6fed26b]
8: (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0x77f) [0x55f6c6feff3f]
9: (rocksdb::CompactionJob::Run()+0x2c8) [0x55f6c6ff1508]
10: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*)+0xab4) [0x55f6c6e57da4]
11: (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)+0xd0) [0x55f6c6e59680]
12: (rocksdb::DBImpl::BGWorkCompaction(void*)+0x3a) [0x55f6c6e59b6a]
13: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x266) [0x55f6c7034536]
14: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x4f) [0x55f6c70346bf]
15: (()+0x6ae17f) [0x7f6a6ba8a17f]
16: (()+0x7e25) [0x7f6a681c5e25]
17: (clone()+0x6d) [0x7f6a672b5bad]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
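To make the hex values in the log above easier to read, here is a small Python sketch that pulls the numbers out of the "bluefs _allocate failed" line and prints them as sizes. It is not part of Ceph: the helper names (human, parse_alloc_failure) are made up for illustration, and treating 0x80000000 as a byte count rather than an address is an assumption on my side.

```python
import re

# Matches the BlueFS allocation-failure line as it appears in the log above.
ALLOC_FAILED = re.compile(
    r"bluefs _allocate failed to allocate (0x[0-9a-f]+) "
    r"on bdev (\d+), free (0x[0-9a-f]+)"
)


def human(nbytes):
    """Format a byte count using binary units (hypothetical helper)."""
    value = float(nbytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if value < 1024 or unit == "GiB":
            return f"{value:.1f} {unit}"
        value /= 1024


def parse_alloc_failure(line):
    """Return requested/free bytes and bdev id, or None if the line differs."""
    m = ALLOC_FAILED.search(line)
    if m is None:
        return None
    return {
        "wanted": int(m.group(1), 16),
        "bdev": int(m.group(2)),
        "free": int(m.group(3), 16),
    }


line = (
    "2018-08-13 18:15:13.405 7f6a5b882700 1 bluefs _allocate failed to "
    "allocate 0x4200000 on bdev 1, free 0x3500000; fallback to bdev 2"
)
info = parse_alloc_failure(line)
print(human(info["wanted"]), "wanted,", human(info["free"]), "free")
# Assumption: 0x80000000 and min_alloc_size 0x2000 are read as byte counts.
print(human(0x80000000), human(0x2000))
```

Read this way, the compaction asked BlueFS for about 66 MiB while only 53 MiB was free on bdev 1, and the repeated _balance_bluefs_freespace messages mention 2 GiB with an 8 KiB min_alloc_size.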
Has anyone run into a similar problem? It looks like a bug to me: it has already happened on several OSDs, each time with a different BlueFS size and a different OSD fill level.
Best Regards, Kuba Stańczak