Hi Stephan,
it looks like you've faced the following bug:
https://tracker.ceph.com/issues/47883
To work around it, you might want to switch both the bluestore and bluefs
allocators back to bitmap for now.
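A minimal sketch of how to do that, assuming you manage the cluster config
via the ceph CLI (in Rook you may prefer the rook-config-override ConfigMap
instead); the OSDs need a restart for the change to take effect:

# switch the BlueStore main allocator and the BlueFS allocator to bitmap
ceph config set osd bluestore_allocator bitmap
ceph config set osd bluefs_allocator bitmap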
The fixes for Octopus/Nautilus are on their way:
https://github.com/ceph/ceph/pull/38474
https://github.com/ceph/ceph/pull/38475
Hope this helps,
Igor
On 12/16/2020 5:51 PM, Stephan Austermühle wrote:
Hi all,
in various Rook-operated Ceph clusters I have seen OSDs going into a
CrashLoop due to the following:
debug 2020-12-16T13:19:25.500+0000 7fc4c3f13f40 4 rocksdb:
EVENT_LOG_v1 {"time_micros": 1608124765507105, "job": 1, "event":
"recovery_started", "log_files": [1400, 1402]}
debug 2020-12-16T13:19:25.500+0000 7fc4c3f13f40 4 rocksdb:
[db/db_impl_open.cc:583] Recovering log #1400 mode 0
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40 1 bluefs _allocate
failed to allocate 0x43ce43d on bdev 1, free 0x2e50000; fallback to
bdev 2
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40 1 bluefs _allocate
unable to allocate 0x43ce43d on bdev 2, free 0xffffffffffffffff;
fallback to slow device expander
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40 -1
bluestore(/var/lib/ceph/osd/ceph-1) allocate_bluefs_freespace failed
to allocate on 0x3d1b0000 min_size 0x43d0000 > allocated total
0x300000 bluefs_shared_alloc_size 0x10000 allocated 0x300000 available
0xb019c8000
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40 -1 bluefs _allocate
failed to expand slow device to fit +0x43ce43d
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40 -1 bluefs _flush_range
allocated: 0x0 offset: 0x0 length: 0x43ce43d
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.6/rpm/el8/BUILD/ceph-15.2.6/src/os/bluestore/BlueFS.cc:
In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t,
uint64_t)' thread 7fc4c3f13f40 time 2020-12-16T13:19:27.731533+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.6/rpm/el8/BUILD/ceph-15.2.6/src/os/bluestore/BlueFS.cc:
2721: ceph_abort_msg("bluefs enospc")
The OSD is not really full:
# ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-1
inferring bluefs devices from bluestore path
1 : device size 0x18ffc00000 : own 0x[bffe10000~fffe0000] = 0xfffe0000
: using 0xfd190000(4.0 GiB) : bluestore has 0x82b360000(33 GiB) available
Expanding DB/WAL...
Expanding the underlying block device by just 1 GiB, followed by
"ceph-bluestore-tool bluefs-bdev-expand" and "ceph-bluestore-tool
repair", resolves the situation; a rough sketch of that sequence is below.
In general, larger OSDs seem to reduce the likelihood of this issue.
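For reference, the recovery sequence looks roughly like this, assuming the
OSD sits on an LVM logical volume (the expansion step depends on your
Rook/PVC setup) and the OSD daemon is stopped first; the device path is
just an example:

# grow the underlying device by 1 GiB
lvextend -L +1G /dev/ceph-vg/osd-block-1
# let BlueFS pick up the extra space, then check/repair the OSD
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-1
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1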
Ceph version is v15.2.6.
Is this a known bug?
Ceph report and logs are attached.
Thanks for your help
Stephan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx