Re: allocate_bluefs_freespace failed to allocate / ceph_abort_msg("bluefs enospc")

Hi Stephan,

It looks like you've hit the following bug: https://tracker.ceph.com/issues/47883

To work around it, you might want to switch both the bluestore and bluefs allocators back to bitmap for now.
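
A rough sketch of how to do that (adjust to your setup; the same options can also be placed in ceph.conf or Rook's configuration override instead of the config database):

ceph config set osd bluestore_allocator bitmap
ceph config set osd bluefs_allocator bitmap

The OSD daemons have to be restarted for the new allocator to take effect.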

The fixes for Octopus/Nautilus are on their way:

https://github.com/ceph/ceph/pull/38474

https://github.com/ceph/ceph/pull/38475


Hope this helps,

Igor



On 12/16/2020 5:51 PM, Stephan Austermühle wrote:
Hi all,

In various Rook-operated Ceph clusters I have seen OSDs go into a CrashLoop due to:

debug 2020-12-16T13:19:25.500+0000 7fc4c3f13f40  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1608124765507105, "job": 1, "event": "recovery_started", "log_files": [1400, 1402]}
debug 2020-12-16T13:19:25.500+0000 7fc4c3f13f40  4 rocksdb: [db/db_impl_open.cc:583] Recovering log #1400 mode 0
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40  1 bluefs _allocate failed to allocate 0x43ce43d on bdev 1, free 0x2e50000; fallback to bdev 2
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40  1 bluefs _allocate unable to allocate 0x43ce43d on bdev 2, free 0xffffffffffffffff; fallback to slow device expander
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40 -1 bluestore(/var/lib/ceph/osd/ceph-1) allocate_bluefs_freespace failed to allocate on 0x3d1b0000 min_size 0x43d0000 > allocated total 0x300000 bluefs_shared_alloc_size 0x10000 allocated 0x300000 available 0x b019c8000
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40 -1 bluefs _allocate failed to expand slow device to fit +0x43ce43d
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0x43ce43d
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.6/rpm/el8/BUILD/ceph-15.2.6/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7fc4c3f13f40 time 2020-12-16T13:19:27.731533+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.6/rpm/el8/BUILD/ceph-15.2.6/src/os/bluestore/BlueFS.cc: 2721: ceph_abort_msg("bluefs enospc")

The OSD is not really full:

# ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-1
inferring bluefs devices from bluestore path
1 : device size 0x18ffc00000 : own 0x[bffe10000~fffe0000] = 0xfffe0000 : using 0xfd190000(4.0 GiB) : bluestore has 0x82b360000(33 GiB) available
Expanding DB/WAL...

Expanding the underlying block device by just 1 GiB, followed by "ceph-bluestore-tool bluefs-bdev-expand" and "ceph-bluestore-tool repair", resolves the situation, as sketched below. In general, larger OSDs seem to reduce the likelihood of hitting this issue.
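
For reference, a rough sketch of the sequence (how the block device is grown depends on the environment; the lvextend line is just an illustration, resize the PVC/LV/partition backing the OSD as appropriate):

# lvextend -L +1G <lv-backing-the-osd>
# ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-1
# ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1

Afterwards the OSD starts again.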

Ceph version is v15.2.6.

Is this a known bug?

Ceph report and logs are attached.

Thanks for your help

Stephan

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx