Hi Stephan,
it looks like you've faced the following bug:
https://tracker.ceph.com/issues/47883
To work around it, you might want to switch both the bluestore and bluefs
allocators back to bitmap for now.
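A minimal sketch of how to do that, assuming you manage the cluster config
via the ceph CLI (in Rook you may prefer the rook-config-override ConfigMap
instead); the OSDs need a restart for the change to take effect:

# switch the BlueStore main allocator and the BlueFS allocator to bitmap
ceph config set osd bluestore_allocator bitmap
ceph config set osd bluefs_allocator bitmap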
The fixes for Octopus/Nautilus are on their way:
https://github.com/ceph/ceph/pull/38474
https://github.com/ceph/ceph/pull/38475
Hope this helps,
Igor
On 12/16/2020 5:51 PM, Stephan Austermühle wrote:
Hi all,
in various Rook-operated Ceph clusters I have seen OSDs going into a
CrashLoop due to the following:
debug 2020-12-16T13:19:25.500+0000 7fc4c3f13f40 4 rocksdb:
EVENT_LOG_v1 {"time_micros": 1608124765507105, "job": 1, "event":
"recovery_started", "log_files": [1400, 1402]}
debug 2020-12-16T13:19:25.500+0000 7fc4c3f13f40 4 rocksdb:
[db/db_impl_open.cc:583] Recovering log #1400 mode 0
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40 1 bluefs _allocate
failed to allocate 0x43ce43d on bdev 1, free 0x2e50000; fallback to
bdev 2
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40 1 bluefs _allocate
unable to allocate 0x43ce43d on bdev 2, free 0xffffffffffffffff;
fallback to slow device expander
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40 -1
bluestore(/var/lib/ceph/osd/ceph-1) allocate_bluefs_freespace failed
to allocate on 0x3d1b0000 min_size 0x43d0000 > allocated total
0x300000 bluefs_shared_alloc_size 0x10000 allocated 0x300000 available
0xb019c8000
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40 -1 bluefs _allocate
failed to expand slow device to fit +0x43ce43d
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40 -1 bluefs _flush_range
allocated: 0x0 offset: 0x0 length: 0x43ce43d
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.6/rpm/el8/BUILD/ceph-15.2.6/src/os/bluestore/BlueFS.cc:
In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t,
uint64_t)' thread 7fc4c3f13f40 time 2020-12-16T13:19:27.731533+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.6/rpm/el8/BUILD/ceph-15.2.6/src/os/bluestore/BlueFS.cc:
2721: ceph_abort_msg("bluefs enospc")
The OSD is not really full:
# ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-1
inferring bluefs devices from bluestore path
1 : device size 0x18ffc00000 : own 0x[bffe10000~fffe0000] = 0xfffe0000
: using 0xfd190000(4.0 GiB) : bluestore has 0x82b360000(33 GiB) available
Expanding DB/WAL...
Expanding the underlying block device by just 1 GiB, followed by
"ceph-bluestore-tool bluefs-bdev-expand" and "ceph-bluestore-tool
repair", resolves the situation; a rough sketch of that sequence is below.
In general, larger OSDs seem to reduce the likelihood of this issue.
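For reference, the recovery sequence looks roughly like this, assuming the
OSD sits on an LVM logical volume (the expansion step depends on your
Rook/PVC setup) and the OSD daemon is stopped first; the device path is
just an example:

# grow the underlying device by 1 GiB
lvextend -L +1G /dev/ceph-vg/osd-block-1
# let BlueFS pick up the extra space, then check/repair the OSD
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-1
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1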
Ceph version is v15.2.6.
Is this a known bug?
Ceph report and logs are attached.
Thanks for your help
Stephan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx