Hi Igor,

On 3/16/20 10:34 AM, Igor Fedotov wrote:
> I can suggest the following non-straightforward way for now:
>
> 1) Check the osd startup log for the following line:
>
> 2020-03-15 14:43:27.845 7f41bb6baa80 1
> bluestore(/var/lib/ceph/osd/ceph-681) _open_alloc loaded 23 GiB in 97
> extents
>
> Note 23 GiB loaded.
>
> 2) Then retrieve the bluefs used space for the main device from the
> "bluefs-bdev-sizes" output:
>
> 1 : device size 0x1a80000000 : own
> ...
>  = 0x582550000 : using 0x56d090000(22 GiB)
>
> 3) Actual available space would be around: 1 GiB = 23 GiB - 22 GiB

OK, so essentially the available data is then the delta of what it reports
in the following line? Why the discrepancy between the _open_alloc loaded
information and the bluefs-bdev-sizes output?

bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-709/block size 106 GiB

>> [root@obj21 ceph-709]# ceph-bluestore-tool --log-level 30 --path
>> /var/lib/ceph/osd/ceph-709 --command fsck
>> 2020-03-16 08:02:16.590 7f5faaa11c00 -1
>> bluestore(/var/lib/ceph/osd/ceph-709) fsck error: bluefs_extents
>> inconsistency, downgrade to previous releases might be broken.
>> fsck found 1 error(s)
>>
>> [0] - ftp://ftp.umiacs.umd.edu/pub/derek/ceph-osd.709-fsck-deep
>
> fsck-deep suffers from the same lack of space. Could you please collect
> a log for the regular fsck?

That is the log for the regular fsck; it only says that it found 1 error(s)
and doesn't give any further information (even with --log-level 30
specified).

> So it looks like checksum errors appeared after the initial failure, and
> they trigger recovery which requires additional space...
>
>
> I think the summary of the issue is as follows:
>
> The cluster had been in a 'near full' state when some OSDs started to
> crash due to lack of free space.
>
> An attempt to extend the device can succeed or not depending on the state
> RocksDB is in when the first crash happened.
>
> a) If the DB is corrupted and needs recovery (which is triggered on each
> non read-only DB open), it asks for more space, which fails again, and the
> OSD falls into a "deadlock" state:
>
> to extend the main device one needs DB access, which in turn needs more
> space.
>
> b) If the DB isn't corrupted, expansion succeeds and the OSD starts to get
> more data due to peering, which finally fills it up, and the OSD tends to
> get into a).
>
> Some OSDs will presumably allow another expansion though.
>
>
> Unfortunately I don't know any fix/workaround for the "deadlock" case at
> the moment.

I am trying to find creative ways to increase the space significantly on an
OSD without stranding it, so I can continue to provide new space to more of
the OSDs. LVM is helpful here (roughly as sketched below).

> Probably migrating the DB to a standalone volume (using
> ceph-bluestore-tool's bluefs-bdev-migrate command) will help, but I need
> to double check that.
>
> And it will definitely expose data to risk of loss, so please hold on
> until my additional recommendations.
>
> Most probably you will need an additional 30GB of free space per OSD
> if going this way. So please let me know if you can afford this.

Well, I had already increased 709's initial space from 106GB to 200GB, and
now I gave it 10GB more, but it still cannot actually resize. Here is the
relevant information, I think, but the full log is here[0]. I then did it
with 30G (now a total of 240G) and it still failed[1]. I am out of space
without some additional hardware in this node, though I have an idea. If I
knew what size it is asking for (and what space it needs for recovery),
that would be very helpful.
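Also, going back to your steps 1-3 above, a quick sketch of how I'd read
those two numbers per OSD (this assumes the startup log is still at the
default /var/log/ceph location and that the OSD is stopped; adjust the OSD
id and paths as needed):

  # step 1: free space the allocator loaded at startup
  # (default log path assumed -- adjust if yours differs)
  grep '_open_alloc loaded' /var/log/ceph/ceph-osd.709.log | tail -1

  # step 2: how much of the main device (bdev 1) bluefs is using
  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-709 --command bluefs-bdev-sizes

  # step 3: available space is roughly (loaded in step 1) minus (using in step 2)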
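And to be concrete about the LVM route, this is the general shape of it -- a
sketch only, assuming the OSD's block device is a ceph-volume LV (the VG/LV
names below are placeholders):

  # grow the LV backing the OSD's block device (VG/LV names are placeholders)
  lvextend -L +30G /dev/ceph-vg/osd-block-709

  # then ask bluestore to grow into the new space -- the failing attempt is below
  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-709 --command bluefs-bdev-expand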
# ceph-bluestore-tool --log-level 30 --path /var/lib/ceph/osd/ceph-709 --command bluefs-bdev-expand
    -4> 2020-03-16 11:33:34.181 7f41d5940c00 -1 bluestore(/var/lib/ceph/osd/ceph-709) allocate_bluefs_freespace failed to allocate on 0xb000000 min_size 0xb000000 > allocated total 0x80000 bluefs_shared_alloc_size 0x10000 allocated 0x80000 available 0x 8000
    -3> 2020-03-16 11:33:34.181 7f41d5940c00 -1 bluefs _allocate failed to expand slow device to fit +0xaffa895
    -2> 2020-03-16 11:33:34.181 7f41d5940c00 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0xaffa895
    -1> 2020-03-16 11:33:34.184 7f41d5940c00 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.6/rpm/el7/BUILD/ceph-14.2.6/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7f41d5940c00 time 2020-03-16 11:33:34.181884
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.6/rpm/el7/BUILD/ceph-14.2.6/src/os/bluestore/BlueFS.cc: 2269: ceph_abort_msg("bluefs enospc")

[0] - ftp://ftp.umiacs.umd.edu/pub/derek/ceph-osd.709.bluefs-bdev-expand
[1] - ftp://ftp.umiacs.umd.edu/pub/derek/ceph-osd.709.bluefs-bdev-expand-2

>
>> Thanks,
>> derek
>>

--
Derek T. Yarnell
Director of Computing Facilities
University of Maryland Institute for Advanced Computer Studies
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx