On 8/30/22 22:31, Wyll Ingersoll wrote:
One of our OSDs eventually reached 100% capacity (in spite of the full ratio being 95%). Now it is down and we cannot restart the OSD process on it because there is not enough space on the device.
Is there a way to find PGs on that disk that can be safely removed without destroying data so we can bring it back online? This is a bluestore OSD.
You can export PGs and import them on another OSD. After a successful
import you can delete the PG on the source OSD. See
https://docs.ceph.com/en/pacific/man/8/ceph-objectstore-tool/
export:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$source_id \
    --pgid $pgid --op export --file /path/to/export/drive/$pgid.dump
import:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$dest_id \
    --op import --file /path/to/export/drive/$pgid.dump
remove if successful:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$source_id \
    --pgid $pgid --op remove --force
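
The three steps above could be wrapped roughly as follows. This is only a sketch, not a tested procedure: the function names (run, list_pgs, move_pg), the argument order and the DRY_RUN switch are made up for illustration. It assumes both OSD daemons are stopped, since ceph-objectstore-tool works on offline OSDs, and that the export directory is on a drive with enough free space. The list-pgs operation is used to see which PGs the full OSD holds.

```shell
#!/usr/bin/env bash
# Sketch only. Hypothetical names: run, list_pgs, move_pg, DRY_RUN.
# Both OSD daemons must be stopped before ceph-objectstore-tool is used.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# List the PGs stored on one OSD, to pick candidates for export.
list_pgs() {
    local osd_id=$1
    run ceph-objectstore-tool --data-path "/var/lib/ceph/osd/ceph-$osd_id" \
        --op list-pgs
}

# Export a PG from the source OSD, import it on the destination OSD,
# and remove it from the source only if both earlier steps succeeded.
move_pg() {
    if [ "$#" -ne 4 ]; then
        echo "usage: move_pg SOURCE_ID DEST_ID PGID EXPORT_DIR" >&2
        return 2
    fi
    local source_id=$1 dest_id=$2 pgid=$3 export_dir=$4
    local dump="$export_dir/$pgid.dump"

    # 1. Export from the full OSD to the export drive.
    run ceph-objectstore-tool --data-path "/var/lib/ceph/osd/ceph-$source_id" \
        --pgid "$pgid" --op export --file "$dump" || return 1

    # 2. Import on the destination OSD.
    run ceph-objectstore-tool --data-path "/var/lib/ceph/osd/ceph-$dest_id" \
        --op import --file "$dump" || return 1

    # 3. Remove from the source; reached only if export and import succeeded.
    run ceph-objectstore-tool --data-path "/var/lib/ceph/osd/ceph-$source_id" \
        --pgid "$pgid" --op remove --force
}
```

For example, `DRY_RUN=1 move_pg 12 7 1.2f /path/to/export/drive` prints the three commands without running anything (the OSD ids and PG id here are invented), which makes it easy to review them before touching the OSDs.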
I don't understand why this overfilling issue is not already a bug that is getting attention. It seems very broken that an OSD can blow way past its full_ratio.
Can you capture logs and cluster status? See this export tool,
ceph-collect, by 42on to get useful data:
https://github.com/42on/ceph-collect
Gr. Stefan