osd daemon perf dump for one of my bluestore NVMe OSDs has [1] this excerpt. I grabbed those stats based on Wido's [2] script to determine how much DB overhead there is per object. My [3] calculations for this particular OSD are staggering: 99% of the space used on this OSD is consumed by the DB. The OSD is sitting between 90% and 97% disk usage, with occasional drops to 80% before climbing back up; it fluctuates wildly from one minute to the next.
One of my filestore NVMe OSDs in the same cluster has 99% of its used space in ./current/omap/.
This is causing IO stalls as well as OSDs flapping on the cluster. Does anyone have ideas for anything I can try? It's definitely not the actual PG data on the OSDs taking up the space. I tried adjusting the weights of the OSDs to better distribute the data, but moving the PGs around seemed to make things worse. Thank you.
[1] "bluestore_onodes": 167,
"stat_bytes_used": 143855271936,
"db_used_bytes": 142656667648,
[3]
    Average object size    = 143855271936 / 167 ≈ 821 MiB
    DB overhead per object = 142656667648 / 167 ≈ 814 MiB
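
For anyone who wants to reproduce the numbers in [3], here is a minimal sketch of the calculation (not Wido's actual script). It assumes the usual perf dump layout where bluestore_onodes sits under the "bluestore" section, stat_bytes_used under "osd", and db_used_bytes under "bluefs" (section names can differ between releases), and a placeholder osd_id you would adjust for your own daemon:

# Rough sketch: read counters from an OSD admin socket and compute
# per-object DB overhead. Section names are assumptions; check your
# own "ceph daemon osd.N perf dump" output if the keys don't match.
import json
import subprocess

osd_id = 0  # placeholder; set to the OSD you want to inspect

dump = json.loads(subprocess.check_output(
    ["ceph", "daemon", f"osd.{osd_id}", "perf", "dump"]))

onodes = dump["bluestore"]["bluestore_onodes"]
used = dump["osd"]["stat_bytes_used"]
db_used = dump["bluefs"]["db_used_bytes"]

print("avg object size:      %.0f MiB" % (used / onodes / 2**20))
print("DB overhead / object: %.0f MiB" % (db_used / onodes / 2**20))
print("DB share of used:     %.1f%%" % (100.0 * db_used / used))

If the counters match the excerpt in [1], this should print roughly the 821 MiB / 814 MiB figures above and a ~99% DB share.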