> I think the short answer is "because you have so wildly varying sizes
> both for drives and hosts".

Arguably the OP's OSDs *are* balanced, in that their PG counts are roughly in line with their sizes, but the size disparity is indeed problematic in some ways. Notably, the 500GB OSD should just be removed: I don't think the balancer accounts for WAL/DB and other overhead, so a drive that small won't be weighted accurately and can't hold much data anyway. This cluster also shows evidence of reweight-by-utilization having been run, but only on two of the hosts. If the balancer module is active, those override weights will confound it.

>
> If your drive sizes span from 0.5 to 9.5, there will naturally be
> skewed data, and it is not a huge surprise that the automation has
> some troubles getting it "good". When the balancer places a PG on a
> 0.5-sized drive compared to a 9.5-sized one, it eats up 19x more of
> the "free space" on the smaller one, so there are very few good
> options when the sizes are so different. Even if you placed all PGs
> correctly due to size, the 9.5-sized disk would end up getting 19x
> more IO than the small drive and for hdd, it seldom is possible to
> gracefully handle a 19-fold increase in IO, most of the time will
> probably be spent on seeks.
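
Agreed. As a first step I would clear the leftover override weights and drain the small OSD, then let the balancer re-converge. Roughly something like the following -- untested against this cluster, and the <osd-id> values are placeholders for the actual OSD numbers:

    # per-OSD REWEIGHT (override) values, PG counts and utilization
    ceph osd df tree

    # confirm whether the balancer module is on, and in which mode
    ceph balancer status

    # reset the old reweight-by-utilization overrides (one per affected OSD)
    ceph osd reweight <osd-id> 1.0

    # drain the 500GB OSD, wait for backfill to finish, then remove it
    ceph osd out <osd-id>
    ceph osd purge <osd-id> --yes-i-really-mean-it

With the overrides back at 1.0 and the tiny OSD gone, the balancer in upmap mode (assuming all clients are Luminous or newer) should have a much easier time, though the 19x capacity and IO skew between the remaining drives will of course still be there.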