On Sat, Apr 11, 2015 at 12:11 PM, J David <j.david.lists@xxxxxxxxx> wrote:
> On Thu, Apr 9, 2015 at 7:20 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> Okay, but 118/85 = 1.38. You say you're seeing variance from 53%
>> utilization to 96%, and 53%*1.38 = 73.5%, which is *way* off your
>> numbers.
>
> 53% to 96% is with all weights set to default (i.e. disk size) and all
> reweights set to 1. (I.e. before reweight-by-utilization and many
> hours of hand-tuning).

Ah, I see.

>
>> But it might just be faster to look for
>> anomalies within the size of important bits on the OSD — leveldb
>> stores, etc that don't correspond to the PG count).
>
> That would only work if I understood what you said and knew how to do it. :)

The OSD backing store sits on a regular filesystem. There are
directories within it for each PG, as well as for things like the
LevelDB instance embedded in each OSD. If you're just getting unlucky
with the big PGs ending up on OSDs which already have too many PGs,
then there's a CRUSH balancing problem and you may be out of luck. But
if, say, the LevelDB store is just bigger on some OSDs than others for
no particular reason, you could maybe do something about that. (There's
a rough sketch of that kind of check at the end of this mail.) Since I
now realize you did a bunch of reweighting to try and make the data
match up, I don't think you'll find something like badly-sized LevelDB
instances, though.

A final possibility, which I guess hasn't been called out here, is to
make sure that your CRUSH map is good and actually expected to place
things evenly. Can you share it? Since you've got 38 OSDs and 8 nodes,
some of the hosts are clearly different sizes; is there any correlation
between which size the node is and how full its OSDs are?
-Greg
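
To make that size check concrete, here is a rough Python sketch of how
you might compare per-PG data against the embedded LevelDB store on
every OSD of a node. It assumes the default filestore layout, i.e. OSD
data under /var/lib/ceph/osd/ceph-<id>/current with one <pgid>_head
directory per PG and the LevelDB store under current/omap; adjust the
paths if your deployment differs.

#!/usr/bin/env python
# Rough sketch: compare per-PG data sizes against the LevelDB (omap)
# size for every filestore OSD on this node. Assumes the default data
# path /var/lib/ceph/osd/ceph-<id>/current -- adjust OSD_ROOT if needed.

import glob
import os

OSD_ROOT = "/var/lib/ceph/osd"

def dir_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file vanished or unreadable; skip it
    return total

for osd_dir in sorted(glob.glob(os.path.join(OSD_ROOT, "ceph-*"))):
    current = os.path.join(osd_dir, "current")
    if not os.path.isdir(current):
        continue
    pg_bytes = 0
    pg_count = 0
    omap_bytes = 0
    for entry in os.listdir(current):
        path = os.path.join(current, entry)
        if not os.path.isdir(path):
            continue
        if entry == "omap":            # embedded LevelDB store
            omap_bytes = dir_size(path)
        elif entry.endswith("_head"):  # one directory per PG
            pg_bytes += dir_size(path)
            pg_count += 1
    print("%s: %d PGs, %.1f GB of PG data, %.1f GB of omap/LevelDB"
          % (os.path.basename(osd_dir), pg_count,
             pg_bytes / 1e9, omap_bytes / 1e9))

Run it on each storage node and compare the omap numbers across OSDs;
if one OSD's LevelDB is several GB larger than its peers for no obvious
reason, that's the kind of anomaly I mean. A plain
du -sh /var/lib/ceph/osd/ceph-*/current/omap gets you most of the same
information if you'd rather not script it.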