On Sat, Apr 11, 2015 at 12:11 PM, J David <j.david.lists@xxxxxxxxx> wrote:
> On Thu, Apr 9, 2015 at 7:20 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> Okay, but 118/85 = 1.38. You say you're seeing variance from 53%
>> utilization to 96%, and 53%*1.38 = 73.5%, which is *way* off your
>> numbers.
>
> 53% to 96% is with all weights set to default (i.e. disk size) and all
> reweights set to 1. (I.e. before reweight-by-utilization and many
> hours of hand-tuning).

Ah, I see.

>
>> But it might just be faster to look for
>> anomalies within the size of important bits on the OSD — leveldb
>> stores, etc that don't correspond to the PG count).
>
> That would only work if I understood what you said and knew how to do it. :)

The OSD backing store sits on a regular filesystem. There are
directories within it for each PG, as well as for things like the
LevelDB instance embedded in each OSD. If you're just getting unlucky
with the big PGs ending up on OSDs which already have too many PGs,
then there's a CRUSH balancing problem and you may be out of luck. But
if, say, the LevelDB store is just bigger on some OSDs than others for
no particular reason, you could maybe do something about that. (There's
a rough sketch of that kind of check at the end of this mail.) Since I
now realize you did a bunch of reweighting to try and make the data
match up, I don't think you'll find something like badly-sized LevelDB
instances, though.

A final possibility, which I guess hasn't been called out here, is to
make sure that your CRUSH map is good and actually expected to place
things evenly. Can you share it? Since you've got 38 OSDs and 8 nodes,
some of the hosts are clearly different sizes; is there any correlation
between which size the node is and how full its OSDs are?
-Greg
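
To make that size check concrete, here is a rough Python sketch of how
you might compare per-PG data against the embedded LevelDB store on
every OSD of a node. It assumes the default filestore layout, i.e. OSD
data under /var/lib/ceph/osd/ceph-<id>/current with one <pgid>_head
directory per PG and the LevelDB store under current/omap; adjust the
paths if your deployment differs.

#!/usr/bin/env python
# Rough sketch: compare per-PG data sizes against the LevelDB (omap)
# size for every filestore OSD on this node. Assumes the default data
# path /var/lib/ceph/osd/ceph-<id>/current -- adjust OSD_ROOT if needed.

import glob
import os

OSD_ROOT = "/var/lib/ceph/osd"

def dir_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file vanished or unreadable; skip it
    return total

for osd_dir in sorted(glob.glob(os.path.join(OSD_ROOT, "ceph-*"))):
    current = os.path.join(osd_dir, "current")
    if not os.path.isdir(current):
        continue
    pg_bytes = 0
    pg_count = 0
    omap_bytes = 0
    for entry in os.listdir(current):
        path = os.path.join(current, entry)
        if not os.path.isdir(path):
            continue
        if entry == "omap":            # embedded LevelDB store
            omap_bytes = dir_size(path)
        elif entry.endswith("_head"):  # one directory per PG
            pg_bytes += dir_size(path)
            pg_count += 1
    print("%s: %d PGs, %.1f GB of PG data, %.1f GB of omap/LevelDB"
          % (os.path.basename(osd_dir), pg_count,
             pg_bytes / 1e9, omap_bytes / 1e9))

Run it on each storage node and compare the omap numbers across OSDs;
if one OSD's LevelDB is several GB larger than its peers for no obvious
reason, that's the kind of anomaly I mean. A plain
du -sh /var/lib/ceph/osd/ceph-*/current/omap gets you most of the same
information if you'd rather not script it.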