Ah, the CRUSH tunables basically don't impact placement at all unless
CRUSH fails to complete a placement for some reason. What you're seeing
here is the result of a pseudo-random imbalance. Increasing the pg_num
and pgp_num counts on your data pool should resolve it (though at the
cost of some data movement, which you'll need to be prepared for).
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Wed, Nov 13, 2013 at 1:00 PM, Oliver Schulz <oschulz@xxxxxxxxxx> wrote:
> Dear Greg,
>
>> I believe 3.8 is after CRUSH_TUNABLES v1 was implemented in the
>> kernel, so it shouldn't hurt you to turn them on if you need them.
>> (And the crush tool is just out of date; we should update that text!)
>> However, if you aren't having distribution issues on your cluster I
>> wouldn't bother [...]
>
> That's just the thing:
>
> Our cluster is now about 75% full, and "ceph status" shows
> "HEALTH_WARN 1 near full osd(s)".
>
> The used space on the (identical) OSD partitions varies between the
> extremes of 64% and 86% - I would have expected CRUSH to produce a
> more balanced data placement. Is this to be expected?
>
> Our cluster structure: 6 nodes (in 3 crushmap "racks" with 2 nodes
> each) and 6x 3 TB disks per node, with one OSD per disk - so 36 OSDs
> and 108 TB in total. Nodes and drives are all identical and were taken
> into operation at the same time; the cluster hasn't changed since
> installation. The disks have one big OSD data partition only; system
> and OSD journals are on separate SSDs. Each OSD has a weight of 3.0
> in the crushmap.
>
> We have the three standard pools (data, metadata, rbd), set to 3x
> replication, plus a (so far unused) pool "cheapdata" with 2x
> replication. Each pool has 2368 PGs. Almost all of the data is in the
> data pool, some is in rbd, and a little is in metadata (cheapdata
> being empty for now).
>
> When I look at the used space on the /var/lib/ceph/osd/ceph-XX
> partitions on the nodes, I get the following:
>
> Node 1: 75%, 76%, 80%, 86%, 67%, 73%
> Node 2: 71%, 75%, 76%, 82%, 74%, 76%
> Node 3: 71%, 76%, 75%, 70%, 75%, 70%
> Node 4: 76%, 83%, 66%, 68%, 72%, 78%
> Node 5: 80%, 70%, 78%, 71%, 72%, 77%
> Node 6: 81%, 74%, 69%, 67%, 78%, 64%
>
> Is this normal, or might there be an issue with our configuration
> (nothing special in it, though)? Might the tuning options help?
>
> I'd be very grateful for any advice! :-)
>
>
> Cheers,
>
> Oliver
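
For concreteness, Greg's suggestion of raising pg_num and pgp_num on the
data pool maps to roughly the commands below. This is only a sketch: the
pool name "data" comes from the thread, but the target value of 4096 is
an assumed example (the next power of two above the current 2368), not a
figure given by either poster, and depending on the Ceph release large
increases may need to be applied in smaller steps.

    # Split the data pool's placement groups (assumed target: 4096).
    ceph osd pool set data pg_num 4096
    # Raise pgp_num to match; this is what lets CRUSH actually place the
    # new PGs and so triggers the rebalancing (and data movement).
    ceph osd pool set data pgp_num 4096
    # Watch recovery/backfill progress and cluster health while it moves.
    ceph -s

Raising pg_num alone only splits existing PGs in place; the new PGs stay
on their parents' OSDs until pgp_num is increased as well, so both values
need to go up before the 64-86% spread Oliver reports can even out.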