Re: Uneven data placement

Andrey Korolyov <andrey@xxxxxxx> · Sun, 17 Mar 2013 20:25:29 +0400

On Sun, Mar 17, 2013 at 8:14 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> On Sunday, March 17, 2013 at 9:09 AM, Andrey Korolyov wrote:
>> On Sun, Mar 17, 2013 at 7:56 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> > On Sunday, March 17, 2013 at 4:46 AM, Andrey Korolyov wrote:
>> > > Hi,
>> > >
>> > > from osd tree:
>> > >
>> > > -16 4.95 host 10.5.0.52
>> > > 32 1.9 osd.32 up 2
>> > > 33 1.05 osd.33 up 1
>> > > 34 1 osd.34 up 1
>> > > 35 1 osd.35 up 1
>> > >
>> > > df -h:
>> > > /dev/sdd3 3.7T 595G 3.1T 16% /var/lib/ceph/osd/32
>> > > /dev/sde3 3.7T 332G 3.4T 9% /var/lib/ceph/osd/33
>> > > /dev/sdf3 3.7T 322G 3.4T 9% /var/lib/ceph/osd/34
>> > > /dev/sdg3 3.7T 320G 3.4T 9% /var/lib/ceph/osd/35
>> > >
>> > > -10 2 host 10.5.0.32
>> > > 18 1 osd.18 up 1
>> > > 26 1 osd.26 up 1
>> > >
>> > > df -h:
>> > > /dev/sda2 926G 417G 510G 45% /var/lib/ceph/osd/18
>> > > /dev/sdb2 926G 431G 496G 47% /var/lib/ceph/osd/26
>> > >
>> > > Since the OSDs on 10.5.0.32 almost certainly do not contain garbage
>> > > bytes, there seems to be some weirdness in the placement. The CRUSH
>> > > rules are almost default; there is no adjustment by node subsets.
>> > > Any thoughts will be appreciated!
>> >
>> >
>> > Do you have any other nodes? What does the rest of your osd tree look like?
>> >
>> > I do note that at first glance, you've got 1569GB in 10.5.0.52 and 848GB in 10.5.0.32, which is a 1.85 differential when you'd really like a ~2.5 differential (based on the very odd CRUSH weights you've assigned to each device, and the hosts). I suspect/hope you've also got something weird going on with the rest of your interior nodes (not pictured here), but perhaps not. Either way, I'd recommend fixing up the rest of your weights and seeing if that improves the distribution.
>>
>> Nope, all other OSDs have weight 1 (and each host contains two OSDs;
>> this many-disk system is an experimental one). This host had round
>> values until recently; I've just changed the weights a bit to test the
>> speed of data rearrangement. The problem has existed since 10.5.0.52
>> entered data placement with the default OSD weight of 1.
>>
> So you had them all set to weight 1 for a while, despite the disks having very different sizes. That would give them very different utilization percentages (with the same absolute usage) like you've shown here and is expected behavior. Weight them according to size if you want them to fill up at the same rate.

Yes, but in my case the absolute usage values are different too - that's
why I thought that something is not right.
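
To put numbers on this, here is a rough Python sketch (not anything Ceph
provides) that takes the weights and usage figures from the osd tree and
df output quoted above and compares the used space per unit of CRUSH
weight on each OSD. The dictionary layout and the helper name host_totals
are made up for illustration, and the weights are the current ones, which
were changed only recently, so this is just a snapshot comparison.

```python
# Figures copied from the osd tree and `df -h` output quoted above.
# Layout (illustrative): host -> {osd id: (CRUSH weight, used GB)}
hosts = {
    "10.5.0.52": {32: (1.9, 595), 33: (1.05, 332), 34: (1.0, 322), 35: (1.0, 320)},
    "10.5.0.32": {18: (1.0, 417), 26: (1.0, 431)},
}


def host_totals(osds):
    """Return (total CRUSH weight, total used GB) for one host."""
    weight = sum(w for w, _ in osds.values())
    used = sum(u for _, u in osds.values())
    return weight, used


w52, u52 = host_totals(hosts["10.5.0.52"])
w32, u32 = host_totals(hosts["10.5.0.32"])

# Ratio the weights ask for vs. ratio actually seen on disk.
expected_ratio = w52 / w32
observed_ratio = u52 / u32
print(f"expected by weight: {expected_ratio:.2f}, observed: {observed_ratio:.2f}")

# Used GB per unit of weight; roughly equal values would mean
# placement is following the weights.
per_weight = {
    f"osd.{osd}": used / weight
    for osds in hosts.values()
    for osd, (weight, used) in osds.items()
}
for name, value in per_weight.items():
    print(f"{name}: {value:.0f} GB per unit weight")
```

If placement followed the weights, the per-weight figures would be roughly
equal across OSDs; with these numbers the two OSDs on 10.5.0.32 come out
noticeably higher than the four on 10.5.0.52.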

> Also, when data's been migrating you might not see space reclaimed instantly — the OSDs put deleted stuff into a queue to erase and will do so as they've got time, while trying not to interrupt client I/O.

Of course, I keep that in mind in all my tests with the crushmap :) This
delta has persisted for a couple of weeks, shrinking as the absolute
amount of committed data grows, but it is still unexplainably high. For
example, 10.5.0.32 was added after this many-disk host and has filled up
to the same values as any other two-OSD machine in the cluster.
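
For reference, Greg's suggestion to weight by size could look like the
sketch below. It assumes the common convention of a weight of 1.0 per TB
of capacity, which is an assumption here and not something stated in this
thread. The helper weight_from_df_size is hypothetical and only handles
the G and T suffixes that appear in the df output above.

```python
# `df -h` reports sizes in powers of 1024, so 1T = 1024G.
UNITS_TB = {"G": 1 / 1024, "T": 1.0}


def weight_from_df_size(size):
    """Turn a `df -h` size such as '3.7T' or '926G' into a weight in TB.

    Assumption: weight 1.0 corresponds to 1 TB of capacity.
    """
    value, unit = float(size[:-1]), size[-1]
    return round(value * UNITS_TB[unit], 2)


# Partition sizes copied from the df output quoted above.
sizes = {
    "osd.32": "3.7T", "osd.33": "3.7T", "osd.34": "3.7T", "osd.35": "3.7T",
    "osd.18": "926G", "osd.26": "926G",
}

weights = {name: weight_from_df_size(size) for name, size in sizes.items()}
for name, weight in weights.items():
    print(f"{name}: suggested weight {weight}")
```

With weights derived this way, the OSDs should fill at about the same
percentage rate rather than holding the same absolute amount of data.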

> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com