Re: Uneven data placement

On Sunday, March 17, 2013 at 9:25 AM, Andrey Korolyov wrote:
> On Sun, Mar 17, 2013 at 8:14 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> > On Sunday, March 17, 2013 at 9:09 AM, Andrey Korolyov wrote:
> > > On Sun, Mar 17, 2013 at 7:56 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> > > > On Sunday, March 17, 2013 at 4:46 AM, Andrey Korolyov wrote:
> > > > > Hi,
> > > > >  
> > > > > from osd tree:
> > > > >  
> > > > > -16 4.95 host 10.5.0.52
> > > > > 32 1.9 osd.32 up 2
> > > > > 33 1.05 osd.33 up 1
> > > > > 34 1 osd.34 up 1
> > > > > 35 1 osd.35 up 1
> > > > >  
> > > > > df -h:
> > > > > /dev/sdd3 3.7T 595G 3.1T 16% /var/lib/ceph/osd/32
> > > > > /dev/sde3 3.7T 332G 3.4T 9% /var/lib/ceph/osd/33
> > > > > /dev/sdf3 3.7T 322G 3.4T 9% /var/lib/ceph/osd/34
> > > > > /dev/sdg3 3.7T 320G 3.4T 9% /var/lib/ceph/osd/35
> > > > >  
> > > > > -10 2 host 10.5.0.32
> > > > > 18 1 osd.18 up 1
> > > > > 26 1 osd.26 up 1
> > > > >  
> > > > > df -h:
> > > > > /dev/sda2 926G 417G 510G 45% /var/lib/ceph/osd/18
> > > > > /dev/sdb2 926G 431G 496G 47% /var/lib/ceph/osd/26
> > > > >  
> > > > > Since the OSDs on 10.5.0.32 almost certainly do not contain
> > > > > garbage bytes, this looks like some weirdness in the placement. The
> > > > > CRUSH rules are almost default; there is no adjustment by node
> > > > > subsets. Any thoughts will be appreciated!
> > > >  
> > > >  
> > > >  
> > > >  
> > > > Do you have any other nodes? What's the rest of your osd tree look like?
> > > >  
> > > > I do note that at first glance, you've got 1569GB in 10.5.0.52 and 848GB in 10.5.0.32, which is a 1.85x differential when you'd really like a ~2.5x differential (based on the very odd CRUSH weights you've assigned to each device, and the hosts). I suspect/hope you've also got something weird going on with the rest of your interior nodes (not pictured here), but perhaps not; either way I'd recommend fixing up the rest of your weights and seeing if that improves the distribution.
> > >  
> > > Nope, all other OSDs have weight one (and each host contains two OSDs;
> > > this many-disk system is an experimental one). This host had round
> > > values until recently; I've just changed the weights a bit to test the
> > > speed of data rearrangement. The problem has existed since 10.5.0.52
> > > entered the data placement with the default ``1'' OSD weights.
> >  
> >  
> > So you had them all set to weight 1 for a while, despite the disks having very different sizes. That would give them very different utilization percentages (with the same absolute usage) like you've shown here and is expected behavior. Weight them according to size if you want them to fill up at the same rate.
>  
>  
> Yes, but in my case the absolute usage values are different too - that's
> why I thought that something was not right.
>  
With your current crush map they have to be — you've got a node with 2 disks totaling ~2TB and a weight of 2 compared to a node with 4 disks totaling ~15TB and a weight of ~5. That's not the right modifier to keep their absolute usages the same!
And of course it's all probabilities — your usages might be off by a bit and generally will converge as you add more data into the cluster.
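For what it's worth, the usual convention is to set each OSD's CRUSH weight to its capacity in TB, so that utilization percentages stay roughly even across disks of different sizes. A minimal sketch based on the df output above (the weight values here are just read off those disk sizes, not taken from your actual crush map):

# ~3.7TB disks on 10.5.0.52
ceph osd crush reweight osd.32 3.7
ceph osd crush reweight osd.33 3.7
ceph osd crush reweight osd.34 3.7
ceph osd crush reweight osd.35 3.7
# ~0.9TB disks on 10.5.0.32
ceph osd crush reweight osd.18 0.9
ceph osd crush reweight osd.26 0.9

That would make the host weights roughly 14.8 and 1.8, so data would split about 8:1 between the two hosts, matching their raw capacities.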
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


