Hi Joe. Thank you - that's a nicely detailed explanation, and a sufficiently reasonable guess as to what the "layout" metric may mean. At the end of the day, a single lane of SATA for each box sure does look like the ultimate bottleneck in this setup - we were aware of this when we built it, but total storage was judged to be more important than speed, so at least we're on spec.

Here is the dstat output from the two machines that are rebalancing - this is just about 30 seconds' worth of output, but they're pretty constant in their load and numbers at this time:

[root@jc1letgfs13 vols]# dstat
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  0   1  99   0   0   0|2579k   32M|   0     0 |   0   0.3 |3532  4239
  1   3  95   1   0   2|1536k  258M| 276M 4135k|   0     0 |  10k   14k
  1   6  90   1   0   2|1304k  319M| 336M 4176k|   0     0 |  13k   16k
  1   3  95   0   0   1|1288k  198M| 199M 3497k|   0     0 |  11k   11k
  1   3  94   1   0   2|1288k  296M| 309M 4039k|   0     0 |  12k   15k
  1   2  95   1   0   1|1032k  231M| 221M 2297k|   0     0 |  11k   11k
  1   3  94   0   0   2|1296k  278M| 296M 4078k|   0     0 |  14k   15k
  1   7  89   1   0   2|1552k  374M| 386M 5849k|   0     0 |  15k   19k
  1   4  93   0   0   2|1024k  343M| 350M 2961k|   0     0 |  13k   17k
  1   4  92   1   0   2|1304k  370M| 383M 4499k|   0     0 |  14k   18k
  1   3  94   1   0   2| 784k  286M| 311M 5202k|   0     0 |  12k   15k
  1   3  93   1   0   2|1280k  312M| 319M 3109k|   0     0 |  12k   16k
  1   6  91   1   0   2|1296k  319M| 342M 4270k|   0     0 |  13k   16k

root@jc1letgfs16:~# dstat
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  1   2  84  12   0   1| 204M   84M|   0     0 |   0    19B|  28k   24k
  1   1  82  16   0   1| 231M 6920k|1441k  240M|   0     0 |  16k   17k
  1   2  79  18   0   1| 328M 1208k|2441k  338M|   0     0 |  18k   19k
  1   1  85  13   0   1| 268M  136k|2139k  280M|   0     0 |  15k   16k
  1   2  78  18   0   1| 370M  320k|2637k  383M|   0     0 |  19k   20k
  1   2  79  18   0   1| 290M  136k|2245k  306M|   0     0 |  16k   18k
  1   2  79  18   0   1| 318M  280k|1770k  325M|   0     0 |  17k   18k
  1   2  80  17   0   1| 277M  248k|2149k  292M|   0     0 |  15k   17k
  1   2  79  18   0   1| 313M  128k|2331k  328M|   0     0 |  17k   18k
  1   2  79  18   0   1| 323M  376k|2373k  336M|   0     0 |  18k   19k
  1   1  79  18   0   1| 267M  136k|2070k  275M|   0     0 |  15k   17k
  1   1  78  19   0   1| 275M  368k|1638k  289M|   0     0 |  16k   18k
  1   2  78  19   0   1| 337M 1480k|2450k  343M|   0     0 |  18k   20k
  2   3  74  20   0   1| 312M 1344k|2403k  330M|   0     0 |  17k   24k
  1   1  80  17   0   1| 263M  688k|2078k  275M|   0     0 |  16k   17k
  1   1  81  16   0   1| 292M  120k|1677k  304M|   0     0 |  16k   17k
  1   1  78  19   0   1| 264M 4264k|2118k  271M|   0     0 |  16k   19k

James Burnash, Unix Engineering

-----Original Message-----
From: Joe Landman [mailto:landman at scalableinformatics.com]
Sent: Thursday, March 03, 2011 11:01 AM
To: Burnash, James
Cc: gluster-users at gluster.org
Subject: Re: While running a rebalance, what does the metric "layout" represent

On 03/03/2011 10:49 AM, Burnash, James wrote:
> Sure, Joe - I will get that to you. I do have dstat on the machines.
>
> What I'm really interested in, however, is what the actual number
> following "layout" represents. Inodes? Blocks? Files? Any idea?

Probably a hash key index that needed updating. Gluster uses hashes to compute the physical layout (where the files live) with respect to bricks. If the calculation was checked and found to be in error (this is a guess), or in need of updating, I am betting it would be updated during this rebalance.
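To make that concrete, here is a toy sketch of hash-range placement - illustrative Python only, not GlusterFS source; the brick names and the md5 stand-in hash are made up. The idea: a layout splits the 32-bit hash space into one range per brick, and a file's name hash selects the brick. A rebalance re-checks and rewrites those ranges, which is plausibly what the "layout" counter is tallying.

    # Toy sketch of hash-range ("layout") placement -- not GlusterFS code.
    # Brick names and the md5 stand-in hash are illustrative only.
    import hashlib

    # 20 hypothetical bricks, 10 per server, alternating across two servers.
    BRICKS = ["server%d:/export/brick%d" % (1 + i % 2, i // 2) for i in range(20)]
    HASH_SPACE = 2 ** 32

    def name_hash(filename):
        # GlusterFS uses its own 32-bit hash of the file name; md5 stands in here.
        return int.from_bytes(hashlib.md5(filename.encode()).digest()[:4], "big")

    def make_layout(bricks):
        # One contiguous hash range per brick, covering the whole 32-bit space.
        step = HASH_SPACE // len(bricks)
        layout = []
        for i, brick in enumerate(bricks):
            hi = HASH_SPACE - 1 if i == len(bricks) - 1 else (i + 1) * step - 1
            layout.append((i * step, hi, brick))
        return layout

    def brick_for(filename, layout):
        # Walk the ranges until the file's hash falls inside one.
        h = name_hash(filename)
        for lo, hi, brick in layout:
            if lo <= h <= hi:
                return brick

    layout = make_layout(BRICKS)
    print(brick_for("somefile.dat", layout))

When bricks are added or removed, every range shifts, and a rebalance has to walk the tree rewriting layouts and migrating any files whose hash now maps elsewhere - which would explain a large "layout" count even when relatively little data moves.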
> In this case, there are two HP storage servers running external
> enclosures with a 1.5Gb link, filled with 70 SATA 2TB drives
> configured as RAID 50, and running over an active/passive bonded 10Gb
> network connection. All servers are local to the same switches -
> single hop between them.

Ok ... a 1.5Gb link? So this is like one lane of SATA-1? I'd do ascii art, but I can't guarantee it would work well ... simple line art will do:

[disks] --- (1.5Gb/s single link) --- [HP storage server]
                                               ||  (10GbE active/passive)
[disks] --- (1.5Gb/s single link) --- [HP storage server]

with the disks being RAID50 within the array.

> Each storage server hosts 10 bricks of 12TB each.

So you have 20 bricks total (set up as distribute+replicate, I am guessing). I'd be curious how (if you did this) you can guarantee that mirrors don't show up on the same physical unit. Are the files on the system small or large on average (under 32kB / over 1MB)?

> Does that help? It would be exceptionally cool if there was a
> calculator to help with all this ... I could do the math if I had to,
> but it's not ... my first language :-)

Yeah ... I am guessing you are running out of some resource somewhere - probably maxing out on IOPS on reads, or possibly on the network.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
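As for the calculator wished for above, here is a minimal back-of-the-envelope sketch. The encoding overheads are assumptions (8b/10b on the SATA lane, 64b/66b on 10GbE), and the hop names are illustrative, not measurements from this setup:

    # Back-of-the-envelope link calculator -- a sketch, not a benchmark.
    # Assumed encoding overheads: 8b/10b for SATA/SAS, 64b/66b for 10GbE.

    def usable_MBps(line_rate_gbps, line_bits_per_data_byte):
        # Serial line rate (Gbit/s) -> usable payload bandwidth (MB/s).
        return line_rate_gbps * 1e9 / line_bits_per_data_byte / 1e6

    chain = {
        "enclosure uplink (1.5Gb/s SATA-1, 8b/10b)": usable_MBps(1.5, 10.0),   # ~150 MB/s
        "10GbE, one active link (64b/66b)":          usable_MBps(10.0, 8.25),  # ~1212 MB/s
    }

    for hop, mbps in chain.items():
        print("%-44s %6.0f MB/s" % (hop, mbps))
    print("Slowest hop:", min(chain, key=chain.get))

By this estimate the enclosure uplink (~150 MB/s usable), not the 10GbE (~1.2 GB/s), is the ceiling - consistent with the "single lane of SATA" observation at the top of the thread.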