Hi Joe. Thank you - that's a nicely detailed explanation, and a sufficiently reasonable guess as to what the "layout" metric may mean. At the end of the day, a single lane of SATA for each box sure does look like the ultimate bottleneck in this setup - we were aware of this when we built it, but total storage was judged to be more important than speed, so at least we're on spec.

Here is the dstat output from the two machines that are rebalancing - this is just about 30 seconds' worth of output, but they're pretty constant in their load and numbers at this time:

[root@jc1letgfs13 vols]# dstat
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  0   1  99   0   0   0|2579k   32M|   0     0 |   0   0.3 |3532  4239
  1   3  95   1   0   2|1536k  258M| 276M 4135k|   0     0 |  10k   14k
  1   6  90   1   0   2|1304k  319M| 336M 4176k|   0     0 |  13k   16k
  1   3  95   0   0   1|1288k  198M| 199M 3497k|   0     0 |  11k   11k
  1   3  94   1   0   2|1288k  296M| 309M 4039k|   0     0 |  12k   15k
  1   2  95   1   0   1|1032k  231M| 221M 2297k|   0     0 |  11k   11k
  1   3  94   0   0   2|1296k  278M| 296M 4078k|   0     0 |  14k   15k
  1   7  89   1   0   2|1552k  374M| 386M 5849k|   0     0 |  15k   19k
  1   4  93   0   0   2|1024k  343M| 350M 2961k|   0     0 |  13k   17k
  1   4  92   1   0   2|1304k  370M| 383M 4499k|   0     0 |  14k   18k
  1   3  94   1   0   2| 784k  286M| 311M 5202k|   0     0 |  12k   15k
  1   3  93   1   0   2|1280k  312M| 319M 3109k|   0     0 |  12k   16k
  1   6  91   1   0   2|1296k  319M| 342M 4270k|   0     0 |  13k   16k

root@jc1letgfs16:~# dstat
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  1   2  84  12   0   1| 204M   84M|   0     0 |   0    19B|  28k   24k
  1   1  82  16   0   1| 231M 6920k|1441k  240M|   0     0 |  16k   17k
  1   2  79  18   0   1| 328M 1208k|2441k  338M|   0     0 |  18k   19k
  1   1  85  13   0   1| 268M  136k|2139k  280M|   0     0 |  15k   16k
  1   2  78  18   0   1| 370M  320k|2637k  383M|   0     0 |  19k   20k
  1   2  79  18   0   1| 290M  136k|2245k  306M|   0     0 |  16k   18k
  1   2  79  18   0   1| 318M  280k|1770k  325M|   0     0 |  17k   18k
  1   2  80  17   0   1| 277M  248k|2149k  292M|   0     0 |  15k   17k
  1   2  79  18   0   1| 313M  128k|2331k  328M|   0     0 |  17k   18k
  1   2  79  18   0   1| 323M  376k|2373k  336M|   0     0 |  18k   19k
  1   1  79  18   0   1| 267M  136k|2070k  275M|   0     0 |  15k   17k
  1   1  78  19   0   1| 275M  368k|1638k  289M|   0     0 |  16k   18k
  1   2  78  19   0   1| 337M 1480k|2450k  343M|   0     0 |  18k   20k
  2   3  74  20   0   1| 312M 1344k|2403k  330M|   0     0 |  17k   24k
  1   1  80  17   0   1| 263M  688k|2078k  275M|   0     0 |  16k   17k
  1   1  81  16   0   1| 292M  120k|1677k  304M|   0     0 |  16k   17k
  1   1  78  19   0   1| 264M 4264k|2118k  271M|   0     0 |  16k   19k

James Burnash, Unix Engineering

-----Original Message-----
From: Joe Landman [mailto:landman at scalableinformatics.com]
Sent: Thursday, March 03, 2011 11:01 AM
To: Burnash, James
Cc: gluster-users at gluster.org
Subject: Re: While running a rebalance, what does the metric "layout" represent

On 03/03/2011 10:49 AM, Burnash, James wrote:
> Sure, Joe - I will get that to you. I do have dstat on the machines.
>
> What I'm really interested in, however, is what the actual number
> following "layout" represents. Inodes? Blocks? Files? Any idea?

Probably a hash key index that needed updating. Gluster uses hashes to compute the physical layout (where the files live) with respect to bricks. If the calculation was checked and found to be in error (this is a guess), or in need of updating, I am betting it would be updated during this rebalance.
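To make that concrete, here is a toy sketch of hash-range placement - illustrative Python only, not GlusterFS source; the brick names and the md5 stand-in hash are made up. The idea: a layout splits the 32-bit hash space into one range per brick, and a file's name hash selects the brick. A rebalance re-checks and rewrites those ranges, which is plausibly what the "layout" counter is tallying.

    # Toy sketch of hash-range ("layout") placement -- not GlusterFS code.
    # Brick names and the md5 stand-in hash are illustrative only.
    import hashlib

    # 20 hypothetical bricks, 10 per server, alternating across two servers.
    BRICKS = ["server%d:/export/brick%d" % (1 + i % 2, i // 2) for i in range(20)]
    HASH_SPACE = 2 ** 32

    def name_hash(filename):
        # GlusterFS uses its own 32-bit hash of the file name; md5 stands in here.
        return int.from_bytes(hashlib.md5(filename.encode()).digest()[:4], "big")

    def make_layout(bricks):
        # One contiguous hash range per brick, covering the whole 32-bit space.
        step = HASH_SPACE // len(bricks)
        layout = []
        for i, brick in enumerate(bricks):
            hi = HASH_SPACE - 1 if i == len(bricks) - 1 else (i + 1) * step - 1
            layout.append((i * step, hi, brick))
        return layout

    def brick_for(filename, layout):
        # Walk the ranges until the file's hash falls inside one.
        h = name_hash(filename)
        for lo, hi, brick in layout:
            if lo <= h <= hi:
                return brick

    layout = make_layout(BRICKS)
    print(brick_for("somefile.dat", layout))

When bricks are added or removed, every range shifts, and a rebalance has to walk the tree rewriting layouts and migrating any files whose hash now maps elsewhere - which would explain a large "layout" count even when relatively little data moves.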
> In this case, there are two HP storage servers running external
> enclosures with a 1.5Gb link, filled with 70 SATA 2TB drives
> configured as RAID 50, and running over an active/passive bonded 10Gb
> network connection. All servers are local to the same switches -
> single hop between them.

Ok ... a 1.5Gb link? So this is like one lane of SATA-1? I'd do ascii art, but I can't guarantee it would work well ... simple line art will do:

[disks] --- (1.5Gb/s single link) --- [HP storage server]
                                               ||  (10GbE active/passive)
[disks] --- (1.5Gb/s single link) --- [HP storage server]

with the disks being RAID50 within the array.

> Each storage server hosts 10 bricks of 12TB each.

So you have 20 bricks total (set up as distribute+replicate, I am guessing). I'd be curious how (if you did this) you can guarantee that mirrors don't show up on the same physical unit. Are the files on the system small or large on average (under 32kB / over 1MB)?

> Does that help? It would be exceptionally cool if there was a
> calculator to help with all this ... I could do the math if I had to,
> but it's not ... my first language :-)

Yeah ... I am guessing you are running out of some resource somewhere - probably maxing out on IOPS on reads, or possibly on the network.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
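As for the calculator wished for above, here is a minimal back-of-the-envelope sketch. The encoding overheads are assumptions (8b/10b on the SATA lane, 64b/66b on 10GbE), and the hop names are illustrative, not measurements from this setup:

    # Back-of-the-envelope link calculator -- a sketch, not a benchmark.
    # Assumed encoding overheads: 8b/10b for SATA/SAS, 64b/66b for 10GbE.

    def usable_MBps(line_rate_gbps, line_bits_per_data_byte):
        # Serial line rate (Gbit/s) -> usable payload bandwidth (MB/s).
        return line_rate_gbps * 1e9 / line_bits_per_data_byte / 1e6

    chain = {
        "enclosure uplink (1.5Gb/s SATA-1, 8b/10b)": usable_MBps(1.5, 10.0),   # ~150 MB/s
        "10GbE, one active link (64b/66b)":          usable_MBps(10.0, 8.25),  # ~1212 MB/s
    }

    for hop, mbps in chain.items():
        print("%-44s %6.0f MB/s" % (hop, mbps))
    print("Slowest hop:", min(chain, key=chain.get))

By this estimate the enclosure uplink (~150 MB/s usable), not the 10GbE (~1.2 GB/s), is the ceiling - consistent with the "single lane of SATA" observation at the top of the thread.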