On Jan 8, 2009, at 5:23 AM, Joe Landman wrote:
> Dan Parsons wrote:
>> Now that I'm upgrading to gluster 1.4/2.0, I'm going to take the time
>> and rearchitect things.
>>
>> Hardware: Gluster servers: 4 blades connected via 4gbit FC to fast,
>> dedicated storage. Each server has two bonded Gig-E links to the rest
>> of my network, for 8gbit/s of theoretical aggregate throughput.

> Just make sure the channel-bonded gigabit links are a) not Broadcom
> based, and b) not using anything other than mode 0 (long story, but
> email me offline if you want to hear some horror stories of
> hard-to-fix crashes). If you have access to 10 GbE/IB, these would be
> superior solutions (entire system ... storage and clients).

I've actually been using this stripe + bonding solution for 8 months
with no problems (none related to those two technologies, anyway), even
though I'm using Broadcom chips (forced on me by what's on our blades).
Not only that, but the blade chassis switch is a Dell PowerConnect. I
would have much preferred a Cisco, but the replacement cost is kind of
high, and I did eventually get the PowerConnect to do the bonding
properly. I'm using mode 802.3ad, which I believe is also known as mode
4. As I said, it's worked great for 8 months. With iptraf I can watch
the 2gbit/s link saturate with very little impact on system
performance, and IRQ generation is reasonable too. I'd rather have
Intel, but I can't say the Broadcom stuff is causing any problems for
me.

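For anyone who wants to reproduce this, the Linux half of the 802.3ad
setup looks roughly like the following on a RHEL/CentOS-style system.
This is a minimal sketch rather than my actual config; the device
names, address, and file paths are placeholders:

    # /etc/modprobe.conf -- load the bonding driver in 802.3ad (mode 4)
    alias bond0 bonding
    options bond0 mode=802.3ad miimon=100 lacp_rate=fast

    # /etc/sysconfig/network-scripts/ifcfg-bond0 -- the bonded interface
    DEVICE=bond0
    IPADDR=10.0.0.10        # placeholder address
    NETMASK=255.255.255.0
    BOOTPROTO=none
    ONBOOT=yes

    # /etc/sysconfig/network-scripts/ifcfg-eth0 -- and likewise ifcfg-eth1
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none
    ONBOOT=yes

The switch side matters just as much: both ports have to sit in the
same LACP group on the PowerConnect, which is the part that took some
fiddling to get right.
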
>> Gluster clients: 33 blades, each with one Gig-E connection. They use
>> local storage for the OS and gluster for input/output files.
>>
>> Specific questions: (1) There are many times in our workflow when
>> more than a few nodes will want the same file at the same time. This
>> made me want to use the stripe xlator: when a client node saturates
>> its Gig-E link reading a file, each gluster server is serving only
>> 250mbit/s, leaving room for more clients. If I weren't using stripe,
>> this hypothetical file would live on just one server node, and that
>> node would get slammed if more than two client nodes talked to it. Is
>> there a better way of doing this? Did I make the correct decision in
>> using the stripe xlator for this purpose? Can I achieve the same
>> thing using just afr?

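(To spell the arithmetic out: with a 4-way stripe, a client reading at
its full 1gbit/s pulls roughly a quarter of that, 250mbit/s, from each
server. Each server has a 2gbit/s bonded uplink, so something like
eight clients should be able to read at full line rate before the
servers saturate, assuming the load spreads evenly across the stripe.)
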
> Without spending money to fix the storage architecture, you really
> will need to look at afr, as stripe may help on single requests more
> than on multiple (guessing). You should be able to benchmark/test
> this, but I would imagine that AFR would help you with multiple
> simultaneous read-only accesses to specific files. Read/write will be
> more complex.
>
> If you can spend money to fix the storage architecture, go 10GbE or
> IB everywhere (storage nodes, client nodes, ...). You won't regret it.

stripe has worked flawlessly for me, helping enormously when 33 nodes
each want the same pile of multi-gigabyte files at the same time. I
asked the question I did to find out whether there's a better way of
doing this with gluster 2.0. The upgrade cost for IB was almost twice
my department's annual budget. And from what Krishna said, I believe
afr-based load balancing does not yet exist in 2.0. Again, no problems
with this setup; I'm just making sure I'm doing it the best way (with
regard to gluster).

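For context, my client-side volfile is essentially a 4-way stripe over
protocol/client volumes, along these lines. This is a trimmed sketch:
the hostnames, volume names, and block size are placeholders, and the
exact option spelling differs a little between the 1.3 and 2.0 syntax:

    # client.vol -- 4-way stripe (sketch, placeholder names/addresses)
    volume server1
      type protocol/client
      option transport-type tcp        # spelled tcp/client in older releases
      option remote-host 10.0.0.1      # placeholder
      option remote-subvolume brick
    end-volume

    # ...server2 through server4 are defined the same way...

    volume stripe0
      type cluster/stripe
      option block-size 1MB            # worth tuning for the big-file workload
      subvolumes server1 server2 server3 server4
    end-volume
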
>> (2) I would like to architect the system such that if one node goes
>> down, the others can keep serving the data, even if overall
>> throughput is less. This means that all data would need to be
>> accessible from all clients. Is this something I would use the afr
>> xlator for? If so, do I even need stripe anymore, to handle my need
>> to have multiple servers capable of sending different chunks of the
>> same file? And how does the HA xlator play into this?

> Server-side AFR. Stripe may not help the reliability here.

ZOMG - please point me at docs on how to set this up. stripe on top of
AFR sounds nice, but I do believe that would make all my client nodes
do more work, right? So having the servers handle AFR would be
beautiful.

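From what I've pieced together so far, server-side AFR would look
something like the sketch below, with each server replicating its
local export to a partner. This is untested on my end, and every name
and address in it is a placeholder, so please correct me if I have it
wrong:

    # server.vol on server1 -- untested sketch, placeholder names/addresses
    volume posix
      type storage/posix
      option directory /data/export    # this server's own data
    end-volume

    volume posix-mirror
      type storage/posix
      option directory /data/mirror    # holds the partner's replica
    end-volume

    volume partner
      type protocol/client
      option transport-type tcp
      option remote-host 10.0.0.2      # placeholder: the partner server
      option remote-subvolume posix-mirror
    end-volume

    volume afr0
      type cluster/replicate           # known as cluster/afr in 1.x
      subvolumes posix partner         # local copy + remote copy
    end-volume

    volume server
      type protocol/server
      option transport-type tcp
      subvolumes afr0 posix-mirror     # afr0 for clients, posix-mirror for the partner
      option auth.addr.afr0.allow 10.0.0.*          # auth.ip.* in older releases
      option auth.addr.posix-mirror.allow 10.0.0.*
    end-volume

Clients would then stripe across the four afr0 exports exactly as
before, so the replication traffic stays on the server-side links
instead of loading down the clients.
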
>> We have a mix of (small quantity of gigantic files) and (extremely
>> gigantic quantity of small files), so I'm sure there will need to be
>> some parameter tuning.
>>
>> Thanks in advance. If this question would be better addressed under
>> some sort of support agreement, please let me know.
>>
>> Dan Parsons

Finally, the only remaining "issue" I have with gluster is the memory
leak in the io-cache translator; I'm told it's fixed in 2.0, which is
why I'm upgrading.

Dan Parsons