Improving Gluster performance through more hardware.

Hi all.

First, I have a specific question about what hardware should be used for Gluster, then after that I have a question about how Gluster does its multithreading/hyperthreading.

So, we have a new Gluster cluster (currently, two servers with one "replicated" volume) serving up the files for our e-mail, which has for years been stored in Maildir format. That works pretty well except for the few clients who store all their old mail on our server, so that their "cur" folder contains a few tens of thousands of messages. As others have noticed, this isn't something that Gluster handles well. But we value high availability and redundancy more than we value speed, and we don't yet have a large enough cluster to justify going with software that requires a metadata server, so we're going with Gluster. That doesn't mean we don't need better performance, though.

So I've noticed that the resource Gluster consumes the most in our use case isn't network or disk bandwidth - both of which remain *well* under full utilization - but CPU cycles. I can easily test this by running `ls -l` in a folder with ~20,000 files in it, and I see CPU usage by glusterfsd jump to between 40% and 200%. The glusterfs process usually stays around 20-30%.
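For anyone who wants to repeat this test with numbers rather than by eye, here is a minimal sketch in Python. It walks a directory and stats every entry, which is roughly the work `ls -l` does, and reports how long that took. The function name `time_listing` is made up for illustration, and the sketch only measures client-side wall-clock time; CPU usage of glusterfsd still has to be watched separately on the servers.

```python
import os
import time


def time_listing(path):
    """Stat every entry in `path` (roughly what `ls -l` does) and time it.

    Returns a tuple (entry_count, elapsed_seconds).
    """
    start = time.perf_counter()
    count = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # One lstat-style call per entry, without following symlinks.
            entry.stat(follow_symlinks=False)
            count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed
```

Calling `time_listing()` on a large "cur" folder on the Gluster mount, and then on a local copy of the same folder, should give a rough idea of how much of the delay comes from the filesystem.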

Both of our Gluster servers are gen III Dell 2950's with dual Xeon E5345's (quad-core, 2.33 GHz CPUs) in them, so we have 8 cores per server to deal with this load. So far, we're only using a single mail server, but we'll be migrating to a load-balanced pair very soon. So my guess is that we can reduce the latency that's very noticeable in our webmail by upgrading to the fastest CPUs the 2950's can hold, evidently a 3.67 GHz quad-core.

It would be nice to know what other users have experienced with this kind of upgrade, or whether they've gotten better performance from other hardware upgrades.

Which leads to my second question. Does glusterfsd spawn multiple threads to handle the other requests made of it? I don't see any evidence of this in the `top` program, but other clients don't notice at all that I'm running up the CPU usage with my one `ls` process. Smaller mail accounts can read their mail just as quickly as if the system were near idle while this operation is in progress. It's also hard for me to test this with only one mail server attached to the Gluster cluster. I can't tell whether the additional load from 20 or 100 other servers would make any difference to CPU usage, but we want to know what performance we can expect should we expand that far, and whether the answer is throwing more CPUs at the problem or throwing faster CPUs at it.
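
One way to check for threads without relying on the default `top` view (which shows one line per process unless started as `top -H`) is to count the entries under `/proc/<pid>/task` on Linux. The sketch below does that in Python. The helper names `pids_by_name` and `thread_count` are made up for illustration, and the thread count alone doesn't say how the work is divided among the threads.

```python
import os


def thread_count(pid):
    """Return the number of threads in process `pid` (Linux only).

    Each thread of a process appears as a directory under /proc/<pid>/task.
    """
    return len(os.listdir(f"/proc/{pid}/task"))


def pids_by_name(name):
    """Return the PIDs whose /proc/<pid>/comm equals `name`."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited while we were scanning; skip it.
            continue
    return pids
```

Run on one of the servers, `[(p, thread_count(p)) for p in pids_by_name("glusterfsd")]` would list each brick process with its thread count. Comparing the count at idle and during the `ls -l` test would show whether more threads appear under load.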



