We run a similar setup here. I use Gluster 2.x to stripe/replicate 8 x 30TB XFS bricks (24 x 1.5TB RAID6 each) into a single 120TB array. Each of my objects is around 1-4MB in size; the average is around 2MB. Here are a few things we've done to get good performance from our setup.

a) We use XFS with mount options "noatime,nodiratime,inode64".

b) I use a hardware 3ware 9690/9750 RAID controller with the latest firmware and the battery backup option. With an array as large as ours we lose hard drives all the time, so doing rebuilds in hardware is great. Make sure to run the latest 3ware firmware. I've tried Dell PERC, a few different LSI cards, Supermicro-branded LSI cards, and a few others, and 3ware was by far the most reliable. I wanted to use cheap off-the-shelf desktop SATA drives, so the controller has to be smart about dealing with SATA timeouts.

c) Our Gluster servers are CentOS 5 or 6. Disable smartd, as it interferes with the 3ware controller. I set the readahead (/sbin/blockdev --setra 16384 /dev/sda) on each of the bricks - this was a huge help! Also make sure you run the latest driver available from 3ware. (There's a rough fstab/readahead sketch after this list.)

d) On the Gluster side of things, we use a "RAID10" type of setup. We replicate two sets of 4 bricks distributed together (type = cluster/distribute), so we have two complete copies of our data. We break this mirror on our public-facing feed servers. We have two feed servers running Apache with a custom in-house Apache module to handle the actual serving of data. Each feed server only talks to one side of Gluster - so we intentionally break Gluster's replication on feeding. If one of our filers goes offline, we have to disable that feed server in our load balancer and then, of course, repair any data that wasn't replicated with an "ls -alR". We've found that disabling Gluster's replication on the feed side increased performance dramatically because it wasn't having to do read-repair checking. Of course, the servers that create the content talk to both sides of the cluster. Make sure to have good monitoring (we use Nagios with lots of custom scripts).

e) I have a very small 2MB cache in our Gluster clients. We have such a large volume/library that getting a cache hit almost never happens, so don't waste the memory. (There's a rough client volfile for d) and e) below.)

f) My Apache module rewrites incoming URIs to load-balance requests across two different Gluster mounts on the filesystem. Each Gluster mount is its own client talking to the same server over different gigabit ethernet links to different glusterfsd daemons running on different ports, i.e. 192.168.1.50:6996 and 192.168.2.50:6997. This not only doubled the bandwidth between my feed clients and backend servers (which is good), it also reduced latency on those links. I've also found that the Linux thread scheduler does a better job of distributing glusterfsd file requests across multiple glusterfsd processes than with a single glusterfsd process and a higher thread count. (See the mount sketch below.)

g) My hardware is pretty simple. As mentioned above, I use 3ware hardware RAID controllers, Seagate 1.5TB and 2TB 7200rpm SATA desktop drives, and Supermicro 4U 24-drive chassis with a built-in SAS expander backplane. The RAID card connects to the backplane via a 4-lane SAS connector. I use low-end dual-CPU quad-core Xeon boxes with 8GB of RAM (remember, you can't cache anything). Each head server has ~48 drives (2 bricks). My Apache feed servers are currently two 5-year-old Dell 1950s with 8GB of RAM.
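Roughly, the brick setup from a) and c) looks like this on each head server. Device names, mount points, and the rc.local approach are just examples, adjust for your own layout:

    # /etc/fstab - one line per brick; device names and mount points are examples
    /dev/sdb1   /export/brick1   xfs   noatime,nodiratime,inode64   0 0
    /dev/sdc1   /export/brick2   xfs   noatime,nodiratime,inode64   0 0

    # /etc/rc.local - raise readahead on each brick device at boot
    /sbin/blockdev --setra 16384 /dev/sdb
    /sbin/blockdev --setra 16384 /dev/sdc

    # smartd polling fights with the 3ware controller, so keep it off
    chkconfig smartd off
    service smartd stop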
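For the writer-side clients, the "RAID10" layout from d) plus the tiny cache from e) ends up looking something like the volfile below. This is only a sketch - hostnames, brick names, and volume names are made up, and option names can vary between 2.x releases, so generate/verify against your own version rather than copying this verbatim:

    # one protocol/client volume per remote brick (7 more like this)
    volume fs1-brick1
      type protocol/client
      option transport-type tcp
      option remote-host 192.168.1.50
      option remote-subvolume brick1
    end-volume

    # first distribute set (one half of the mirror)
    volume set-a
      type cluster/distribute
      subvolumes fs1-brick1 fs1-brick2 fs2-brick1 fs2-brick2
    end-volume

    # second distribute set (the other half)
    volume set-b
      type cluster/distribute
      subvolumes fs3-brick1 fs3-brick2 fs4-brick1 fs4-brick2
    end-volume

    # writers see both sides, so every file lands in two places
    volume mirror
      type cluster/replicate
      subvolumes set-a set-b
    end-volume

    # keep the client cache tiny - hits are rare on a library this large
    volume cache
      type performance/io-cache
      option cache-size 2MB
      subvolumes mirror
    end-volume

The feed-side volfiles are the same idea minus the cluster/replicate layer - each feed client only lists one distribute set, which is how we "break" the mirror for reads.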
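And the f) side of things from the shell on a feed server - two independent client mounts, one per gig-e link and glusterfsd port, with the Apache module hashing URIs across them. Volfile names and mount points here are hypothetical:

    # two separate clients, different NICs / ports on the same backend
    glusterfs -f /etc/glusterfs/feed-a.vol /mnt/gluster-a    # -> 192.168.1.50:6996
    glusterfs -f /etc/glusterfs/feed-b.vol /mnt/gluster-b    # -> 192.168.2.50:6997

    # after a failed filer comes back, walk the tree from a replicating
    # (writer-side) client to trigger self-heal on anything it missed
    ls -alR /mnt/gluster-writer > /dev/null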
On an average day I push about 1TB of data in roughly 4 million requests, or around 100-200Mbit/sec. However, I've sustained six times that with the same setup - around ~600-700Mbit/sec and nearly 20 million requests a day. I haven't tried the newer Gluster 3.x releases, as everything just sort of works for the most part on the 2.x code. There are gotchas and things that annoy me, but for the most part everything works very well. I was able to replace my old Isilon storage for less than the annual cost of its support contract, doubling the space in the process!

liam

On Tue, Feb 7, 2012 at 11:26 AM, Brian Candler <B.Candler at pobox.com> wrote:
> On Tue, Feb 07, 2012 at 12:56:57PM -0500, John Mark Walker wrote:
>> Brian - thank you for sharing these configuration tips. I'd love to have that in a blog post :)
>>
>> As a close second, perhaps you could post a mini Q&A on community.gluster.org? This is the type of information that's very useful for google to index and make available.
>
> I'm just learning this stuff myself, accumulating tips from here and from
> the XFS mailing list :-)
>
> I'm doing internal documentation as I go, but I'll try to make a public
> summary too.
>
> Regards,
>
> Brian.
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users