We run a similar setup here. I use Gluster 2.x to stripe/replicate 8 x 30TB XFS bricks (24 x 1.5TB RAID6 each) into a single 120TB array. Each of my objects is around 1-4MB in size; the average is around 2MB. Here are a few things we've done to get good performance from our setup.

a) We use XFS with mount options "noatime,nodiratime,inode64".

b) I use a hardware 3ware 9690/9750 RAID controller with the latest firmware and the battery backup option. With an array as large as ours we lose hard drives all the time, so doing rebuilds in hardware is great. Make sure to run the latest 3ware firmware. I've tried Dell PERC, a few different LSI cards, Supermicro-branded LSI cards, and a few others, and 3ware was by far the most reliable. I wanted to use cheap off-the-shelf desktop SATA drives, so the controller has to be smart about dealing with SATA timeouts.

c) Our Gluster servers are CentOS 5 or 6. Disable smartd, as it interferes with the 3ware controller. I set the readahead (/sbin/blockdev --setra 16384 /dev/sda) on each of the bricks - this was a huge help! Also make sure you run the latest driver available from 3ware. (There's a rough fstab/readahead sketch after this list.)

d) On the Gluster side of things, we use a "RAID10" type of setup. We replicate two sets of 4 bricks distributed together (type = cluster/distribute), so we have two complete copies of our data. We break this mirror on our public-facing feed servers. We have two feed servers running Apache with a custom in-house Apache module to handle the actual serving of data. Each feed server only talks to one side of Gluster - so we intentionally break Gluster's replication on feeding. If one of our filers goes offline, we have to disable that feed server in our load balancer and then, of course, repair any data that wasn't replicated with an "ls -alR". We've found that disabling Gluster's replication on the feed side increased performance dramatically because it wasn't having to do read-repair checking. Of course, the servers that create the content talk to both sides of the cluster. Make sure to have good monitoring (we use Nagios with lots of custom scripts).

e) I have a very small 2MB cache in our Gluster clients. We have such a large volume/library that getting a cache hit almost never happens, so don't waste the memory. (There's a rough client volfile for d) and e) below.)

f) My Apache module rewrites incoming URIs to load-balance requests across two different Gluster mounts on the filesystem. Each Gluster mount is its own client talking to the same server over different gigabit ethernet links to different glusterfsd daemons running on different ports, i.e. 192.168.1.50:6996 and 192.168.2.50:6997. This not only doubled the bandwidth between my feed clients and backend servers (which is good), it also reduced latency on those links. I've also found that the Linux thread scheduler does a better job of distributing glusterfsd file requests across multiple glusterfsd processes than with a single glusterfsd process and a higher thread count. (See the mount sketch below.)

g) My hardware is pretty simple. As mentioned above, I use 3ware hardware RAID controllers, Seagate 1.5TB and 2TB 7200rpm SATA desktop drives, and Supermicro 4U 24-drive chassis with a built-in SAS expander backplane. The RAID card connects to the backplane via a 4-lane SAS connector. I use low-end dual-CPU quad-core Xeon boxes with 8GB of RAM (remember, you can't cache anything). Each head server has ~48 drives (2 bricks). My Apache feed servers are currently two 5-year-old Dell 1950s with 8GB of RAM.
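Roughly, the brick setup from a) and c) looks like this on each head server. Device names, mount points, and the rc.local approach are just examples, adjust for your own layout:

    # /etc/fstab - one line per brick; device names and mount points are examples
    /dev/sdb1   /export/brick1   xfs   noatime,nodiratime,inode64   0 0
    /dev/sdc1   /export/brick2   xfs   noatime,nodiratime,inode64   0 0

    # /etc/rc.local - raise readahead on each brick device at boot
    /sbin/blockdev --setra 16384 /dev/sdb
    /sbin/blockdev --setra 16384 /dev/sdc

    # smartd polling fights with the 3ware controller, so keep it off
    chkconfig smartd off
    service smartd stop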
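For the writer-side clients, the "RAID10" layout from d) plus the tiny cache from e) ends up looking something like the volfile below. This is only a sketch - hostnames, brick names, and volume names are made up, and option names can vary between 2.x releases, so generate/verify against your own version rather than copying this verbatim:

    # one protocol/client volume per remote brick (7 more like this)
    volume fs1-brick1
      type protocol/client
      option transport-type tcp
      option remote-host 192.168.1.50
      option remote-subvolume brick1
    end-volume

    # first distribute set (one half of the mirror)
    volume set-a
      type cluster/distribute
      subvolumes fs1-brick1 fs1-brick2 fs2-brick1 fs2-brick2
    end-volume

    # second distribute set (the other half)
    volume set-b
      type cluster/distribute
      subvolumes fs3-brick1 fs3-brick2 fs4-brick1 fs4-brick2
    end-volume

    # writers see both sides, so every file lands in two places
    volume mirror
      type cluster/replicate
      subvolumes set-a set-b
    end-volume

    # keep the client cache tiny - hits are rare on a library this large
    volume cache
      type performance/io-cache
      option cache-size 2MB
      subvolumes mirror
    end-volume

The feed-side volfiles are the same idea minus the cluster/replicate layer - each feed client only lists one distribute set, which is how we "break" the mirror for reads.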
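And the f) side of things from the shell on a feed server - two independent client mounts, one per gig-e link and glusterfsd port, with the Apache module hashing URIs across them. Volfile names and mount points here are hypothetical:

    # two separate clients, different NICs / ports on the same backend
    glusterfs -f /etc/glusterfs/feed-a.vol /mnt/gluster-a    # -> 192.168.1.50:6996
    glusterfs -f /etc/glusterfs/feed-b.vol /mnt/gluster-b    # -> 192.168.2.50:6997

    # after a failed filer comes back, walk the tree from a replicating
    # (writer-side) client to trigger self-heal on anything it missed
    ls -alR /mnt/gluster-writer > /dev/null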
On an average day I push about 1TB of data in roughly 4 million requests, or around 100-200Mbit/sec. However, I've sustained six times that with the same setup - around ~600-700Mbit/sec and nearly 20 million requests a day. I haven't tried the newer Gluster 3.x releases, as everything just sort of works for the most part on the 2.x code. There are gotchas and things that annoy me, but for the most part everything works very well. I was able to replace my old Isilon storage for less than the annual cost of its support contract, doubling the space in the process!

liam

On Tue, Feb 7, 2012 at 11:26 AM, Brian Candler <B.Candler at pobox.com> wrote:
> On Tue, Feb 07, 2012 at 12:56:57PM -0500, John Mark Walker wrote:
>> Brian - thank you for sharing these configuration tips. I'd love to have that in a blog post :)
>>
>> As a close second, perhaps you could post a mini Q&A on community.gluster.org? This is the type of information that's very useful for google to index and make available.
>
> I'm just learning this stuff myself, accumulating tips from here and from
> the XFS mailing list :-)
>
> I'm doing internal documentation as I go, but I'll try to make a public
> summary too.
>
> Regards,
>
> Brian.
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users