I am working on using GFS for home directories, SMTP, IMAP, POP, and web. So far things are going pretty well (see my posting from this morning for one downside). At some point we may add Samba and other services. The current goal is to move the mail and web servers to a set of inexpensive, highly available, and scalable servers.

> I'm just looking for a bit of general advice about GFS... We're basically just looking to use it as a SAN-based replacement for NFS. We've got a handful of servers that need constant read/write access to our users' home directories (Samba PDC/BDC, web server, network terminal servers, etc.), and we thought GFS might be a good replacement from a performance and security standpoint, let alone removing the SPOF of our main NFS/file server. Another place we're thinking of using it is underneath our mail servers, so that as we grow, SMTP deliveries (and virus scanning) can happen on one machine while IMAP/POP connections can be served through another.
Be careful with performance: GFS adds a lot of overhead and can be very slow when dealing with lots of small file creates or deletes. After a lot of testing, we decided that the overhead was not a problem for our application.
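If you want a rough feel for that overhead before committing, a quick timing script run once against a local ext3 directory and once against the GFS mount tells you a lot. Below is a minimal sketch in Python; the file count and sizes are arbitrary illustration values, not what we actually used.

#!/usr/bin/env python
"""Rough small-file create/delete timing. Run it once against a local
ext3 directory and once against a directory on the GFS mount and compare.
The defaults are arbitrary and only for illustration."""

import os
import sys
import time

def time_small_files(directory, count=5000, size=2048):
    payload = b"x" * size
    start = time.time()
    for i in range(count):
        path = os.path.join(directory, "msg.%06d" % i)
        with open(path, "wb") as f:
            f.write(payload)
    created = time.time()
    for i in range(count):
        os.unlink(os.path.join(directory, "msg.%06d" % i))
    removed = time.time()
    print("%s: %d creates in %.1fs, %d unlinks in %.1fs"
          % (directory, count, created - start, count, removed - created))

if __name__ == "__main__":
    # e.g. ./smallfile_test.py /mnt/gfs/tmp /var/tmp
    for d in sys.argv[1:]:
        time_small_files(d)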
We hacked up some crude tools to push test mail messages and another to read and verify them. At one point the cluster was handling 600 IMAP connections and nearly 1500 inbound mail messages/minute. Each simulated user read a message every 10 seconds; after reading the whole mailbox, it would go back and delete a message every 2 seconds and then expunge the mailbox. There were 30 mail readers running on each of 20 Linux workstations. The mail push consisted of 10 processes on 5 workstations, each generating a random message and sending it over SMTP to the cluster. Without any delay, the push would send around 3300 messages/minute and swamp the cluster. After some trial and error, I found that a delay of 1.7 seconds between messages slowed the push rate to around 1400 messages/minute. The big bottleneck is OpenLDAP: its load climbs from serving sendmail alias lookups. I expect our first upgrade will be to add more LDAP servers to spread the load.
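For anyone wanting to build a similar push tool, it really is nothing more than a loop that generates a random message, hands it to the cluster over SMTP, and sleeps between sends to throttle the rate. Here is a minimal Python sketch of that idea; the hostname, addresses, and the 1.7-second delay are placeholders standing in for whatever your own test environment uses, not our actual tool.

#!/usr/bin/env python
"""Crude SMTP load pusher: send randomly generated test messages to the
cluster with a fixed delay between sends. Hostname, sender/recipient
addresses, and the delay are illustrative values only."""

import random
import smtplib
import string
import time

MAIL_HOST = "mailcluster.example.edu"   # hypothetical cluster address
SENDER = "loadtest@example.edu"
RECIPIENTS = ["testuser%02d@example.edu" % i for i in range(1, 31)]
DELAY = 1.7                             # seconds between messages

def random_body(lines=40, width=72):
    chars = string.ascii_letters + string.digits + "    "
    return "\n".join(
        "".join(random.choice(chars) for _ in range(width))
        for _ in range(lines))

def push_forever():
    n = 0
    while True:
        rcpt = random.choice(RECIPIENTS)
        msg = ("From: %s\r\nTo: %s\r\nSubject: load test %d\r\n\r\n%s\r\n"
               % (SENDER, rcpt, n, random_body()))
        server = smtplib.SMTP(MAIL_HOST)
        try:
            server.sendmail(SENDER, [rcpt], msg)
        finally:
            server.quit()
        n += 1
        time.sleep(DELAY)

if __name__ == "__main__":
    push_forever()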
> Unfortunately, even at academic prices, Red Hat wants more per single GFS node than I'm paying for twenty AS licenses, so I've been heading down this road by building from the SRPMS. I mostly have a 2-node test cluster built under RHEL4, but a number of things have me a little bit hesitant to move forward, so I'm wondering if some folks can offer some advice.

I used the source RPMs also, for the same reasons. I would love to be able to purchase support, but the funding is not there. I would suggest that you test carefully and move slowly.
> For starters, is my intended use even appropriate for GFS? It does seem as though I'm looking to put an awful lot of overhead (with the cluster management suite) onto these boxes just to eliminate a SPOF.

I think this debate is ongoing for anybody who is looking at a SAN or a cluster. Once I factored in GFS, LDAP, Kerberos, load balancers, a SAN, etc., this turned into the most complex system I have ever built for an employer. On the other hand, the two times we have had problems, one of the other servers took over and the traffic kept flowing. We are still in test mode, but we expect to put our first cluster into production on January 7th.
> Another concern is that this list seems to have a lot more questions posted than answers. Are folks running into situations where filesystems are hopelessly corrupted, or that they've been unable to recover from? That's the impression I feel like I'm getting, but I suppose a newbie to Linux in general could get the same impression from reading the fedora lists out of context. The last thing I want to do is put something into production and then have unexplained fencing occurrences or filesystem errors.

I have been working with GFS for over a year now, on both test and soon-to-be-production servers. In general, I think GFS works well.
EXT3, JFS, etc. are more stable. The downside is that if the server that holds the EXT3 filesystem is down, then the applications are down. It is nice to be able to take a server out of production, fix it, make changes, etc., without the users ever knowing.
> Finally, Red Hat sales is laying it on pretty heavy that the reason the GFS pricing is so high is because it's nearly impossible to install it yourself. That was particularly true before GFS landed in Fedora. Now the claim is just that it's very difficult to manage without a support contract. Is this just marketing, or does GFS really turn out to be a nightmare to maintain?

While testing, I have built GFS on Fedora Core 3, SuSE 9.1, and Red Hat Enterprise 3. I have built from the CVS tree and from SRPMS, and never had much trouble getting GFS up and running. I did have to write a fencing module to work with a Cisco switch (not difficult, it is just a Perl script that does some SNMP calls); the SAN is iSCSI based and the only place to fence was in the switch. Of course, I have been hacking Unix boxes for 20 years now and using Linux in development and production for 10 years. If your background is Windows or VMS, you would have to work at it.
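The fencing agent itself was a Perl script, but the core of it is just an SNMP set that takes the failed node's switch port administratively down. Here is a rough Python-flavored sketch of that one idea, shelling out to net-snmp's snmpset; the switch name, community string, and ifIndex are made up, and the glue that talks to the fence daemon is left out entirely.

#!/usr/bin/env python
"""Standalone sketch of SNMP port fencing: disable (or re-enable) a switch
port by setting IF-MIB::ifAdminStatus via net-snmp's snmpset. The switch
address, community string, and ifIndex are placeholders; a real agent also
has to speak the fence daemon's calling convention, which is omitted."""

import subprocess
import sys

IF_ADMIN_STATUS = "1.3.6.1.2.1.2.2.1.7"   # IF-MIB::ifAdminStatus
PORT_DOWN, PORT_UP = "2", "1"

def set_port(switch, community, ifindex, status):
    oid = "%s.%d" % (IF_ADMIN_STATUS, ifindex)
    cmd = ["snmpset", "-v2c", "-c", community, switch, oid, "i", status]
    return subprocess.call(cmd)

if __name__ == "__main__":
    # e.g. ./fence_port.py switch1.example.edu private 10 off
    switch, community, ifindex, action = sys.argv[1:5]
    status = PORT_DOWN if action == "off" else PORT_UP
    sys.exit(set_port(switch, community, int(ifindex), status))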
> Any insights people could provide would be appreciated.

Move slowly, plan carefully, and test everything. Of course, this is my standard advice any time you are doing something new.
Matt
mbrookov@xxxxxxxxx