Re: Diskless Shared-Root GFS/Cluster

We are talking about application servers.

One of the toughest things about clustering in general, and GFS in particular, is handling the failure scenarios.

When you have any sort of cluster issue, if your root is on a shared GFS, that GFS blocks in various ways until fencing completes.  The problem is that some of the binaries you need in order to recover may live on that same GFS.  How do you execute fence_apc to fence a failed node when fence_apc itself sits on a GFS that is hung waiting on that very fencing operation?
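For illustration (a sketch only; the switch address, credentials, and outlet number below are placeholders, and your fence agent's options may differ), manual recovery depends on the fence agent being runnable from storage that is not hung:

    # Manually fence the failed node from a surviving node.  This only
    # works if fence_apc and its dependencies live on local (non-GFS)
    # storage.
    fence_apc -a 10.0.0.50 -l apc -p apc -n 3 -o reboot

    # Or let cluster.conf drive it (agent and outlet are looked up there):
    fence_node node02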

There are ways around this involving RAM disks and the like, but eventually we just settled on having a minimal flash disk that would get us onto our SAN (but not clustered).  Only after we were running from a non-clustered FS on our SAN would we then start up our clustered filesystem.

This gave us the ability to move our nodes around easily, which is an often overlooked benefit of shared root that simply putting your root FS on the SAN gives you as well.  There's nothing like booting up a dead node on spare hardware.  It also gives you a solid way to debug a damaged root system: with shared root it's all or nothing, but not so with this configuration.  You also get separate syslog files and the other things that become one more special case on a shared root, and it's easy to set up nodes with slightly different configurations (shared root makes that yet another special case).  As for the danger of drive failure, a read-only IDE flash disk (Google for Transcend) is simple, easy, and dead solid.
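Roughly, the boot order looks like this (a sketch only; device names, mount points, and init scripts are placeholders for whatever your distribution and cluster stack actually use):

    # Root is the local flash disk, mounted read-only by the kernel/initrd.
    # First bring up the SAN and mount a plain, non-clustered filesystem:
    mount -t ext3 /dev/sdb1 /san         # placeholder SAN device and path

    # Only once the node is stable on non-clustered storage do we join
    # the cluster and mount the shared GFS volumes:
    service cman start                   # membership, locking, fencing
    service clvmd start                  # clustered LVM, if you use it
    mount -t gfs /dev/vg00/apps /apps    # placeholder GFS volume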

After consolidating your shared configuration files into /etc/shared and symlinking their original locations to that directory, it is a simple matter of rsync / csync / tsync / cron+scp to keep them synchronized across nodes.
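Something along these lines, for instance (node names, paths, and the cron schedule are assumptions; adjust to taste):

    # Consolidate a config file and point its old location at /etc/shared:
    mv /etc/exports /etc/shared/exports
    ln -s /etc/shared/exports /etc/exports

    # Root crontab on one node, pushing to the others every 15 minutes
    # (assumes SSH keys are already in place):
    */15 * * * * rsync -a --delete /etc/shared/ node02:/etc/shared/
    */15 * * * * rsync -a --delete /etc/shared/ node03:/etc/shared/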

It is tempting to have a shared root to minimize management overhead.  It is tempting to play games with ramfs and the like to provide a support environment that still functions when that shared root is hung due to clustering issues.  It is tempting to think that having a shared GFS root is really useful.

However, if you value reliability and practicality, it's much easier to script up an occasional rsync than it is to perform so many acrobatics for so little gain.  For a cluster (and its apps) to be reliable at all, it needs to be able to function, recover, and generally have a stable operating environment.  Putting GFS underneath the userspace that drives it is asking for trouble.

On Jan 31, 2007, at 1:34 PM, isplist@xxxxxxxxxxxx wrote:

I'm thinking for application servers/cluster only, not workstation users.


On Wed, 31 Jan 2007 11:10:55 -0800, Tom Mornini wrote:
We boot from flash drives, then pivot root to SAN storage.

I agree with no drives in servers, but shared root is a
whole different ball game if you mean everyone using a
single filesystem for root.

--
-- Tom Mornini, CTO
-- Engine Yard, Ruby on Rails Hosting
-- Reliability, Ease of Use, Scalability
-- (866) 518-YARD (9273)

-- 
Jayson Vantuyl
Systems Architect


--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
