On Thursday 01 February 2007 12:57, Jayson Vantuyl wrote:
> We are talking about application servers.
>
> One of the toughest things about clustering in general and GFS in
> particular is the failure scenarios.
>
> When you have any sort of cluster issue, if your root is on a shared
> GFS, that GFS freezes in various ways until fencing happens. The
> problem with this is that certain binaries that are on the same GFS
> may need to be used to recover. How do you execute fence_apc to
> fence a failed node when it is on a GFS that is hung waiting on that
> same fencing operation?

We move the fencing and ccsd functionality into a special chroot that is
rebuilt every time you boot the server. It ends up on a tmpfs if the path
you specify for the chroot is detected to be on GFS, and is left untouched
if it is a local FS (a rough sketch follows below my reply). Many customers
use local disks, but not for booting or for any valuable data - only for
temporary files and swap - so that a server is just an independent,
exchangeable box of metal.

> There are ways around this involving RAM disks and the like, but
> eventually we just settled on having a minimal flash disk that would
> get us onto our SAN (but not clustered). Only after we were on a
> non-clustered FS on our SAN would we then start up our clustered
> filesystem. This gave us the ability to move our nodes around
> easily. This is an often overlooked benefit of a shared root that
> putting your root FS on SAN gives you as well. There's nothing like
> booting up a dead node on spare hardware. This also gives you a
> solid way to debug a damaged root system. With shared-root it's all
> or nothing. It's not so with this configuration. You also have
> separate syslog files and other things that are one more special case
> on a shared root. It's also easy to set up nodes with slightly
> different configurations (shared-root makes this another special
> case). As for the danger of drive failure, a read-only IDE flash
> disk (Google for Transcend) is simple, easy, and dead solid.

You can also boot nodes with different hardware configurations: the initrd
in the open-sharedroot does the hardware detection.

> After consolidating your shared configuration files into /etc/shared
> and placing appropriate symlinks into that directory, it is a simple
> matter of rsync / csync / tsync / cron+scp to keep them synchronized.

That's a question of architecture, not technology. Where do you want to
have your complexity: in the filesystem or in userspace?

> It is tempting to want to have a shared root to minimize management
> requirements. It is tempting to want to play games with ramfs and
> the like to provide a support system that will function when that
> shared root is hung due to clustering issues. It is tempting to
> think that having a shared GFS root is really useful.
>
> However, if you value reliability and practicality, it's much easier
> to script up an occasional Rsync than it is to do so many acrobatics
> for such little gain. For a cluster (and its apps) to be reliable at
> all, it needs to be able to function, recover, and generally have a
> stable operating environment. Putting GFS under the userspace that
> drives it is asking for trouble.

You should really have a deeper look into the sharedroot concepts.
You'll like it!

Regards,
Marc.
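P.S.: To make the chroot idea a bit more concrete, here is a rough sketch
of how such a boot-time rebuild could look. It only illustrates the concept
and is not taken from the open-sharedroot scripts; the chroot path, the
list of copied binaries and the GFS check are placeholders you would adapt
to your own setup.

  #!/bin/bash
  # Illustrative sketch only - not the actual open-sharedroot boot code.
  # Rebuild a minimal chroot for ccsd and the fence agents at every boot,
  # backed by tmpfs when the chosen path sits on GFS.

  CHROOT=/var/lib/fence-chroot     # hypothetical path, pick your own

  mkdir -p "$CHROOT"

  # If the chroot path lives on GFS, mount a tmpfs underneath it so the
  # fence tools stay usable even while the GFS root hangs awaiting fencing.
  # (The exact type string reported by stat may differ on your system.)
  if [ "$(stat -f -c %T "$CHROOT")" = "gfs" ]; then
      mount -t tmpfs tmpfs "$CHROOT"
  fi

  # Copy in the binaries needed for fencing and ccsd, plus the shared
  # libraries they link against, and the cluster configuration.
  mkdir -p "$CHROOT"/{bin,sbin,lib,etc,dev,proc}
  for bin in /sbin/fence_apc /sbin/ccsd /bin/sh; do
      cp -a "$bin" "$CHROOT$bin"
      for lib in $(ldd "$bin" | awk '/\//{print $(NF-1)}'); do
          mkdir -p "$CHROOT$(dirname "$lib")"
          cp -a "$lib" "$CHROOT$lib"
      done
  done
  cp -a /etc/cluster "$CHROOT/etc/"

Once that chroot sits on its own tmpfs, a hung shared root no longer
matters for recovery: you fence with something like
"chroot /var/lib/fence-chroot /sbin/fence_apc ..." regardless of the state
of the GFS underneath.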
> On Jan 31, 2007, at 1:34 PM, isplist@xxxxxxxxxxxx wrote:
> > I'm thinking for application servers/cluster only, not workstation
> > users.
> >
> > On Wed, 31 Jan 2007 11:10:55 -0800, Tom Mornini wrote:
> >> We boot from flash drives, then pivot root to SAN storage.
> >>
> >> I agree with no drives in servers, but shared root is a
> >> whole different ball game if you mean everyone using a
> >> single filesystem for root.
> >>
> >> --
> >> -- Tom Mornini, CTO
> >> -- Engine Yard, Ruby on Rails Hosting
> >> -- Reliability, Ease of Use, Scalability
> >> -- (866) 518-YARD (9273)
> >
> --
> Jayson Vantuyl
> Systems Architect
> Engine Yard
> jvantuyl@xxxxxxxxxxxxxx

--
Gruss / Regards,

** Visit us at CeBIT 2007 in Hannover/Germany **
** in Hall 5, Booth G48/2 (15.-21. of March)  **

Marc Grimme
Phone: +49-89 452 3538-14
http://www.atix.de/
http://www.open-sharedroot.org/

** ATIX - Ges. fuer Informationstechnologie und Consulting mbH
Einsteinstr. 10 - 85716 Unterschleissheim - Germany

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster