Re: Alternative? Diskless Shared-Root GFS/Cluster

Jayson Vantuyl <jvantuyl@xxxxxxxxxxxxxx> · Thu, 1 Feb 2007 06:26:36 -0600

> Ok, might as well ask this... since I can't seem to find anything on it. How
> about just a central storage that can be split up into many small segments so
> that blades can boot over the network, then join the GFS cluster?

We use an IDE flash disk in each server.  It's just too easy to put a read-only bootstrap image on the flash and boot off of that.  With affordable 256MB flash disks, you can even keep a powerful repair environment there in case things get broken.

> I mean, all I want to do is to remove the drives since they really aren't
> being used. All of the work is being done on the GFS cluster once a machine is
> up and running. It barely does anything with its drive other than the OS of
> course, even logging is all remote.

Don't just remove the drives; replace them with IDE flash drives.  I think you can also use USB thumb drives if your BIOS supports booting from them.  A 256MB flash for $26.30 is hard to beat.  Order directly from the manufacturer at:

http://www.transcendusa.com/Products/ModDetail.asp?ModNo=26&LangNo=0

We put the boot loader, kernel, and a simple maintenance environment on the flash.  We still boot our root off of the SAN.  Conveniently, our SAN supports partitioning, so we keep one partition per node (mounted automatically using a LABEL= mount).  Once the node has booted from its partition, we run CLVM with our GFS filesystems on top of it.  Quite handy (and CLVM isn't really necessary for your case).

> Isn't there a simpler way of getting this done without having to get into
> whole new technologies? All of the blades have PXE boot capabilities, there
> must be some simple way of doing this?

I'd avoid this.  I've tried the PXE boot thing before, and the PXE server just becomes one more single point of failure and one more thing to maintain.  There's nothing like rebooting your cluster only to find that the PXE server has a failed disk.  :(

Basically with a SAN set up as follows:
/dev/san0p1 (FS for node 0, labeled node0)
/dev/san0p2 (FS for node 1, labeled node1)
...
/dev/san1 (CLVM / GFS / other stuff)

Your boot flash doesn't need much more than a very tiny Linux system (busybox is your friend), a file containing the node id (in this case /node_id) and a /linuxrc containing:

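#!/bin/sh
NODEID=$(cat /node_id)
# SET UP SAN HERE IF NECESSARY
# LABEL-mounting requires /proc/partitions; naming the fs type avoids needing an /etc/fstab entry
mount -t proc proc /proc
# The label must match this node's SAN partition (node0, node1, ...)
mount -o ro -L "node${NODEID}" /newroot
cd /newroot
# /newroot/oldroot must already exist; the flash root ends up mounted there
pivot_root . oldroot
# chroot makes sure init really starts inside the new root
exec chroot . sbin/init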
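
The per-node partitions have to carry their labels before any node can boot from them.  As a rough sketch (not something we run verbatim), the labeling can be scripted along these lines.  The function name print_node_mkfs is made up for illustration, ext3 is an assumption, and the device names follow the /dev/san0pN layout above.  It only prints the mkfs commands so you can review them before running anything destructive:

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) one mkfs command per node.
# Assumptions: ext3 root filesystems, partitions named <dev>pN, and
# node N living on partition N+1 with the label nodeN.
print_node_mkfs() {
    dev=$1      # base device, e.g. /dev/san0
    count=$2    # number of nodes to prepare
    id=0
    while [ "$id" -lt "$count" ]; do
        # mke2fs/mkfs.ext3 takes the volume label via -L
        echo "mkfs.ext3 -L node${id} ${dev}p$((id + 1))"
        id=$((id + 1))
    done
}

# Example: show the commands for a two-node setup
print_node_mkfs /dev/san0 2
```

Pipe the output to sh once it looks right.  An existing filesystem can be relabeled in place with e2label instead of being recreated.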

Considering that your flash hardly ever changes, and that you can script the creation of the flash image and the node partitions, this quickly becomes very low maintenance.  If you want the flash images to be identical on every node (no per-node /node_id file), grab the MAC address of the first NIC and generate the label from that...
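
A minimal sketch of the MAC-based variant, under a few assumptions: the first NIC is called eth0, sysfs is mounted so the address can be read from /sys/class/net/eth0/address, and the root filesystems are ext2/ext3, whose volume labels are limited to 16 characters.  The helper names mac_to_label and node_label are invented for this example.  "node" plus the 12 hex digits of the MAC comes to exactly 16 characters:

```shell
#!/bin/sh
# Hypothetical helpers: build a filesystem label from a MAC address.

mac_to_label() {
    # Drop the colons and lowercase the hex digits:
    #   00:16:3E:AA:BB:CC -> node00163eaabbcc
    printf 'node%s\n' "$(printf '%s' "$1" | tr -d ':' | tr 'A-F' 'a-f')"
}

node_label() {
    # Read the MAC of the given interface (default eth0) from sysfs.
    # Assumes /sys is already mounted at this point in the boot.
    iface=${1:-eth0}
    mac_to_label "$(cat "/sys/class/net/$iface/address")"
}
```

In the /linuxrc, the mount line would then become something like mount -o ro -L "$(node_label eth0)" /newroot, and each node's partition gets labeled with the same string when it is created.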

A shared root is a nice idea.  In practice, however, you end up with a fragile custom environment that is hostile to lots of software and that creates new single points of failure and contention (making it neither high-performance nor high-availability).

 -- 
Jayson Vantuyl
Systems Architect
Engine Yard
jvantuyl@xxxxxxxxxxxxxx

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster