I am in the process of implementing clustering for shared data storage across a number of nodes, with several nodes exporting large GNBD volumes, plus new storage from an iSCSI RAID chassis with 6TB capacity. The nature of the application requires that the nodes accessing the data store be largely independent of each other, just providing CPU and graphics support while reading several hundred megabytes of image data in 32 MB chunks and writing numerous small summary files of that data. Our current method, which works but is slow, is to serve the data over NFS on gigabit Ethernet. A similar facility nearby, running the same application, has implemented GFS on FC equipment and is using the FC switch for fencing. Since I have somewhat different storage hardware and data retention requirements, I need to implement different fencing methods.

The storage network is on a 3Com switch, which can take down a given link via a telnet command and later restore it. In addition, each of the storage nodes has a Smart-UPS with control over its individual outlets, which could be used for power fencing the GNBD server nodes. The only issue there is that these are not networked UPS systems; they are connected via serial ports to some of the nodes.

One complication with switch fencing: I am currently using the storage net for cluster communications, so bringing down a port also cuts off cluster traffic. Each of the storage systems has at least two network interfaces (most have six or more), one or more on the storage net and one on our intranet. The data processing units have two interfaces, one on each network.

I know I will probably have to write a fence agent for at least some of this; rough sketches of what I have in mind are included below my signature. The questions I have concern the exact sequence of events when fencing a node (that is, who initiates the fencing operation), and the sequence of events for recovery and rejoining the cluster after a reboot.

I currently have a test setup of four nodes, with a 4TB GNBD export from one node to the other three, using fence_gnbd on those nodes and fence_gnbd_notify with fence_manual on the server, at least until I can get the UPS fence agent working. If I need to, I can put the UPS systems on a networked terminal server so that any node can connect to a UPS to issue commands, but I would prefer to keep each one connected directly to a cluster node's serial port.

For the iSCSI chassis, the manual suggests I can force an iSCSI disconnect via SNMP or telnet through the chassis management interface. From my reading of the RFC, this should be an effective fence for iSCSI, since it invalidates the initiator's current connection and requires re-authentication and renegotiation of the link before allowing any further communication with that node.

Hopefully this gives enough information to at least get a start, as it covers several issues, each of which may need separate followup.

Sincerely,
James Fait

--
James Fait, Ph.D.
Beamline Scientist, SER-CAT
APS, building 436B-008
Argonne National Laboratory
9700 S Cass Ave
Argonne, IL 60439
phone 630-252-0644
fax 630-252-0652
email fait@xxxxxxx
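
P.S. For the switch fencing, something like the Python sketch below is what I have in mind, written against the convention that fenced hands an agent its options as name=value pairs on stdin. It is untested, and the login prompts and the port enable/disable command are guesses that would have to be checked against the 3Com CLI documentation:

#!/usr/bin/env python
# fence_3com: sketch of a fence agent that disables a switch port.
# Follows the fenced convention of name=value options on stdin.
# The prompt strings and the port command are assumptions; the real
# syntax has to come from the 3Com CLI manual.
import sys
import telnetlib

def read_options():
    # fenced passes one name=value pair per line on stdin
    opts = {}
    for line in sys.stdin:
        line = line.strip()
        if line and not line.startswith('#') and '=' in line:
            name, value = line.split('=', 1)
            opts[name] = value
    return opts

def main():
    opts = read_options()
    action = opts.get('action', 'disable')   # disable = fence, enable = unfence
    tn = telnetlib.Telnet(opts['ipaddr'])
    tn.read_until('Login:')                  # prompt strings are guesses
    tn.write(opts['login'] + '\n')
    tn.read_until('Password:')
    tn.write(opts['passwd'] + '\n')
    tn.read_until('>')
    tn.write('port %s %s\n' % (action, opts['port']))  # hypothetical syntax
    tn.read_until('>')
    tn.write('logout\n')
    tn.close()
    sys.exit(0)                              # exit 0 tells fenced we succeeded

if __name__ == '__main__':
    main()

Run with action=enable, the same agent should restore the link once the fenced node has rebooted and is ready to rejoin.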
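
For the serially attached UPS systems, a sketch using the pyserial module follows. This would have to run on whichever node holds the serial connection (or behind the terminal server, if I go that route). The 2400 8N1 link and the 'Y'/'SM' smart-mode handshake are from APC's smart-signalling protocol, but the outlet-switching command here is a placeholder that would need to come from APC's UPS-Link documentation:

#!/usr/bin/env python
# fence_ups_serial: sketch of power fencing through a Smart-UPS on a
# local serial port. The outlet command is a placeholder; only the
# baud rate and the Y/SM handshake are taken from APC's protocol.
import sys
import time
import serial

def fence_outlet(device, outlet, action):
    port = serial.Serial(device, 2400, timeout=2)
    port.write('Y')                   # ask the UPS for smart mode
    if port.read(2) != 'SM':          # UPS should answer 'SM'
        sys.exit(1)
    if action == 'off':
        code = 'F'                    # hypothetical: outlet off
    else:
        code = 'N'                    # hypothetical: outlet on
    port.write('O%d%s' % (outlet, code))
    time.sleep(1)                     # give the UPS time to act
    port.close()
    sys.exit(0)

if __name__ == '__main__':
    # usage: fence_ups_serial.py /dev/ttyS0 2 off
    fence_outlet(sys.argv[1], int(sys.argv[2]), sys.argv[3])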
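
And for the iSCSI chassis, if the SNMP route works as the manual implies, the agent could be as simple as wrapping net-snmp's snmpset. The management hostname and the OID below are placeholders for whatever the vendor MIB actually defines for dropping a session:

#!/usr/bin/env python
# fence_iscsi_snmp: sketch of fencing by forcing the RAID chassis to
# drop an initiator's iSCSI session via its management interface.
# Wraps the net-snmp snmpset utility; hostname and OID are placeholders.
import subprocess
import sys

CHASSIS = 'raid-mgmt'                  # management hostname (assumed)
COMMUNITY = 'private'
DROP_OID = '.1.3.6.1.4.1.99999.1.1'    # hypothetical vendor OID

def fence_initiator(session_index):
    rc = subprocess.call(['snmpset', '-v', '2c', '-c', COMMUNITY, CHASSIS,
                          '%s.%d' % (DROP_OID, session_index), 'i', '1'])
    if rc == 0:
        sys.exit(0)                    # chassis accepted the set
    sys.exit(1)

if __name__ == '__main__':
    fence_initiator(int(sys.argv[1]))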