Keith, this sounds like what might happen if the SAN fabric is not properly configured. The QLogic cards will send a SCSI reset down the bus when activated. If your fabric is open to the point where nodes can see each other, then the other nodes will receive the SCSI reset and their HBAs will go through a LIP reset. This is not good, especially when you have more than a few machines in a cluster. The result is that GFS cannot see the disks for a while, and in most cases that is too long, so nodes end up getting fenced.

If this is indeed the case, I'd suggest making separate zones for each HBA<-->storage device combination. For instance, a cluster with 3 nodes and two HBAs each would end up having 6 zones, each zone consisting of an HBA and the storage it needs to access. They'll all be the same logically, but they'll each be isolated from one another.

Also, there is a problem with the RHEL3-based GFS in that it doesn't seem to play nice with the system with respect to lock space and memory. GFS will in fact hog all the memory it can (for performance reasons) to the point where the system itself cannot fork any processes. The way around this (in U7) is to manage the inoded_purge parameter for each mounted GFS filesystem. inoded_purge is the percentage of locks held by GFS that it will try to purge, thereby releasing that memory back to the system.

It appears that even if the system cannot fork, lock_gulmd can still respond to the other nodes, indicating all is well when in fact it is not. The developers can surely correct me if I am wrong, but that's the way it acts. It seems to me that if the response were handled in a separate thread, this could be avoided, since then lock_gulmd would not be able to respond to the cluster heartbeat subsystem and the node would get fenced. I'm sure there is more to it than that, though.

If I set my inoded_purge numbers to 20 and fire up an rsync, I stay right around 30,000 locks. My systems (with 3GB of RAM) would get into a non-forking state at around 380k locks.
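Since inoded_purge came up: here is roughly how I set it on a mounted filesystem. The mount point below is just a placeholder and I'm going from memory on the exact invocation, so check gfs_tool gettune on your own boxes first.

    # Show the current tunables for a mounted GFS filesystem (placeholder path):
    gfs_tool gettune /gfs/data

    # Ask inoded to try to purge roughly 20% of the locks GFS is holding:
    gfs_tool settune /gfs/data inoded_purge 20

As far as I know these settune values don't survive a remount, so they have to be reapplied (from an init script, for example) after the GFS filesystems come up.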
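To make the zoning suggestion above concrete, here is the sort of layout I mean for that hypothetical 3-node, 2-HBA cluster. The syntax is Brocade-style and the alias and zone names are made up, so treat it only as a sketch and check your own switch's documentation.

    # One zone per HBA<-->storage port pair; six in total for 3 nodes x 2 HBAs.
    zonecreate "node1_hba0_stor", "node1_hba0; storage_port_a"
    zonecreate "node1_hba1_stor", "node1_hba1; storage_port_b"
    zonecreate "node2_hba0_stor", "node2_hba0; storage_port_a"
    # ...and so on for the remaining HBAs.

    # Put the zones into a configuration and enable it:
    cfgcreate "prod_cfg", "node1_hba0_stor; node1_hba1_stor; node2_hba0_stor"
    cfgenable "prod_cfg"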
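As for the cron check you mention at the end of your mail, something along these lines is roughly what I'd try. The mount point, file names and the 60-second limit are all placeholders and I haven't tested it, so treat it as a sketch:

    #!/bin/sh
    # Cron-driven GFS I/O probe (untested sketch; paths and timeout are placeholders).
    MNT=/mnt/gfs
    FLAG=/var/run/gfs_probe_ok
    LIMIT=60

    rm -f $FLAG

    # Child: do a small write to the GFS filesystem, flush it, then drop a
    # flag file on local disk to say the I/O completed.
    ( date > "$MNT/.gfs_probe.`hostname`" && sync && touch $FLAG ) &

    sleep $LIMIT

    # Parent: if the flag never appeared, the probe is stuck (or failed), so
    # take the node down and let the cluster fence it and recover its locks.
    if [ ! -f $FLAG ]; then
        logger "GFS probe did not complete within ${LIMIT}s, shutting down"
        /sbin/shutdown -h now
    fi

Note that any failure of the probe (not just a hang) will take the node down, which may or may not be what you want.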
Hope this helps.

Corey

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Keith Lewis
Sent: Thursday, May 18, 2006 9:27 PM
To: Linux-cluster@xxxxxxxxxx
Subject: GFS lock held for 12 hours

We have a GFS cluster with 12 data nodes and 3 lock servers, running Red Hat AS3 U7 with GFS-6.0.2.30-0.

The data nodes all access a SAN disk. The SAN fabric is divided into two independent halves, called Red and Blue, with half the data nodes on each. The data nodes access only one disk, reachable via either SAN. There are other clients, other clusters and other disks sharing the SAN.

Recently a faulty HBA was plugged into a machine, not part of our cluster, and connected to the Red SAN. At this point the Red SAN failed. There were two main, moderately immediate results:

One of the Red SAN nodes became very busy. Presumably it was holding a fairly big GFS lock at the time, but it continued to hold the lock and to send heartbeats. The node gave the appearance of being hung.

The rest of the Red SAN nodes, over a period of a few minutes, all presumably did some I/O to the disk and presumably got into a busy-wait state, which was so tight that they stopped sending heartbeats and got fenced (APC PDUs). On reboot these nodes could see the SAN as normal, except they could not see their SAN disk. Nor could they see another disk added to the SAN as part of the debugging attempted later.

Many attempts were made to make the disk reappear, mostly by rebooting or by shutting down GFS and rmmod-ing and modprobe-ing qla2300. Everything was quite normal, except that the Red SAN would not let any of our nodes see our disk.

On the Blue SAN all the machines became very busy, presumably because of the one Red SAN machine holding the lock. These nodes were also thought to be hung, but none of them were rebooted, as it was discovered that they were still exporting an important Web tree that was not on a GFS disk. (They sprang back to life when the one lock-holding Red SAN machine was rebooted, which was well after the Red SAN problem was fixed.)

This state of affairs lasted 12 hours. Fixing it was made difficult because, to anyone looking at the problem, it appeared that the entire SAN and the entire cluster were down. Very little that we saw at the time indicated that only the Red SAN had failed. (Hindsight is wonderful.)

This was particularly unfortunate. The justification for installing GFS was resilience in the face of hardware failure (especially no single point of failure).

So finally, here are my questions:

Is it really reasonable for a machine to hang onto a lock for 12 hours?

Would it be possible for a GFS machine to detect that it cannot do I/O to its GFS disk any more and release any locks it holds, perhaps by fencing itself? (I'm thinking of adding a cronjob that forks a subprocess that does an I/O to the GFS disk. The parent could shut down the node, leading to a fence, if the child takes more than a minute.)

Have I made any mistakes in my guesses and presumptions?

Keith Lewis

--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster