Hi gurus,

We have a three-node Itanium 64 cluster running GFS in conjunction with OCFS for an Oracle RAC. We found many physical problems in our switch and replaced the switches. Here is the problem: the first node of the cluster does not re-login to GFS. Here is the situation, taken from the master:

[root@sapcl02 spool]# gulm_tool nodelist sapcl02:core
 Name: sapcl03.aem.torino.it
  ip           = 100.2.254.210
  state        = Logged in
  mode         = Slave
  missed beats = 0
  last beat    = 1151328027843676
  delay avg    = 10000443
  max delay    = 13047588

 Name: sapcl01.aem.torino.it
  ip           = 100.2.254.208
  state        = Expired
  mode         = Slave
  missed beats = 0
  last beat    = 0
  delay avg    = 0
  max delay    = 0

 Name: sapcl02.aem.torino.it
  ip           = 100.2.254.209
  state        = Logged in
  mode         = Master
  missed beats = 0
  last beat    = 1151328021593557
  delay avg    = 10000849
  max delay    = 113821588141

As you can see, sapcl01 is in state Expired. On sapcl01 the startup of lock_gulmd hangs.

In /var/log/messages on the master I see the following, repeated endlessly:

Jun 26 15:23:32 sapcl02 lock_gulmd_core[22601]: Gonna exec fence_node sapcl01.aem.torino.it
Jun 26 15:23:32 sapcl02 fence_node[22601]: Cannot locate the cluster node, sapcl01.aem.torino.it
Jun 26 15:23:32 sapcl02 fence_node[22601]: All fencing methods FAILED!
Jun 26 15:23:32 sapcl02 fence_node[22601]: Fence of "sapcl01.aem.torino.it" was unsuccessful.
Jun 26 15:23:32 sapcl02 lock_gulmd_core[7499]: Fence failed. [22601] Exit code:1 Running it again.
Jun 26 15:23:32 sapcl02 lock_gulmd_core[7499]: Forked [22604] fence_node sapcl01.aem.torino.it with a 5 pause.

Even if I power down the sapcl01 node, the master keeps trying to fence the slave node. On both the master and the slave I also tried to fence manually to clear the expiration, but with no result. It seems the only way to realign the cluster is to globally power down all of the nodes and restart.
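For the manual fencing attempts, one thing worth trying is bypassing fence_node and driving the agent directly, since GULM-era fence agents read key=value pairs on stdin. A minimal sketch, using the parameters from fence.ccs below; note that "option=reboot" and the exact key names are an assumption and vary between GFS versions, so check the agent's own usage output first:

```shell
# Key=value input that fence_node/lock_gulmd would normally hand the agent
# on stdin. Values are taken from fence.ccs; "option=reboot" is an
# assumption -- your fence_wti version may expect different keys
# (check its usage/help output before running for real).
FENCE_INPUT="agent=fence_wti
ipaddr=100.2.254.254
login=nps
passwd=password
port=1
option=reboot"

# Dry run: inspect what would be sent before touching the power switch.
echo "$FENCE_INPUT"

# To actually fence sapcl01, pipe it to the agent (uncomment to run):
# echo "$FENCE_INPUT" | fence_wti
```

If the agent succeeds when driven by hand but fence_node still reports "Cannot locate the cluster node", that would point at the node lookup in CCS rather than at the power switch itself.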
Here are the configuration files:

########### fence.ccs ########################################
fence_devices {
  nps {
    agent = "fence_wti"
    ipaddr = "100.2.254.254"
    login = "nps"
    passwd = "password"
  }
}

[root@sapcl01 gfs]# more nodes.ccs
#### nodes.ccs #######################################
nodes {
  sapcl01 {
    ip_interfaces {
      eth1 = "192.168.2.208"
    }
    fence {
      power {
        nps {
          port = 1
        }
      }
    }
  }
  sapcl02 {
    ip_interfaces {
      eth1 = "192.168.2.209"
    }
    fence {
      power {
        nps {
          port = 2
        }
      }
    }
  }
  sapcl03 {
    ip_interfaces {
      eth1 = "192.168.2.210"
    }
    fence {
      power {
        nps {
          port = 3
        }
      }
    }
  }
}

[root@sapcl01 gfs]# more cluster.ccs
#### cluster.ccs #####################################
cluster {
  name = "gfsrac"
  lock_gulm {
    servers = [ "sapcl01", "sapcl02", "sapcl03" ]
  }
}

PS: the cluster had been fully operational for the past 7 months; the switch replacement is what triggered the problem.

Best regards,
Stefano

--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster