So, I at least now know for sure that this is a locking issue.

> That's not showing available, that's showing it already in use.

I also used telnet to connect to a few of the machines and was able to get in. I don't know enough about what I'm seeing in the netstat output, but it almost looks like redirection to 6809? It's confusing as heck to me since I didn't change anything, just the storage, then fired it all back up again.

> 6809 on the other end is very suspicious, like there is maybe some
> confusion about cman & dlm ports going on. Check cluster.conf and your
> startup scripts for port changing things. It's very unusual. Also check
> that all the nodes are using the same configuration.

I checked the cluster.conf file and didn't see anything obvious, so I changed the version number and ran an update to all nodes just to be safe. I'm rebooting the nodes now, one at a time.

On my workstation console log, I see:

Jan 29 21:22:40 compdev kernel: GFS: fsid=compweb:web.0: jid=0: Trying to acquire journal lock...
Jan 29 21:22:40 compdev kernel: GFS: fsid=compweb:web.0: jid=0: Looking at journal...
Jan 29 21:22:40 compdev kernel: GFS: fsid=compweb:web.0: jid=0: Done
Jan 30 08:50:36 compdev kernel: GFS: fsid=compweb:web.0: jid=3: Trying to acquire journal lock...
Jan 30 08:50:36 compdev kernel: GFS: fsid=compweb:web.0: jid=3: Busy
Jan 30 08:57:38 compdev kernel: GFS: fsid=compweb:web.0: jid=1: Trying to acquire journal lock...
Jan 30 08:57:38 compdev kernel: GFS: fsid=compweb:web.0: jid=1: Busy
Jan 30 09:05:29 compdev kernel: GFS: fsid=compweb:web.0: jid=2: Trying to acquire journal lock...
Jan 30 09:05:29 compdev kernel: GFS: fsid=compweb:web.0: jid=2: Busy
Jan 30 10:06:10 compdev kernel: CMAN: node cweb92 has been removed from the cluster : Missed too many heartbeats
Jan 30 10:08:14 compdev kernel: CMAN: node cweb92 rejoining
Jan 30 10:08:17 compdev kernel: dlm: could not bind to local address for connect: -98
Jan 30 10:10:26 compdev kernel: CMAN: node img63 has been removed from the cluster : Missed too many heartbeats
Jan 30 10:12:53 compdev kernel: CMAN: node img63 rejoining
Jan 30 10:12:57 compdev kernel: dlm: could not bind to local address for connect: -98
Jan 30 10:17:43 compdev kernel: GFS: fsid=compweb:web.0: jid=1: Trying to acquire journal lock...
Jan 30 10:17:43 compdev kernel: GFS: fsid=compweb:web.0: jid=1: Looking at journal...
Jan 30 10:19:11 compdev kernel: GFS: fsid=compweb:web.0: jid=1: Acquiring the transaction lock...
Jan 30 10:19:11 compdev kernel: GFS: fsid=compweb:web.0: jid=1: Replaying journal...
Jan 30 10:19:11 compdev kernel: GFS: fsid=compweb:web.0: jid=1: Replayed 0 of 22 blocks
Jan 30 10:19:11 compdev kernel: GFS: fsid=compweb:web.0: jid=1: replays = 0, skips = 0, sames = 22
Jan 30 10:19:11 compdev kernel: GFS: fsid=compweb:web.0: jid=1: Journal replayed in 1s
Jan 30 10:19:11 compdev kernel: GFS: fsid=compweb:web.0: jid=1: Done

I looked at some of the other nodes and they all show similar things. This seems to show that port 21064 is available on all nodes.

#ssh 192.168.1.40 netstat -anp | grep 21064
tcp  0  0  192.168.1.40:21064  0.0.0.0:*           LISTEN       -
tcp  0  0  192.168.1.40:6809   192.168.1.58:21064  ESTABLISHED  -
tcp  0  0  192.168.1.40:21064  192.168.1.92:33123  ESTABLISHED  -
tcp  0  0  192.168.1.40:21064  192.168.1.62:32779  ESTABLISHED  -
tcp  0  0  192.168.1.40:21064  192.168.1.63:6809   ESTABLISHED  -

#ssh 192.168.1.62 netstat -anp | grep 21064
tcp  0  0  192.168.1.62:21064  0.0.0.0:*           LISTEN       -
tcp  0  0  192.168.1.62:21064  192.168.1.63:32774  ESTABLISHED  -
tcp  0  0  192.168.1.62:6809   192.168.1.58:21064  ESTABLISHED  -
tcp  0  0  192.168.1.62:32773  192.168.1.92:21064  ESTABLISHED  -
tcp  0  0  192.168.1.62:32780  192.168.1.63:21064  ESTABLISHED  -
tcp  0  0  192.168.1.62:21064  192.168.1.58:6809   ESTABLISHED  -
tcp  0  0  192.168.1.62:32779  192.168.1.40:21064  ESTABLISHED  -

#ssh 192.168.1.63 netstat -anp | grep 21064
tcp  0  0  192.168.1.63:21064  0.0.0.0:*           LISTEN       -
tcp  0  0  192.168.1.63:6809   192.168.1.40:21064  ESTABLISHED  -
tcp  0  0  192.168.1.63:21064  192.168.1.62:32780  ESTABLISHED  -
tcp  0  0  192.168.1.63:21064  192.168.1.92:33157  ESTABLISHED  -
tcp  0  0  192.168.1.63:32774  192.168.1.62:21064  ESTABLISHED  -
tcp  0  0  192.168.1.63:32780  192.168.1.58:21064  ESTABLISHED  -

#ssh 192.168.1.92 netstat -anp | grep 21064
tcp  0  0  192.168.1.92:21064  0.0.0.0:*           LISTEN       -
tcp  0  0  192.168.1.92:6809   192.168.1.58:21064  ESTABLISHED  -
tcp  0  0  192.168.1.92:21064  192.168.1.62:32773  ESTABLISHED  -
tcp  0  0  192.168.1.92:33157  192.168.1.63:21064  ESTABLISHED  -
tcp  0  0  192.168.1.92:33123  192.168.1.40:21064  ESTABLISHED  -

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
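
For anyone else puzzling over netstat output like the above, here is a rough sketch of how it can be read mechanically. In each line the first address is the local end and the second is the remote end: a line whose local port is 21064 is a peer connecting in, and a line whose remote port is 21064 is this node connecting out. The sketch flags connections whose connecting side used a source port below the ephemeral range, which is the pattern the reply calls suspicious (6809). The names `classify` and `Conn` are made up for this illustration, and the lower bound of 32768 is an assumption based on the usual Linux default; the real range on a node is in /proc/sys/net/ipv4/ip_local_port_range.

```python
import re
from typing import NamedTuple, Optional

DLM_PORT = 21064        # the port grepped for in the post
EPHEMERAL_MIN = 32768   # ASSUMPTION: usual Linux default lower bound

# Columns: proto, recv-q, send-q, local addr:port, remote addr:port, state
LINE_RE = re.compile(r"^tcp\s+\d+\s+\d+\s+(\S+):(\d+)\s+(\S+):(\d+|\*)\s+(\S+)")


class Conn(NamedTuple):
    kind: str                    # "listening", "inbound" or "outbound"
    peer: Optional[str]          # remote IP, None for a listening socket
    source_port: Optional[int]   # port used by the connecting side
    odd_source: bool             # True if source port is below the ephemeral range


def classify(line, port=DLM_PORT, ephemeral_min=EPHEMERAL_MIN):
    """Classify one `netstat -an` TCP line relative to `port`.

    Returns None for lines that do not parse or do not involve `port`.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    _local_ip, local_port, remote_ip, remote_port, state = m.groups()
    if state == "LISTEN":
        return Conn("listening", None, None, False)
    if remote_port == "*":
        return None
    local_port, remote_port = int(local_port), int(remote_port)
    if local_port == port:
        # The peer connected to our listening port.
        kind, source_port = "inbound", remote_port
    elif remote_port == port:
        # We connected out to the peer's listening port.
        kind, source_port = "outbound", local_port
    else:
        return None
    return Conn(kind, remote_ip, source_port, source_port < ephemeral_min)


# Lines taken from the 192.168.1.40 output in the post.
SAMPLE = """\
tcp 0 0 192.168.1.40:21064 0.0.0.0:* LISTEN -
tcp 0 0 192.168.1.40:6809 192.168.1.58:21064 ESTABLISHED -
tcp 0 0 192.168.1.40:21064 192.168.1.92:33123 ESTABLISHED -
tcp 0 0 192.168.1.40:21064 192.168.1.63:6809 ESTABLISHED -
"""

for line in SAMPLE.splitlines():
    conn = classify(line)
    if conn is not None and conn.odd_source:
        print(conn.kind, conn.peer, conn.source_port)
```

This only sorts the lines into listening, inbound and outbound and points at the unusual source ports; it does not say why a node would be using 6809, which is still the open question in the thread.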