Scooter Morris wrote:
> We are in the process of building a cluster, which we hope to put into
> production when RHEL 5.3 is released. Our plan is to use GFS2, which
> we've been experimenting with for some time, but we're having some
> problems. The cluster has 3 nodes, two HP DL580's and one HP DL585 --
> we're using ILO for fencing. We want to share a couple of filesystems
> using GFS2 which are connected to our SAN (an EVA 5000). I've set
> everything up and it all works as expected, although on occasion, GFS2
> just seems to hang. This happens 1-4 times/week. What I note in the
> logs are a series of dlm messages. On node 1 (for example) I see:
>
> dlm: connecting to 3
> dlm: connecting to 2
> dlm: connecting to 2
> dlm: connecting to 2
> dlm: connecting to 3
> dlm: connecting to 2
> dlm: connecting to 2
> dlm: connecting to 2
> dlm: connecting to 3
> dlm: connecting to 3
> dlm: connecting to 3
> dlm: connecting to 3
>
> On node 2, I see:
>
> dlm: got connection from 1
> Extra connection from node 1 attempted
> dlm: got connection from 1
> Extra connection from node 1 attempted
> dlm: got connection from 1
> Extra connection from node 1 attempted
> dlm: got connection from 1
> Extra connection from node 1 attempted
> dlm: got connection from 1
> Extra connection from node 1 attempted
> dlm: got connection from 1
> Extra connection from node 1 attempted
>
> and on node 3, I see:
> dlm: got connection from 1
> Extra connection from node 1 attempted
> dlm: got connection from 1
> Extra connection from node 1 attempted
> dlm: got connection from 1
> Extra connection from node 1 attempted
> dlm: got connection from 1
> Extra connection from node 1 attempted
> dlm: got connection from 1
> Extra connection from node 1 attempted
> dlm: got connection from 1
> Extra connection from node 1 attempted
>

Those messages are usually caused by routing problems.
The DLM binds to the address it is given by cman (see the output of cman_tool status for that), and receiving nodes check incoming packets against that address to make sure that only valid cluster nodes try to make connections.

What is happening here (I think - it sounds like a problem I've seen before) is that the packets are being routed through a different interface from the one cman is using, so the remote node sees them as coming from a different address. This can happen, for example, if you have two Ethernet interfaces connected to the same physical segment.

There was also a bug that could cause this when the routing was not quite so broken, just a little odd, though I don't have the bugzilla number to hand, sorry.

--
Chrissie

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
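
One rough way to check for the routing mismatch described above is to ask the kernel which source address it would pick when talking to each peer, and compare that with the node address reported by cman_tool status. The sketch below is only an illustration, not part of cman or the DLM: the function names are made up for this example, the default port is an arbitrary placeholder that does not affect the route lookup, and the result only reflects what the routing table chooses for an unbound socket, so treat a mismatch as a hint to look at the routes rather than as proof.

```python
import socket
from typing import List, Tuple


def source_address_for(peer_ip: str, port: int = 21064) -> str:
    """Return the local address the kernel would use as source for peer_ip.

    Connecting a UDP socket sends no packets; it only makes the kernel
    perform its route lookup and select a source address. The port is a
    placeholder (assumption) and has no influence on the lookup.
    """
    family = socket.AF_INET6 if ":" in peer_ip else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as s:
        s.connect((peer_ip, port))
        return s.getsockname()[0]


def check_peers(cluster_ip: str, peers: List[str]) -> List[Tuple[str, str]]:
    """Return (peer, chosen_source) pairs where the source is not cluster_ip.

    cluster_ip is the address cman reports for this node; peers are the
    cluster addresses of the other nodes. Both are supplied by the caller.
    """
    mismatches = []
    for peer in peers:
        src = source_address_for(peer)
        if src != cluster_ip:
            mismatches.append((peer, src))
    return mismatches


# Hypothetical usage on node 1 (addresses are placeholders, not from the
# original post):
#   for peer, src in check_peers("192.0.2.1", ["192.0.2.2", "192.0.2.3"]):
#       print("traffic to", peer, "would leave from", src)
```

An empty result means the kernel would use the cman address for every peer. Any pair that is returned names a peer whose traffic would leave from a different local address, which fits the two-interfaces-on-one-segment case.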