On Thu, Jul 10, 2008 at 10:26:54AM +0100, Christine Caulfield wrote:
> J. Bruce Fields wrote:
>> On Wed, Jul 09, 2008 at 04:50:14PM +0100, Christine Caulfield wrote:
>>> J. Bruce Fields wrote:
>>>> On Wed, Jul 09, 2008 at 09:51:02AM +0100, Christine Caulfield wrote:
>>>>> Steven Whitehouse wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Tue, 2008-07-08 at 18:15 -0400, J. Bruce Fields wrote:
>>>>>>> On Mon, Jul 07, 2008 at 02:49:28PM -0400, bfields wrote:
>>>>>>>> On Mon, Jul 07, 2008 at 10:48:28AM -0500, David Teigland wrote:
>>>>>>>>> On Sun, Jul 06, 2008 at 05:51:05PM -0400, J. Bruce Fields wrote:
>>>>>>>>>> -	write(control_fd, in, sizeof(struct gdlm_plock_info));
>>>>>>>>>> +	write(control_fd, in, sizeof(struct dlm_plock_info));
>>>>>>>>> Gah, sorry, I keep fixing that and it keeps reappearing.
>>>>>>>>>
>>>>>>>>>> Jul  1 14:06:42 piglet2 kernel: dlm: connect from non cluster node
>>>>>>>>>> It looks like dlm_new_lockspace() is waiting on dlm_recoverd, which is
>>>>>>>>>> in "D" state in dlm_rcom_status(), so I guess the second node isn't
>>>>>>>>>> getting some dlm reply it expects?
>>>>>>>>> dlm inter-node communication is not working here for some reason.  There
>>>>>>>>> must be something unusual with the way the network is configured on the
>>>>>>>>> nodes, and/or a problem with the way the cluster code is applying the
>>>>>>>>> network config to the dlm.
>>>>>>>>>
>>>>>>>>> Ah, I just remembered what this sounds like; we see this kind of thing
>>>>>>>>> when a network interface has multiple IP addresses, and/or routing is
>>>>>>>>> configured strangely.  Others cc'ed could offer better details on exactly
>>>>>>>>> what to look for.
>>>>>>>> OK, thanks!  I'm trying to run gfs2 on 4 kvm machines; I'm an expert on
>>>>>>>> neither, and it's entirely likely there's some obvious misconfiguration.
>>>>>>>> On the kvm host there are 4 virtual interfaces bridged together:
>>>>>>> I ran wireshark on vnet0 while doing the second mount; what I saw was
>>>>>>> the second machine opened a tcp connection to port 21064 on the first
>>>>>>> (which had already completed the mount), and sent it a single message
>>>>>>> identified by wireshark as "DLM3" protocol, type recovery command:
>>>>>>> status command.  It got back an ACK then a RST.
>>>>>>>
>>>>>>> Then the same happened in the other direction, with the first machine
>>>>>>> sending a similar message to port 21064 on the second, which then reset
>>>>>>> the connection.
>>>>>>>
>>>>> That's a symptom of the "connect from non-cluster node" error in
>>>>> the DLM.
>>>> I think I am getting a message to that effect in my logs.
>>>>
>>>>> It's got a connection from an IP address that is not known to
>>>>> cman, so it closes it as a spoofer.
>>>> OK.  Is there an easy way to see the list of IP addresses known to cman?
>>> Yes,
>>>
>>>     cman_tool nodes -a
>>>
>>> will show you all the nodes and their known IP addresses.
>>
>> piglet2:~# cman_tool nodes -a
>> Node  Sts   Inc   Joined               Name
>>    1   M    376   2008-07-09 12:30:32  piglet1
>>        Addresses: 192.168.122.129
>>    2   M    368   2008-07-09 12:30:31  piglet2
>>        Addresses: 192.168.122.130
>>    3   M    380   2008-07-09 12:30:33  piglet3
>>        Addresses: 192.168.122.131
>>    4   M    372   2008-07-09 12:30:31  piglet4
>>        Addresses: 192.168.122.132
>>
>> These addresses are correct (and are the same addresses that show up in the
>> packet trace).
>>
>> I must be overlooking something very obvious....
>
> Hmm, very odd.
>
> Are those IP addresses consistent across all nodes in the cluster?
Yes, "cman_tool nodes -a" gives the same IP addresses no matter which of the four cluster nodes it's run on. --b. -- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster