On Wed, Jul 09, 2008 at 09:44:24AM +0100, Steven Whitehouse wrote:
> Hi,
>
> On Tue, 2008-07-08 at 18:15 -0400, J. Bruce Fields wrote:
> > On Mon, Jul 07, 2008 at 02:49:28PM -0400, bfields wrote:
> > > On Mon, Jul 07, 2008 at 10:48:28AM -0500, David Teigland wrote:
> > > > On Sun, Jul 06, 2008 at 05:51:05PM -0400, J. Bruce Fields wrote:
> > > > > - write(control_fd, in, sizeof(struct gdlm_plock_info));
> > > > > + write(control_fd, in, sizeof(struct dlm_plock_info));
> > > >
> > > > Gah, sorry, I keep fixing that and it keeps reappearing.
> > > >
> > > >
> > > > > Jul 1 14:06:42 piglet2 kernel: dlm: connect from non cluster node
> > > >
> > > > > It looks like dlm_new_workspace() is waiting on dlm_recoverd, which is
> > > > > in "D" state in dlm_rcom_status(), so I guess the second node isn't
> > > > > getting some dlm reply it expects?
> > > >
> > > > dlm inter-node communication is not working here for some reason. There
> > > > must be something unusual with the way the network is configured on the
> > > > nodes, and/or a problem with the way the cluster code is applying the
> > > > network config to the dlm.
> > > >
> > > > Ah, I just remembered what this sounds like; we see this kind of thing
> > > > when a network interface has multiple IP addresses, and/or routing is
> > > > configured strangely. Others cc'ed could offer better details on exactly
> > > > what to look for.
> > >
> > > OK, thanks! I'm trying to run gfs2 on 4 kvm machines, I'm an expert on
> > > neither, and it's entirely likely there's some obvious misconfiguration.
> > > On the kvm host there are 4 virtual interfaces bridged together:
> >
> > I ran wireshark on vnet0 while doing the second mount; what I saw was
> > the second machine opened a tcp connection to port 21064 on the first
> > (which had already completed the mount), and sent it a single message
> > identified by wireshark as "DLM3" protocol, type recovery command:
> > status command. It got back an ACK then a RST.
> >
> > Then the same happened in the other direction, with the first machine
> > sending a similar message to port 21064 on the second, which then reset
> > the connection.
> >
> > --b.
> >
> An ACK & RST for the same packet? Or was that an ACK SYN for the SYN and
> then an RST for the following data packet? Could you post the trace or
> put it somewhere we can see it?

Sure, thanks. It's at

	http://www.fieldses.org/~bfields/failed-dlm.pcap
	http://www.fieldses.org/~bfields/failed-dlm-filtered.pcap

(The second is just the dlm traffic, with all the ais, ssh, dns, etc.
filtered out.)

--b.