On Mon, Jul 07, 2008 at 02:49:28PM -0400, bfields wrote:
> On Mon, Jul 07, 2008 at 10:48:28AM -0500, David Teigland wrote:
> > On Sun, Jul 06, 2008 at 05:51:05PM -0400, J. Bruce Fields wrote:
> > > -	write(control_fd, in, sizeof(struct gdlm_plock_info));
> > > +	write(control_fd, in, sizeof(struct dlm_plock_info));
> > 
> > Gah, sorry, I keep fixing that and it keeps reappearing.
> > 
> > > > Jul  1 14:06:42 piglet2 kernel: dlm: connect from non cluster node
> > 
> > > It looks like dlm_new_lockspace() is waiting on dlm_recoverd, which is
> > > in "D" state in dlm_rcom_status(), so I guess the second node isn't
> > > getting some dlm reply it expects?
> > 
> > dlm inter-node communication is not working here for some reason.  There
> > must be something unusual with the way the network is configured on the
> > nodes, and/or a problem with the way the cluster code is applying the
> > network config to the dlm.
> > 
> > Ah, I just remembered what this sounds like; we see this kind of thing
> > when a network interface has multiple IP addresses, and/or routing is
> > configured strangely.  Others cc'ed could offer better details on exactly
> > what to look for.
> 
> OK, thanks!  I'm trying to run gfs2 on 4 kvm machines; I'm an expert on
> neither, and it's entirely likely there's some obvious misconfiguration.
> On the kvm host there are 4 virtual interfaces bridged together:

I ran wireshark on vnet0 while doing the second mount; what I saw was the
second machine opening a TCP connection to port 21064 on the first (which
had already completed the mount) and sending it a single message,
identified by wireshark as "DLM3" protocol, type recovery command: status
command.  It got back an ACK, then a RST.

Then the same happened in the other direction, with the first machine
sending a similar message to port 21064 on the second, which then reset
the connection.

--b.

> 
> bfields@pig:~$ brctl show
> bridge name	bridge id		STP enabled	interfaces
> vnet0		8000.00ff0823c0f3	yes		vnet1
> 							vnet2
> 							vnet3
> 							vnet4
> 
> vnet0 has address 192.168.122.1 on the host, and the 4 kvm guests are
> statically assigned addresses 129, 130, 131, and 132 on the 192.168.122.*
> network, so a kvm guest looks like:
> 
> piglet1:~# ifconfig
> eth1      Link encap:Ethernet  HWaddr 00:16:3e:16:4d:61
>           inet addr:192.168.122.129  Bcast:192.168.122.255  Mask:255.255.255.0
>           inet6 addr: fe80::216:3eff:fe16:4d61/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:2464 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:1806 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:197099 (192.4 KiB)  TX bytes:165606 (161.7 KiB)
>           Interrupt:11 Base address:0xc100
> 
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>           RX packets:285 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:285 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:13394 (13.0 KiB)  TX bytes:13394 (13.0 KiB)
> 
> piglet1:~# cat /etc/hosts
> 127.0.0.1		localhost
> 192.168.122.129	piglet1
> 192.168.122.130	piglet2
> 192.168.122.131	piglet3
> 192.168.122.132	piglet4
> 
> # The following lines are desirable for IPv6 capable hosts
> ::1     ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
> 
> The network setup looks otherwise fine--they can all ping each other and
> the outside world.
> 
> --b.
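
The ACK-then-RST pattern above is consistent with the accepting node
dropping the connection because it doesn't recognize the peer's source
address as a configured cluster node -- the check behind the "dlm: connect
from non cluster node" message, and exactly what multiple addresses or odd
routing can trigger.  Below is a minimal user-space sketch of that kind of
peer-address validation, NOT the kernel's actual fs/dlm/lowcomms.c code;
the node addresses are copied from the /etc/hosts above, and the port is
dlm's 21064:

	/*
	 * Sketch only: accept connections on dlm's TCP port and drop
	 * any whose peer address is not a known cluster node.  The TCP
	 * stack has already ACKed the first message by the time we
	 * close, so the sender sees an ACK followed by a RST.
	 */
	#include <arpa/inet.h>
	#include <netinet/in.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/socket.h>
	#include <unistd.h>

	static const char *cluster_nodes[] = {
		"192.168.122.129", "192.168.122.130",
		"192.168.122.131", "192.168.122.132",
	};

	static int addr_is_cluster_node(const struct sockaddr_in *peer)
	{
		char buf[INET_ADDRSTRLEN];
		size_t i;

		inet_ntop(AF_INET, &peer->sin_addr, buf, sizeof(buf));
		for (i = 0; i < sizeof(cluster_nodes) / sizeof(cluster_nodes[0]); i++)
			if (!strcmp(buf, cluster_nodes[i]))
				return 1;
		return 0;
	}

	int main(void)
	{
		struct sockaddr_in addr = {
			.sin_family = AF_INET,
			.sin_port = htons(21064),	/* dlm's TCP port */
			.sin_addr.s_addr = htonl(INADDR_ANY),
		};
		int fd = socket(AF_INET, SOCK_STREAM, 0);

		if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) ||
		    listen(fd, 5)) {
			perror("listen on 21064");
			return 1;
		}
		for (;;) {
			struct sockaddr_in peer;
			socklen_t len = sizeof(peer);
			int conn = accept(fd, (struct sockaddr *)&peer, &len);

			if (conn < 0)
				break;
			if (!addr_is_cluster_node(&peer)) {
				/* the kernel logs and closes here, which the
				 * other end sees as a reset */
				fprintf(stderr, "connect from non cluster node\n");
				close(conn);
				continue;
			}
			/* ... a real implementation would read dlm messages ... */
			close(conn);
		}
		return 0;
	}

So one thing worth checking is which source address each guest's outgoing
connection to port 21064 actually uses: if it isn't the 192.168.122.x
address the cluster config knows about, a check like this would reset the
connection right after the first message, matching the capture.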
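For completeness, a hedged sketch of the corrected write from the diff at
the top of the thread, assuming the userspace side opens the dlm_plock
misc device (the /dev/misc path is an assumption; it depends on the udev
setup).  The point of the fix is just that the buffer handed to write()
must be sized as struct dlm_plock_info, since struct gdlm_plock_info was
the stale gfs-era name:

	#include <fcntl.h>
	#include <linux/dlm_plock.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	static int send_plock_op(int control_fd, struct dlm_plock_info *in)
	{
		/* the corrected sizeof from the diff above */
		ssize_t rv = write(control_fd, in, sizeof(struct dlm_plock_info));

		if (rv != (ssize_t)sizeof(struct dlm_plock_info)) {
			perror("dlm_plock write");
			return -1;
		}
		return 0;
	}

	int main(void)
	{
		/* device path assumed; built from DLM_PLOCK_MISC_NAME */
		int fd = open("/dev/misc/" DLM_PLOCK_MISC_NAME, O_RDWR);
		struct dlm_plock_info info;

		if (fd < 0) {
			perror("open");
			return 1;
		}
		memset(&info, 0, sizeof(info));
		info.version[0] = DLM_PLOCK_VERSION_MAJOR;
		info.version[1] = DLM_PLOCK_VERSION_MINOR;
		info.version[2] = DLM_PLOCK_VERSION_PATCH;
		/* a real caller fills in optype, fsid, number, start, end, ... */
		send_plock_op(fd, &info);
		close(fd);
		return 0;
	}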