Maybe I'm missing something, but if you ifconfig down the interface, you're killing the endpoint that Gluster is talking to locally. It wouldn't be talking over the loopback just because it's local; it's talking to the IP that is now gone. I'm assuming that by downing that interface, you're also cutting off access to the subnet required to talk to the other node. Therefore, the node doesn't see either Gluster endpoint. The other node has to wait out the default 42-second timeout since things didn't get shut down cleanly, but it can then still talk to its local Gluster instances and therefore resumes.

I think a better way to simulate a communication failure would be to add some iptables rules to block one node from the other, or block both from seeing each other. At least then both nodes would still see the local cluster processes that are bound to the local IP address on the NICs. They just wouldn't see each other.

A better simulation would be to down the switch port that a node is connected to, rather than the host NIC itself.

Todd

From: Greg Scott
Sent: 2/22/2014 5:44 PM
To: 'gluster-users@xxxxxxxxxxx'
Subject: One node goes offline, the other node loses its connection to its local Gluster volume

We first went down this path back in July 2013 and now I’m back again for more. It’s a similar situation but now with new versions of everything. I’m using glusterfs 3.4.2 with Fedora 20.

I have 2 nodes named fw1 and fw2. When I ifdown the NIC I’m using for Gluster on either node, that node cannot see its Gluster volume, but the other node can see it after a timeout. As soon as I ifup that NIC, everyone can see everything again.

Is this expected behavior? When that interconnect drops, I want both nodes to see their own local copy and then sync everything back up when the interconnect connects again.

Here are details. Node fw1 has an XFS filesystem named gluster-fw1. Node fw2 has an XFS filesystem named gluster-fw2.
Those are both Gluster bricks and both nodes mount the bricks as /firewall-scripts. So anything one node does in /firewall-scripts should also be on the other node within a few milliseconds. The test is to isolate the nodes from each other and see if they can still access their own local copy of /firewall-scripts. The easiest way to do this is to ifdown the interconnect NIC. But this doesn’t work.

Here is what happens when I ifdown the NIC on node fw1. Node fw2 can see /firewall-scripts but fw1 shows an error. When I ifdown on fw2, the behavior is identical, but swapping fw1 and fw2.

On fw1, after an ifdown I lose connection with my Gluster filesystem.

[root@stylmark-fw1 firewall-scripts]# ifdown enp5s4
[root@stylmark-fw1 firewall-scripts]# ls /firewall-scripts
ls: cannot access /firewall-scripts: Transport endpoint is not connected
[root@stylmark-fw1 firewall-scripts]# df -h
df: ‘/firewall-scripts’: Transport endpoint is not connected
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/fedora-root           17G  2.2G   14G  14% /
devtmpfs                         989M     0  989M   0% /dev
tmpfs                            996M     0  996M   0% /dev/shm
tmpfs                            996M  564K  996M   1% /run
tmpfs                            996M     0  996M   0% /sys/fs/cgroup
tmpfs                            996M     0  996M   0% /tmp
/dev/sda2                        477M   87M  362M  20% /boot
/dev/sda1                        200M  9.6M  191M   5% /boot/efi
/dev/mapper/fedora-gluster--fw1  9.8G   33M  9.8G   1% /gluster-fw1
10.10.10.2:/fwmaster             214G   75G  128G  37% /mnt/fwmaster
[root@stylmark-fw1 firewall-scripts]#

But on fw2, I can still look at it:

[root@stylmark-fw2 ~]# ls /firewall-scripts
allow-all           failover-monitor.sh  rcfirewall.conf
allow-all-with-nat  initial_rc.firewall  start-failover-monitor.sh
etc                 rc.firewall          var
[root@stylmark-fw2 ~]#
[root@stylmark-fw2 ~]#
[root@stylmark-fw2 ~]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/fedora-root           17G  2.3G   14G  14% /
devtmpfs                         989M     0  989M   0% /dev
tmpfs                            996M     0  996M   0% /dev/shm
tmpfs                            996M  560K  996M   1% /run
tmpfs                            996M     0  996M   0% /sys/fs/cgroup
tmpfs                            996M     0  996M   0% /tmp
/dev/sda2                        477M   87M  362M  20% /boot
/dev/sda1                        200M  9.6M  191M   5% /boot/efi
/dev/mapper/fedora-gluster--fw2  9.8G   33M  9.8G   1% /gluster-fw2
192.168.253.2:/firewall-scripts  9.8G   33M  9.8G   1% /firewall-scripts
10.10.10.2:/fwmaster             214G   75G  128G  37% /mnt/fwmaster
[root@stylmark-fw2 ~]#

And back to fw1, after an ifup, I can see it again:

[root@stylmark-fw1 firewall-scripts]# ifup enp5s4
[root@stylmark-fw1 firewall-scripts]#
[root@stylmark-fw1 firewall-scripts]# ls /firewall-scripts
allow-all           failover-monitor.sh  rcfirewall.conf
allow-all-with-nat  initial_rc.firewall  start-failover-monitor.sh
etc                 rc.firewall          var
[root@stylmark-fw1 firewall-scripts]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/fedora-root           17G  2.2G   14G  14% /
devtmpfs                         989M     0  989M   0% /dev
tmpfs                            996M     0  996M   0% /dev/shm
tmpfs                            996M  564K  996M   1% /run
tmpfs                            996M     0  996M   0% /sys/fs/cgroup
tmpfs                            996M     0  996M   0% /tmp
/dev/sda2                        477M   87M  362M  20% /boot
/dev/sda1                        200M  9.6M  191M   5% /boot/efi
/dev/mapper/fedora-gluster--fw1  9.8G   33M  9.8G   1% /gluster-fw1
192.168.253.1:/firewall-scripts  9.8G   33M  9.8G   1% /firewall-scripts
10.10.10.2:/fwmaster             214G   75G  128G  37% /mnt/fwmaster
[root@stylmark-fw1 firewall-scripts]#

What can I do about this?

Thanks

- Greg Scott
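
For anyone wanting to try the iptables approach Todd describes at the top of this thread, here is a rough sketch of what it might look like. It is only an illustration, not something either poster ran. The function names (peer_rules, block_peer, unblock_peer, run_or_print) and the DRY_RUN variable are made up for this example, and the peer address is an assumption read off the df output above (192.168.253.1 appears to be fw1's interconnect address and 192.168.253.2 appears to be fw2's). The rules drop all traffic to and from the peer, not just Gluster traffic, while leaving the local NIC and its IP address up.

```shell
#!/bin/sh
# Hypothetical helpers: isolate this node from its Gluster peer with iptables
# instead of taking the interconnect NIC down. Names and DRY_RUN are
# illustrative assumptions, not part of Gluster or iptables.

# Print the iptables commands for one action: -I inserts rules, -D deletes them.
peer_rules() {
    action="$1"
    peer="$2"
    echo "iptables $action INPUT -s $peer -j DROP"
    echo "iptables $action OUTPUT -d $peer -j DROP"
}

# Read commands from stdin. With DRY_RUN=1 (the default) only print them;
# with DRY_RUN=0 execute them (requires root).
run_or_print() {
    while read -r cmd; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

# Drop all traffic to and from the peer address.
block_peer() {
    peer_rules -I "$1" | run_or_print
}

# Remove the rules added by block_peer.
unblock_peer() {
    peer_rules -D "$1" | run_or_print
}
```

On fw1, running block_peer 192.168.253.2 prints the two rules for review; setting DRY_RUN=0 as root applies them, and unblock_peer 192.168.253.2 with DRY_RUN=0 removes them again. Since both nodes here are firewalls with their own rule sets, check how these rules interact with the existing ones before applying anything.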