One node goes offline, the other node loses its connection to its local Gluster volume


 



We first went down this path back in July 2013, and now I’m back for more. It’s a similar situation, but with new versions of everything. I’m using GlusterFS 3.4.2 on Fedora 20.

 

I have two nodes named fw1 and fw2. When I ifdown the NIC I’m using for Gluster on either node, that node cannot see its Gluster volume, but the other node can see it after a timeout. As soon as I ifup that NIC, both nodes can see the volume again.

 

Is this expected behavior? When the interconnect drops, I want each node to keep seeing its own local copy, and then sync everything back up when the interconnect comes back.

 

Here are the details. Node fw1 has an XFS filesystem named gluster-fw1. Node fw2 has an XFS filesystem named gluster-fw2. Both are Gluster bricks, and both nodes mount the resulting volume at /firewall-scripts. So anything one node does in /firewall-scripts should also show up on the other node within a few milliseconds. The test is to isolate the nodes from each other and see if each can still access its own local copy of /firewall-scripts. The easiest way to do this is to ifdown the interconnect NIC. But the node whose NIC is down cannot access /firewall-scripts at all.
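
A minimal sketch of the accessibility check this test relies on. The helper name check_mount is hypothetical and not part of Gluster; it only wraps ls and reports the exit status. The interface name and mount point in the usage comments are the ones from this setup.

```shell
#!/bin/sh
# Hypothetical helper: report whether a mount point can be listed.
check_mount() {
    # ls exits non-zero when the path cannot be read, for example with
    # "Transport endpoint is not connected" on a broken FUSE mount.
    if ls "$1" >/dev/null 2>&1; then
        echo "OK: $1 is accessible"
        return 0
    else
        echo "FAIL: $1 is not accessible"
        return 1
    fi
}

# Intended usage on one node (requires root, so left as comments):
#   ifdown enp5s4
#   check_mount /firewall-scripts
#   ifup enp5s4
#   check_mount /firewall-scripts
```

Running the same check on the other node while the NIC is down shows whether only the isolated node loses access.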

 

Here is what happens when I ifdown the NIC on node fw1. Node fw2 can still see /firewall-scripts, but fw1 shows an error. When I ifdown the NIC on fw2, the behavior is identical, with fw1 and fw2 swapped.

 

On fw1, after an ifdown, I lose the connection to my Gluster filesystem:

 

[root@stylmark-fw1 firewall-scripts]# ifdown enp5s4
[root@stylmark-fw1 firewall-scripts]# ls /firewall-scripts
ls: cannot access /firewall-scripts: Transport endpoint is not connected
[root@stylmark-fw1 firewall-scripts]# df -h
df: ‘/firewall-scripts’: Transport endpoint is not connected
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/fedora-root           17G  2.2G   14G  14% /
devtmpfs                         989M     0  989M   0% /dev
tmpfs                            996M     0  996M   0% /dev/shm
tmpfs                            996M  564K  996M   1% /run
tmpfs                            996M     0  996M   0% /sys/fs/cgroup
tmpfs                            996M     0  996M   0% /tmp
/dev/sda2                        477M   87M  362M  20% /boot
/dev/sda1                        200M  9.6M  191M   5% /boot/efi
/dev/mapper/fedora-gluster--fw1  9.8G   33M  9.8G   1% /gluster-fw1
10.10.10.2:/fwmaster             214G   75G  128G  37% /mnt/fwmaster
[root@stylmark-fw1 firewall-scripts]#

 

But on fw2, I can still look at it:

 

[root@stylmark-fw2 ~]# ls /firewall-scripts
allow-all           failover-monitor.sh  rcfirewall.conf
allow-all-with-nat  initial_rc.firewall  start-failover-monitor.sh
etc                 rc.firewall          var
[root@stylmark-fw2 ~]#
[root@stylmark-fw2 ~]#
[root@stylmark-fw2 ~]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/fedora-root           17G  2.3G   14G  14% /
devtmpfs                         989M     0  989M   0% /dev
tmpfs                            996M     0  996M   0% /dev/shm
tmpfs                            996M  560K  996M   1% /run
tmpfs                            996M     0  996M   0% /sys/fs/cgroup
tmpfs                            996M     0  996M   0% /tmp
/dev/sda2                        477M   87M  362M  20% /boot
/dev/sda1                        200M  9.6M  191M   5% /boot/efi
/dev/mapper/fedora-gluster--fw2  9.8G   33M  9.8G   1% /gluster-fw2
192.168.253.2:/firewall-scripts  9.8G   33M  9.8G   1% /firewall-scripts
10.10.10.2:/fwmaster             214G   75G  128G  37% /mnt/fwmaster
[root@stylmark-fw2 ~]#

 

And back to fw1 – after an ifup, I can see it again:

 

[root@stylmark-fw1 firewall-scripts]# ifup enp5s4
[root@stylmark-fw1 firewall-scripts]#
[root@stylmark-fw1 firewall-scripts]# ls /firewall-scripts
allow-all           failover-monitor.sh  rcfirewall.conf
allow-all-with-nat  initial_rc.firewall  start-failover-monitor.sh
etc                 rc.firewall          var
[root@stylmark-fw1 firewall-scripts]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/fedora-root           17G  2.2G   14G  14% /
devtmpfs                         989M     0  989M   0% /dev
tmpfs                            996M     0  996M   0% /dev/shm
tmpfs                            996M  564K  996M   1% /run
tmpfs                            996M     0  996M   0% /sys/fs/cgroup
tmpfs                            996M     0  996M   0% /tmp
/dev/sda2                        477M   87M  362M  20% /boot
/dev/sda1                        200M  9.6M  191M   5% /boot/efi
/dev/mapper/fedora-gluster--fw1  9.8G   33M  9.8G   1% /gluster-fw1
192.168.253.1:/firewall-scripts  9.8G   33M  9.8G   1% /firewall-scripts
10.10.10.2:/fwmaster             214G   75G  128G  37% /mnt/fwmaster
[root@stylmark-fw1 firewall-scripts]#

 

What can I do about this?

 

Thanks

 

- Greg Scott

