One node goes offline, the other node can't see the replicated volume anymore

Hi Greg,
    Could you let us know what log messages appear in fw1's mount log when fw2 is taken down? It would also help if you could get us all the logs from fw1 (a tarball, maybe?) from when fw2 is taken down.
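
As a minimal sketch, one way to gather them (assuming the default log directory; adjust the path if your build logs elsewhere):

tar czf /tmp/fw1-gluster-logs.tar.gz /var/log/glusterfs

That should pick up the mount log (/var/log/glusterfs/firewall-scripts.log) along with the glusterd and brick logs.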

Pranith.

----- Original Message -----
> From: "Greg Scott" <GregScott at infrasupport.com>
> To: "gluster-users at gluster.org" <gluster-users at gluster.org>
> Sent: Wednesday, July 10, 2013 10:56:30 AM
> Subject: Re: One node goes offline, the other node can't see the replicated volume anymore
> 
> Bummer. Looks like I'm on my own with this one.
> 
> - Greg
> 
> From: gluster-users-bounces at gluster.org
> [mailto:gluster-users-bounces at gluster.org] On Behalf Of Greg Scott
> Sent: Tuesday, July 09, 2013 12:37 PM
> To: 'gluster-users at gluster.org'
> Subject: Re: One node goes offline, the other node can't see
> the replicated volume anymore
> 
> No takers? I am running gluster 3.4beta3 that came with Fedora 19. Is my
> issue a consequence of some kind of quorum split-brain thing?
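> 
> If it is quorum-related, one way I could check (a sketch, assuming the 3.4
> CLI): any non-default quorum settings should show up under Options
> Reconfigured in gluster volume info, and client-side quorum can be turned
> off explicitly with:
> 
> gluster volume set firewall-scripts cluster.quorum-type none
> 
> (On a 2-brick replica, cluster.quorum-type auto refuses writes when the
> first brick is down, which could look like my symptom.)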
> 
> thanks
> 
> - Greg Scott
> 
> From: gluster-users-bounces at gluster.org
> [mailto:gluster-users-bounces at gluster.org] On Behalf Of Greg Scott
> Sent: Monday, July 08, 2013 8:17 PM
> To: 'gluster-users at gluster.org'
> Subject: One node goes offline, the other node can't see the
> replicated volume anymore
> 
> I don't get this. I have a replicated volume and 2 nodes. My challenge is,
> when I take one node offline, the other node can no longer access the volume
> until both nodes are back online again.
> 
> Details:
> 
> I have 2 nodes, fw1 and fw2. Each node has an XFS file system, /gluster-fw1
> on node fw1 and /gluster-fw2 on node fw2. Node fw1 is at IP address
> 192.168.253.1. Node fw2 is at 192.168.253.2.
> 
> I create a gluster volume named firewall-scripts which is a replica of those
> two XFS file systems. The volume holds a bunch of config files common to
> both fw1 and fw2. The application is an active/standby pair of firewalls and
> the idea is to keep config files in a gluster volume.
> 
> When both nodes are online, everything works as expected. But when I take
> either node offline, node fw2 behaves badly:
> 
> [root at chicago-fw2 ~]# ls /firewall-scripts
> ls: cannot access /firewall-scripts: Transport endpoint is not connected
> 
> And when I bring the offline node back online, node fw2 eventually behaves
> normally again.
> 
> What's up with that? Gluster is supposed to be resilient and self-healing and
> able to stand up to this sort of abuse. So I must be doing something wrong.
> 
> Here is how I set up everything - it doesn't get much simpler than this, and
> my setup is right out of the Getting Started Guide, just using my own names.
> 
> Here are the steps I followed, all from fw1:
> 
> gluster peer probe 192.168.253.2
> gluster peer status
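> 
> For reference, on a healthy two-node setup peer status should report
> something roughly like this (the Uuid below is a placeholder, not my real
> one):
> 
> Number of Peers: 1
> 
> Hostname: 192.168.253.2
> Uuid: <uuid-of-fw2>
> State: Peer in Cluster (Connected)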
> 
> Create and start the volume:
> 
> gluster volume create firewall-scripts replica 2 transport tcp 192.168.253.1:/gluster-fw1 192.168.253.2:/gluster-fw2
> gluster volume start firewall-scripts
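> 
> A quick sanity check at this point (assuming the 3.4 CLI) is to confirm
> that both bricks report as online:
> 
> gluster volume status firewall-scripts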
> 
> On fw1:
> 
> mkdir /firewall-scripts
> mount -t glusterfs 192.168.253.1:/firewall-scripts /firewall-scripts
> 
> and add this line to /etc/fstab:
> 
> 192.168.253.1:/firewall-scripts /firewall-scripts glusterfs defaults,_netdev 0 0
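> 
> Worth noting, assuming the mount.glusterfs in 3.4 supports the option: the
> server named in the mount is only used to fetch the volume layout, and a
> fallback can be given so the mount can still come up at boot while one
> node is down, e.g.:
> 
> mount -t glusterfs -o backupvolfile-server=192.168.253.2 192.168.253.1:/firewall-scripts /firewall-scripts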
> 
> On fw2:
> 
> mkdir /firewall-scripts
> mount -t glusterfs 192.168.253.2:/firewall-scripts /firewall-scripts
> 
> and add this line to /etc/fstab:
> 
> 192.168.253.2:/firewall-scripts /firewall-scripts glusterfs defaults,_netdev 0 0
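> 
> To confirm replication works end to end, a simple check is to write a file
> through one mount and read it through the other (test-file is just an
> illustrative name):
> 
> [root at chicago-fw1 ~]# touch /firewall-scripts/test-file
> [root at chicago-fw2 ~]# ls /firewall-scripts/test-file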
> 
> That's it. That's the whole setup. When both nodes are online, everything
> replicates beautifully. But take one node offline and it all falls apart.
> 
> Here is the output from gluster volume info, identical on both nodes:
> 
> [root at chicago-fw1 etc]# gluster volume info
> 
> Volume Name: firewall-scripts
> Type: Replicate
> Volume ID: 239b6401-e873-449d-a2d3-1eb2f65a1d4c
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.253.1:/gluster-fw1
> Brick2: 192.168.253.2:/gluster-fw2
> [root at chicago-fw1 etc]#
> 
> Looking at /var/log/glusterfs/firewall-scripts.log on fw2, I see errors like
> this every couple of seconds:
> 
> [2013-07-09 00:59:04.706390] I [afr-common.c:3856:afr_local_init] 0-firewall-scripts-replicate-0: no subvolumes up
> [2013-07-09 00:59:04.706515] W [fuse-bridge.c:1132:fuse_err_cbk] 0-glusterfs-fuse: 3160: FLUSH() ERR => -1 (Transport endpoint is not connected)
> 
> And then when I bring fw1 back online, I see these messages on fw2:
> 
> [2013-07-09 01:01:35.006782] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-firewall-scripts-client-0: changing port to 49152 (from 0)
> [2013-07-09 01:01:35.006932] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
> [2013-07-09 01:01:35.018546] I [client-handshake.c:1658:select_server_supported_programs] 0-firewall-scripts-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2013-07-09 01:01:35.019273] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-0: Connected to 192.168.253.1:49152, attached to remote volume '/gluster-fw1'.
> [2013-07-09 01:01:35.019356] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-0: Server and Client lk-version numbers are not same, reopening the fds
> [2013-07-09 01:01:35.019441] I [client-handshake.c:1308:client_post_handshake] 0-firewall-scripts-client-0: 1 fds open - Delaying child_up until they are re-opened
> [2013-07-09 01:01:35.020070] I [client-handshake.c:930:client_child_up_reopen_done] 0-firewall-scripts-client-0: last fd open'd/lock-self-heal'd - notifying CHILD-UP
> [2013-07-09 01:01:35.020282] I [afr-common.c:3698:afr_notify] 0-firewall-scripts-replicate-0: Subvolume 'firewall-scripts-client-0' came back up; going online.
> [2013-07-09 01:01:35.020616] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-0: Server lk version = 1
> 
> So how do I make glusterfs survive a node failure, which is the whole point
> of all this?
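> 
> One thing I notice in the logs above: "no subvolumes up" suggests the fw2
> client lost its connection to the local brick too, not just the remote one.
> Since these boxes are firewalls, maybe I should confirm the gluster ports
> (24007 for glusterd, and 49152 and up for bricks in 3.4 - the log above
> shows 49152) stay reachable while a node is down, e.g. from fw2:
> 
> telnet 192.168.253.2 24007
> telnet 192.168.253.2 49152
> 
> (Just a guess from the logs, not a confirmed diagnosis.)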
> 
> thanks
> 
> - Greg Scott
> 
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

