Re: GlusterFS cluster of 2 nodes is disconnected after nodes reboot

> What linux distro ?
>
> Anything special about your network configuration ?
>
> Any chance your server is taking too long to release networking and gluster
> is starting before network is ready ?
>
> Can you completely disable iptables and test again ?

Both nodes are CentOS 6.5 VMs running on VMware ESXi 5.5.0. There is nothing special about the network configuration, just static IPs; ping and ssh work fine. I added "iptables -F" to /etc/rc.local. After a simultaneous reboot, "gluster peer status" on both nodes shows connected and replication works fine, but "gluster volume status" reports that the NFS server and self-heal daemon on one of the nodes aren't running, so I have to restart glusterd to bring them up.
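
For reference, instead of the blanket "iptables -F" I could probably open only the ports Gluster 3.4 needs. This is just a sketch based on my reading of the defaults (24007-24008 for glusterd, 49152 and up for the bricks, 111/2049/38465-38467 for the built-in NFS server), so the numbers may need checking:

iptables -I INPUT -p tcp --dport 24007:24008 -j ACCEPT    # glusterd management
iptables -I INPUT -p tcp --dport 49152 -j ACCEPT          # one brick per node
iptables -I INPUT -p tcp -m multiport --dports 111,2049,38465:38467 -j ACCEPT    # gluster NFS
iptables -I INPUT -p udp --dport 111 -j ACCEPT            # portmapper
service iptables save

Also, if only the NFS server and self-heal daemon are down, "gluster volume start ipset-gv force" should respawn the missing processes without a full glusterd restart.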

Another issue: when everything is OK after "service glusterd restart" on both nodes, I reboot one node and then see the following on the rebooted node (ipset02):

[root@ipset02 etc]# gluster peer status
Number of Peers: 1

Hostname: ipset01
Uuid: 6313a4dd-f736-46ff-9836-bdf05c886ffd
State: Peer in Cluster (Connected)
[root@ipset02 etc]# gluster volume status
Status of volume: ipset-gv
Gluster process                          Port    Online  Pid
------------------------------------------------------------------------------
Brick ipset01:/usr/local/etc/ipset       49152   Y       1615
Brick ipset02:/usr/local/etc/ipset       49152   Y       2282
NFS Server on localhost                  2049    Y       2289
Self-heal Daemon on localhost            N/A     Y       2296
NFS Server on ipset01                    2049    Y       2258
Self-heal Daemon on ipset01              N/A     Y       2262
 
There are no active volume tasks

[root@ipset02 etc]# tail -17 /var/log/glusterfs/glustershd.log
[2014-03-26 07:55:48.982456] E [client-handshake.c:1742:client_query_portmap_cbk] 0-ipset-gv-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2014-03-26 07:55:48.982532] W [socket.c:514:__socket_rwv] 0-ipset-gv-client-1: readv failed (No data available)
[2014-03-26 07:55:48.982555] I [client.c:2097:client_rpc_notify] 0-ipset-gv-client-1: disconnected
[2014-03-26 07:55:48.982572] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-ipset-gv-client-0: changing port to 49152 (from 0)
[2014-03-26 07:55:48.982627] W [socket.c:514:__socket_rwv] 0-ipset-gv-client-0: readv failed (No data available)
[2014-03-26 07:55:48.986252] I [client-handshake.c:1659:select_server_supported_programs] 0-ipset-gv-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-03-26 07:55:48.986551] I [client-handshake.c:1456:client_setvolume_cbk] 0-ipset-gv-client-0: Connected to 192.168.1.180:49152, attached to remote volume '/usr/local/etc/ipset'.
[2014-03-26 07:55:48.986566] I [client-handshake.c:1468:client_setvolume_cbk] 0-ipset-gv-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-03-26 07:55:48.986628] I [afr-common.c:3698:afr_notify] 0-ipset-gv-replicate-0: Subvolume 'ipset-gv-client-0' came back up; going online.
[2014-03-26 07:55:48.986743] I [client-handshake.c:450:client_set_lk_version_cbk] 0-ipset-gv-client-0: Server lk version = 1
[2014-03-26 07:55:52.975670] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-ipset-gv-client-1: changing port to 49152 (from 0)
[2014-03-26 07:55:52.975717] W [socket.c:514:__socket_rwv] 0-ipset-gv-client-1: readv failed (No data available)
[2014-03-26 07:55:52.978961] I [client-handshake.c:1659:select_server_supported_programs] 0-ipset-gv-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-03-26 07:55:52.979128] I [client-handshake.c:1456:client_setvolume_cbk] 0-ipset-gv-client-1: Connected to 192.168.1.181:49152, attached to remote volume '/usr/local/etc/ipset'.
[2014-03-26 07:55:52.979143] I [client-handshake.c:1468:client_setvolume_cbk] 0-ipset-gv-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2014-03-26 07:55:52.979269] I [client-handshake.c:450:client_set_lk_version_cbk] 0-ipset-gv-client-1: Server lk version = 1
[2014-03-26 07:55:52.980284] I [afr-self-heald.c:1180:afr_dir_exclusive_crawl] 0-ipset-gv-replicate-0: Another crawl is in progress for ipset-gv-client-1


And on the node that wasn't rebooted:

[root@ipset01 ~]# gluster peer status
Number of Peers: 1

Hostname: ipset02
Uuid: ff14ab0e-53cf-4015-9e49-fb60698c56db
State: Peer in Cluster (Disconnected)
[root@ipset01 ~]# gluster volume status
Status of volume: ipset-gv
Gluster process                          Port    Online  Pid
------------------------------------------------------------------------------
Brick ipset01:/usr/local/etc/ipset       49152   Y       1615
NFS Server on localhost                  2049    Y       2258
Self-heal Daemon on localhost            N/A     Y       2262
 
There are no active volume tasks

[root@ipset01 ~]# tail -3 /var/log/glusterfs/glustershd.log
[2014-03-26 07:50:28.881369] W [socket.c:514:__socket_rwv] 0-ipset-gv-client-1: readv failed (Connection reset by peer)
[2014-03-26 07:50:28.881421] W [socket.c:1962:__socket_proto_state_machine] 0-ipset-gv-client-1: reading from socket failed. Error (Connection reset by peer), peer (192.168.1.181:49152)
[2014-03-26 07:50:28.881463] I [client.c:2097:client_rpc_notify] 0-ipset-gv-client-1: disconnected

However, files seem to replicate fine on both nodes, and after "service glusterd restart" on the first node (ipset01), "gluster peer status" shows connected again. This behavior is strange.
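
As a workaround I may try something like this in /etc/rc.local on each node (only a sketch, assuming the root cause is glusterd coming up before the network is ready; swap the hostname on the other node):

# wait until the peer answers, then restart glusterd once
(until ping -c 1 ipset01 >/dev/null 2>&1; do sleep 2; done; service glusterd restart) &

Of course that would only mask the problem rather than explain it.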

> May not be cause of your problems but it does bad things and gluster
> sees this as a 'crash' even with graceful shutdown

I don't have a /var/lock/subsys/glusterfsd file either, but /var/lock/subsys/glusterd is there. As far as I know, newer versions of GlusterFS use the glusterd init script instead of glusterfsd.

[root@ipset01 etc]# service glusterfsd status
glusterfsd (pid 2338) is running...
[root@ipset01 etc]# service glusterd stop                  [  OK  ]
[root@ipset01 etc]# service glusterd status               
glusterd dead but subsys locked
[root@ipset01 etc]# service glusterfsd status
glusterfsd (pid 2338) is running...

Is it OK that glusterfsd is still running?
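
To see which daemon is which (as I understand it, glusterd is the management daemon, glusterfsd processes serve the bricks, and plain glusterfs runs the NFS server and self-heal daemon), something like this should work:

ps -eo pid,comm,args | grep '[g]luster'

That should show whether pid 2338 is really one of the brick processes.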

2014-03-26 2:16 GMT+04:00 Viktor Villafuerte <viktor.villafuerte@xxxxxxxxxxxxxxx>:
Also see this bug
https://bugzilla.redhat.com/show_bug.cgi?id=1073217

May not be cause of your problems but it does bad things and gluster
sees this as a 'crash' even with graceful shutdown

v



On Tue 25 Mar 2014 22:24:22, Carlos Capriotti wrote:
> Let's go with the data collection first.
>
> What linux distro ?
>
> Anything special about your network configuration ?
>
> Any chance your server is taking too long to release networking and gluster
> is starting before network is ready ?
>
> Can you completely disable iptables and test again ?
>
> I am afraid quorum will not help you if you cannot get this issue
> corrected.
>
>
>
>
> On Tue, Mar 25, 2014 at 3:14 PM, Артём Конвалюк <artret@xxxxxxxxx> wrote:
>
> > Hello!
> >
> > I have 2 nodes with GlusterFS 3.4.2. I created one replicated volume from 2
> > bricks and enabled glusterd autostart. A firewall is also configured, so I
> > have to run "iptables -F" on the nodes after reboot. Clearly the firewall
> > could simply be disabled inside the LAN, but I'm interested in my case.
> >
> > Problem: when I reboot both nodes and run "iptables -F", the peer status is
> > still disconnected, and I wonder why. After "service glusterd restart" the
> > peer status is connected, but I have to run "gluster volume heal
> > <volume-name>" to make both servers consistent and able to replicate
> > files. Is there any way to eliminate this problem?
> >
> > I read about server-quorum, but it needs 3 or more nodes. Am I right?
> >
> > Best Regards,
> > Artem Konvalyuk
> >



--
Regards

Viktor Villafuerte
Optus Internet Engineering
t: 02 808-25265

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
