Re: GlusterFS cluster of 2 nodes is disconnected after nodes reboot

> What linux distro ?
>
> Anything special about your network configuration ?
>
> Any chance your server is taking too long to release networking and gluster
> is starting before network is ready ?
>
> Can you completely disable iptables and test again ?

Both nodes are CentOS 6.5 VMs running on VMware ESXi 5.5.0. There is nothing special about the network configuration, just static IPs; ping and ssh work fine. I added "iptables -F" to /etc/rc.local. After a simultaneous reboot, "gluster peer status" on both nodes shows connected and replication works fine, but "gluster volume status" reports that the NFS server and self-heal daemon on one of the nodes aren't running, so I have to restart glusterd to bring them up.
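
For reference, instead of the blanket "iptables -F" I could probably open only the ports Gluster 3.4 needs. This is just a sketch based on my reading of the defaults (24007-24008 for glusterd, 49152 and up for the bricks, 111/2049/38465-38467 for the built-in NFS server), so the numbers may need checking:

iptables -I INPUT -p tcp --dport 24007:24008 -j ACCEPT    # glusterd management
iptables -I INPUT -p tcp --dport 49152 -j ACCEPT          # one brick per node
iptables -I INPUT -p tcp -m multiport --dports 111,2049,38465:38467 -j ACCEPT    # gluster NFS
iptables -I INPUT -p udp --dport 111 -j ACCEPT            # portmapper
service iptables save

Also, if only the NFS server and self-heal daemon are down, "gluster volume start ipset-gv force" should respawn the missing processes without a full glusterd restart.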

Another issue: when everything is OK after "service glusterd restart" on both nodes, I reboot one node and then see the following on the rebooted node (ipset02):

[root@ipset02 etc]# gluster peer status
Number of Peers: 1

Hostname: ipset01
Uuid: 6313a4dd-f736-46ff-9836-bdf05c886ffd
State: Peer in Cluster (Connected)
[root@ipset02 etc]# gluster volume status
Status of volume: ipset-gv
Gluster process                          Port    Online  Pid
------------------------------------------------------------------------------
Brick ipset01:/usr/local/etc/ipset       49152   Y       1615
Brick ipset02:/usr/local/etc/ipset       49152   Y       2282
NFS Server on localhost                  2049    Y       2289
Self-heal Daemon on localhost            N/A     Y       2296
NFS Server on ipset01                    2049    Y       2258
Self-heal Daemon on ipset01              N/A     Y       2262
 
There are no active volume tasks

[root@ipset02 etc]# tail -17 /var/log/glusterfs/glustershd.log
[2014-03-26 07:55:48.982456] E [client-handshake.c:1742:client_query_portmap_cbk] 0-ipset-gv-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2014-03-26 07:55:48.982532] W [socket.c:514:__socket_rwv] 0-ipset-gv-client-1: readv failed (No data available)
[2014-03-26 07:55:48.982555] I [client.c:2097:client_rpc_notify] 0-ipset-gv-client-1: disconnected
[2014-03-26 07:55:48.982572] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-ipset-gv-client-0: changing port to 49152 (from 0)
[2014-03-26 07:55:48.982627] W [socket.c:514:__socket_rwv] 0-ipset-gv-client-0: readv failed (No data available)
[2014-03-26 07:55:48.986252] I [client-handshake.c:1659:select_server_supported_programs] 0-ipset-gv-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-03-26 07:55:48.986551] I [client-handshake.c:1456:client_setvolume_cbk] 0-ipset-gv-client-0: Connected to 192.168.1.180:49152, attached to remote volume '/usr/local/etc/ipset'.
[2014-03-26 07:55:48.986566] I [client-handshake.c:1468:client_setvolume_cbk] 0-ipset-gv-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-03-26 07:55:48.986628] I [afr-common.c:3698:afr_notify] 0-ipset-gv-replicate-0: Subvolume 'ipset-gv-client-0' came back up; going online.
[2014-03-26 07:55:48.986743] I [client-handshake.c:450:client_set_lk_version_cbk] 0-ipset-gv-client-0: Server lk version = 1
[2014-03-26 07:55:52.975670] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-ipset-gv-client-1: changing port to 49152 (from 0)
[2014-03-26 07:55:52.975717] W [socket.c:514:__socket_rwv] 0-ipset-gv-client-1: readv failed (No data available)
[2014-03-26 07:55:52.978961] I [client-handshake.c:1659:select_server_supported_programs] 0-ipset-gv-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-03-26 07:55:52.979128] I [client-handshake.c:1456:client_setvolume_cbk] 0-ipset-gv-client-1: Connected to 192.168.1.181:49152, attached to remote volume '/usr/local/etc/ipset'.
[2014-03-26 07:55:52.979143] I [client-handshake.c:1468:client_setvolume_cbk] 0-ipset-gv-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2014-03-26 07:55:52.979269] I [client-handshake.c:450:client_set_lk_version_cbk] 0-ipset-gv-client-1: Server lk version = 1
[2014-03-26 07:55:52.980284] I [afr-self-heald.c:1180:afr_dir_exclusive_crawl] 0-ipset-gv-replicate-0: Another crawl is in progress for ipset-gv-client-1


And on the node that wasn't rebooted:

[root@ipset01 ~]# gluster peer status
Number of Peers: 1

Hostname: ipset02
Uuid: ff14ab0e-53cf-4015-9e49-fb60698c56db
State: Peer in Cluster (Disconnected)
[root@ipset01 ~]# gluster volume status
Status of volume: ipset-gv
Gluster process                          Port    Online  Pid
------------------------------------------------------------------------------
Brick ipset01:/usr/local/etc/ipset       49152   Y       1615
NFS Server on localhost                  2049    Y       2258
Self-heal Daemon on localhost            N/A     Y       2262
 
There are no active volume tasks

[root@ipset01 ~]# tail -3 /var/log/glusterfs/glustershd.log
[2014-03-26 07:50:28.881369] W [socket.c:514:__socket_rwv] 0-ipset-gv-client-1: readv failed (Connection reset by peer)
[2014-03-26 07:50:28.881421] W [socket.c:1962:__socket_proto_state_machine] 0-ipset-gv-client-1: reading from socket failed. Error (Connection reset by peer), peer (192.168.1.181:49152)
[2014-03-26 07:50:28.881463] I [client.c:2097:client_rpc_notify] 0-ipset-gv-client-1: disconnected

However, files seem to replicate fine on both nodes, and after "service glusterd restart" on the first node (ipset01), "gluster peer status" shows connected again. This behavior is strange.
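
As a workaround I may try something like this in /etc/rc.local on each node (only a sketch, assuming the root cause is glusterd coming up before the network is ready; swap the hostname on the other node):

# wait until the peer answers, then restart glusterd once
(until ping -c 1 ipset01 >/dev/null 2>&1; do sleep 2; done; service glusterd restart) &

Of course that would only mask the problem rather than explain it.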

> May not be cause of your problems but it does bad things and gluster
> sees this as a 'crash' even with graceful shutdown

I don't have a /var/lock/subsys/glusterfsd file either, but /var/lock/subsys/glusterd is there. As far as I know, newer versions of GlusterFS use the glusterd init script instead of glusterfsd.

[root@ipset01 etc]# service glusterfsd status
glusterfsd (pid 2338) is running...
[root@ipset01 etc]# service glusterd stop                  [  OK  ]
[root@ipset01 etc]# service glusterd status               
glusterd dead but subsys locked
[root@ipset01 etc]# service glusterfsd status
glusterfsd (pid 2338) is running...

Is it OK that glusterfsd is still running?
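
To see which daemon is which (as I understand it, glusterd is the management daemon, glusterfsd processes serve the bricks, and plain glusterfs runs the NFS server and self-heal daemon), something like this should work:

ps -eo pid,comm,args | grep '[g]luster'

That should show whether pid 2338 is really one of the brick processes.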

2014-03-26 2:16 GMT+04:00 Viktor Villafuerte <viktor.villafuerte@xxxxxxxxxxxxxxx>:
Also see this bug
https://bugzilla.redhat.com/show_bug.cgi?id=1073217

May not be cause of your problems but it does bad things and gluster
sees this as a 'crash' even with graceful shutdown

v



On Tue 25 Mar 2014 22:24:22, Carlos Capriotti wrote:
> Let's go with the data collection first.
>
> What linux distro ?
>
> Anything special about your network configuration ?
>
> Any chance your server is taking too long to release networking and gluster
> is starting before network is ready ?
>
> Can you completely disable iptables and test again ?
>
> I am afraid quorum will not help you if you cannot get this issue
> corrected.
>
>
>
>
> On Tue, Mar 25, 2014 at 3:14 PM, Артём Конвалюк <artret@xxxxxxxxx> wrote:
>
> > Hello!
> >
> > I have 2 nodes with GlusterFS 3.4.2. I created one replicated volume from 2
> > bricks and enabled glusterd autostart. A firewall is also configured, so I
> > have to run "iptables -F" on the nodes after reboot. Clearly the firewall
> > could simply be disabled inside the LAN, but I'm interested in my case.
> >
> > Problem: when I reboot both nodes and run "iptables -F", the peer status is
> > still disconnected, and I wonder why. After "service glusterd restart" the
> > peer status is connected, but I have to run "gluster volume heal
> > <volume-name>" to make both servers consistent and able to replicate
> > files. Is there any way to eliminate this problem?
> >
> > I read about server-quorum, but it needs 3 or more nodes. Am I right?
> >
> > Best Regards,
> > Artem Konvalyuk
> >



--
Regards

Viktor Villafuerte
Optus Internet Engineering
t: 02 808-25265

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
