On 02/17/2016 08:23 AM, songxin wrote:
> Hi,
> Thank you for your immediate and detailed reply. And I have a few more
> questions about glusterfs.
> A node IP is 128.224.162.163.
> B node IP is 128.224.162.250.
> 1. After rebooting B node and starting the glusterd service, the
> glusterd log is as below.
> ...
> [2015-12-07 07:54:55.743966] I [MSGID: 101190]
> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 2
> [2015-12-07 07:54:55.744026] I [MSGID: 101190]
> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2015-12-07 07:54:55.744280] I [MSGID: 106163]
> [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack]
> 0-management: using the op-version 30706
> [2015-12-07 07:54:55.773606] I [MSGID: 106490]
> [glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req]
> 0-glusterd: Received probe from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
> [2015-12-07 07:54:55.777994] E [MSGID: 101076]
> [common-utils.c:2954:gf_get_hostname_from_ip] 0-common-utils: Could not
> lookup hostname of 128.224.162.163 : Temporary failure in name resolution
> [2015-12-07 07:54:55.778290] E [MSGID: 106010]
> [glusterd-utils.c:2717:glusterd_compare_friend_volume] 0-management:
> Version of Cksums gv0 differ. local cksum = 2492237955, remote cksum =
> 4087388312 on peer 128.224.162.163

The above log entry is the reason the peer is being rejected; most
probably it is due to a compatibility issue. I believe the gluster
versions on the two nodes are different (please share the gluster
version from both nodes) and you might have hit a bug. Can you share the
delta of the /var/lib/glusterd/vols/gv0/info file from both nodes?

~Atin

> [2015-12-07 07:54:55.778384] I [MSGID: 106493]
> [glusterd-handler.c:3780:glusterd_xfer_friend_add_resp] 0-glusterd:
> Responded to 128.224.162.163 (0), ret: 0
> [2015-12-07 07:54:55.928774] I [MSGID: 106493]
> [glusterd-rpc-ops.c:480:__glusterd_friend_add_cbk] 0-glusterd: Received
> RJT from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44, host:
> 128.224.162.163, port: 0
> ...
> When I run "gluster peer status" on B node it shows:
>
> Number of Peers: 1
>
> Hostname: 128.224.162.163
> Uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
> State: Peer Rejected (Connected)
>
> When I run "gluster volume status" on A node it shows:
>
> Status of volume: gv0
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 128.224.162.163:/home/wrsadmin/work/t
> mp/data/brick/gv0                           49152     0          Y       13019
> NFS Server on localhost                     N/A       N/A        N       N/A
> Self-heal Daemon on localhost               N/A       N/A        Y       13045
>
> Task Status of Volume gv0
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> It looks like the glusterfsd service is OK on A node.
>
> Did glusterd not start glusterfsd because the peer state is Rejected?
> What causes this problem?
>
>
> 2. Is glustershd (the self-heal daemon) the process shown below?
>
> root       497  0.8  0.0 432520 18104 ?   Ssl  08:07   0:00
> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
> /var/lib/glusterd/glustershd/run/gluster ..
>
> If it is, I want to know whether glustershd is also the glusterfsd
> binary, just like glusterd and glusterfs.
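
On question 2: on most installs, glusterd and glusterfs are simply
symlinks to the glusterfsd executable, and glustershd is an instance of
the glusterfs client process started with the glustershd volfile, which
matches the ps output you pasted. A quick way to confirm this on your
boxes -- assuming the binaries live under /usr/sbin, which may differ on
your build -- is:

    # check what the three names point to (typically the same executable)
    ls -l /usr/sbin/glusterd /usr/sbin/glusterfs /usr/sbin/glusterfsd

    # the self-heal daemon shows up as a glusterfs process using the
    # gluster/glustershd volfile-id
    ps aux | grep glustershd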
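
For the "Version of Cksums gv0 differ" error above, a minimal way to
gather the gluster versions and the info file delta requested above --
assuming the default /var/lib/glusterd working directory; the
/tmp/gv0-info.B name below is just a hypothetical scratch copy -- would
be:

    # run on both A (128.224.162.163) and B (128.224.162.250)
    gluster --version
    cat /var/lib/glusterd/vols/gv0/info

    # then, after copying B's info file over to A (e.g. as /tmp/gv0-info.B),
    # compare the two copies
    diff /var/lib/glusterd/vols/gv0/info /tmp/gv0-info.B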
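
And once the peer rejection is sorted out, a minimal sketch of the
recovery sequence suggested in the quoted mail below, run on node B
(volume name gv0 is from your setup; the mount point is a placeholder,
adjust it to whatever you use):

    # restart any brick processes that did not come up with glusterd
    gluster volume start gv0 force

    # re-create the client mount; it does not come back by itself after a reboot
    mount -t glusterfs 128.224.162.163:/gv0 /your/mount/point

    # optionally trigger a full heal instead of waiting for glustershd's
    # periodic crawl
    gluster volume heal gv0 full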
>
> Thanks,
> Xin
>
>
> At 2016-02-16 18:53:03, "Anuradha Talur" <atalur@xxxxxxxxxx> wrote:
>>
>>----- Original Message -----
>>> From: "songxin" <songxin_1980@xxxxxxx>
>>> To: gluster-users@xxxxxxxxxxx
>>> Sent: Tuesday, February 16, 2016 3:59:50 PM
>>> Subject: question about sync replicate volume after rebooting one node
>>>
>>> Hi,
>>> I have a question about how to sync a volume between two bricks after
>>> one node is rebooted.
>>>
>>> There are two nodes, A node and B node. A node's IP is 128.124.10.1
>>> and B node's IP is 128.124.10.2.
>>>
>>> Operation steps on A node as below:
>>> 1. gluster peer probe 128.124.10.2
>>> 2. mkdir -p /data/brick/gv0
>>> 3. gluster volume create gv0 replica 2 128.124.10.1:/data/brick/gv0
>>> 128.124.10.2:/data/brick/gv1 force
>>> 4. gluster volume start gv0
>>> 5. mount -t glusterfs 128.124.10.1:/gv0 gluster
>>>
>>> Operation steps on B node as below:
>>> 1. mkdir -p /data/brick/gv0
>>> 2. mount -t glusterfs 128.124.10.1:/gv0 gluster
>>>
>>> After all the steps above, there are some gluster service processes,
>>> including glusterd, glusterfs and glusterfsd, running on both A and B
>>> node. I can see these services with "ps aux | grep gluster" and with
>>> "gluster volume status".
>>>
>>> Now reboot the B node. After B reboots, there are no gluster services
>>> running on B node.
>>> After I "systemctl start glusterd", there is just the glusterd service
>>> but not glusterfs and glusterfsd on B node.
>>> Because glusterfs and glusterfsd are not running, I can't run "gluster
>>> volume heal gv0 full".
>>>
>>> I want to know why glusterd doesn't start glusterfs and glusterfsd.
>>
>>On starting glusterd, glusterfsd should have started by itself.
>>Could you share the glusterd and brick logs (on node B) so that we know
>>why glusterfsd didn't start?
>>
>>Do you still see the glusterfsd service running on node A? You can try
>>running "gluster v start <VOLNAME> force" on one of the nodes and check
>>if all the brick processes started.
>>
>>"gluster volume status <VOLNAME>" should be able to provide you with the
>>gluster process status.
>>
>>On restarting the node, the glusterfs process for the mount won't start
>>by itself. You will have to run step 2 on node B again for it.
>>
>>> How do I restart these services on B node?
>>> How do I sync the replicate volume after one node reboots?
>>
>>Once the glusterfsd process starts on node B too, glustershd -- the
>>self-heal daemon -- for the replicate volume should start
>>healing/syncing the files that need to be synced. This daemon does
>>periodic syncing of files.
>>
>>If you want to trigger a heal explicitly, you can run "gluster volume
>>heal <VOLNAME>" on one of the servers.
>>>
>>> Thanks,
>>> Xin
>>
>>--
>>Thanks,
>>Anuradha.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users