On 02/17/2016 06:14 PM, songxin wrote:
> Do you mean that I should delete the info file on B node and then start glusterd? Or copy it from A node to B node?
Either one of them, followed by a restart of GlusterD on B.
>
> Sent from my iPhone
>
>> On 17 Feb 2016, at 14:59, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
>>
>>> On 02/17/2016 11:44 AM, songxin wrote:
>>> Hi,
>>> The version of glusterfs is 3.7.6 on both A node and B node.
>>> The time on B node is the same after rebooting because B node has no RTC.
>>> Could that cause the problem?
>>>
>>> If I run "gluster volume start gv0 force", glusterfsd can be started, but "gluster volume start gv0" doesn't work.
>>>
>>> The file /var/lib/glusterd/vols/gv0/info on B node is as below:
>>> ...
>>> type=2
>>> count=2
>>> status=1
>>> sub_count=2
>>> stripe_count=1
>>> replica_count=2
>>> disperse_count=0
>>> redundancy_count=0
>>> version=2
>>> transport-type=0
>>> volume-id=c4197371-6d01-4477-8cb2-384cda569c27
>>> username=62e009ea-47c4-46b4-8e74-47cd9c199d94
>>> password=ef600dcd-42c5-48fc-8004-d13a3102616b
>>> op-version=3
>>> client-op-version=3
>>> quota-version=0
>>> parent_volname=N/A
>>> restored_from_snap=00000000-0000-0000-0000-000000000000
>>> snap-max-hard-limit=256
>>> performance.readdir-ahead=on
>>> brick-0=128.224.162.255:-data-brick-gv0
>>> brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
>>>
>>> The file /var/lib/glusterd/vols/gv0/info on A node is as below:
>>>
>>> wrsadmin@pek-song1-d1:~/work/tmp$ sudo cat /var/lib/glusterd/vols/gv0/info
>>> type=2
>>> count=2
>>> status=1
>>> sub_count=2
>>> stripe_count=1
>>> replica_count=2
>>> disperse_count=0
>>> redundancy_count=0
>>> version=2
>>> transport-type=0
>>> volume-id=c4197371-6d01-4477-8cb2-384cda569c27
>>> username=62e009ea-47c4-46b4-8e74-47cd9c199d94
>>> password=ef600dcd-42c5-48fc-8004-d13a3102616b
>>> op-version=3
>>> client-op-version=3
>>> quota-version=0
>>> parent_volname=N/A
>>> restored_from_snap=00000000-0000-0000-0000-000000000000
>>> snap-max-hard-limit=256
>>> performance.readdir-ahead=on
>>> brick-0=128.224.162.255:-data-brick-gv0
>>> brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
>> The contents look identical, but the log says otherwise, and that can't happen. Are you sure they are the same? As a workaround, can you delete that info file from the disk, restart the glusterd instance and see whether the problem persists?
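
Concretely, that workaround amounts to something like the following on B node. This is only a sketch: the remote user and path in the scp line are placeholders, and your distribution may use "service glusterd stop/start" instead of systemctl.

    # on B node: stop glusterd and back up the volume's config directory first
    systemctl stop glusterd
    cp -a /var/lib/glusterd/vols/gv0 /root/gv0-config.bak

    # option 1: remove the local info file ...
    rm /var/lib/glusterd/vols/gv0/info

    # option 2: ... or overwrite it with A node's copy
    scp root@128.224.162.163:/var/lib/glusterd/vols/gv0/info /var/lib/glusterd/vols/gv0/info

    # restart glusterd and check that the peer leaves the Rejected state
    systemctl start glusterd
    gluster peer status

Either way, take the backup first; everything under /var/lib/glusterd is glusterd's persistent state.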
>>> Thanks,
>>> Xin
>>>
>>> At 2016-02-17 12:01:37, "Atin Mukherjee" <amukherj@xxxxxxxxxx> wrote:
>>>>
>>>>> On 02/17/2016 08:23 AM, songxin wrote:
>>>>> Hi,
>>>>> Thank you for your immediate and detailed reply. I have a few more questions about glusterfs.
>>>>> A node's IP is 128.224.162.163.
>>>>> B node's IP is 128.224.162.250.
>>>>> 1. After rebooting B node and starting the glusterd service, the glusterd log is as below:
>>>>> ...
>>>>> [2015-12-07 07:54:55.743966] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
>>>>> [2015-12-07 07:54:55.744026] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>>>>> [2015-12-07 07:54:55.744280] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30706
>>>>> [2015-12-07 07:54:55.773606] I [MSGID: 106490] [glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
>>>>> [2015-12-07 07:54:55.777994] E [MSGID: 101076] [common-utils.c:2954:gf_get_hostname_from_ip] 0-common-utils: Could not lookup hostname of 128.224.162.163 : Temporary failure in name resolution
>>>>> [2015-12-07 07:54:55.778290] E [MSGID: 106010] [glusterd-utils.c:2717:glusterd_compare_friend_volume] 0-management: Version of Cksums gv0 differ. local cksum = 2492237955, remote cksum = 4087388312 on peer 128.224.162.163
>>>> The above log entry is the reason for the rejection of the peer; most probably it is due to a compatibility issue. I believe the gluster versions are different on the two nodes (please share the gluster versions from both nodes) and you might have hit a bug.
>>>>
>>>> Can you share the delta of the /var/lib/glusterd/vols/gv0/info file from both nodes?
>>>>
>>>> ~Atin
>>>>> [2015-12-07 07:54:55.778384] I [MSGID: 106493] [glusterd-handler.c:3780:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 128.224.162.163 (0), ret: 0
>>>>> [2015-12-07 07:54:55.928774] I [MSGID: 106493] [glusterd-rpc-ops.c:480:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44, host: 128.224.162.163, port: 0
>>>>> ...
>>>>> When I run gluster peer status on B node it shows:
>>>>> Number of Peers: 1
>>>>>
>>>>> Hostname: 128.224.162.163
>>>>> Uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
>>>>> State: Peer Rejected (Connected)
>>>>>
>>>>> When I run "gluster volume status" on A node it shows:
>>>>>
>>>>> Status of volume: gv0
>>>>> Gluster process                                                TCP Port  RDMA Port  Online  Pid
>>>>> ------------------------------------------------------------------------------
>>>>> Brick 128.224.162.163:/home/wrsadmin/work/tmp/data/brick/gv0   49152     0          Y       13019
>>>>> NFS Server on localhost                                        N/A       N/A        N       N/A
>>>>> Self-heal Daemon on localhost                                  N/A       N/A        Y       13045
>>>>>
>>>>> Task Status of Volume gv0
>>>>> ------------------------------------------------------------------------------
>>>>> There are no active volume tasks
>>>>>
>>>>> It looks like the glusterfsd service is OK on A node.
>>>>>
>>>>> Is it because the peer state is Rejected that glusterd didn't start glusterfsd? What causes this problem?
>>>>>
>>>>> 2. Is glustershd (the self-heal daemon) the process below?
>>>>> root 497 0.8 0.0 432520 18104 ? Ssl 08:07 0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/gluster ..
>>>>>
>>>>> If it is, I want to know whether glustershd is also the glusterfsd binary, just like glusterd and glusterfs.
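
A quick way to see what each of those processes is (a sketch; PID 497 is just the one from the ps output above, and whether the three names point at one binary depends on how your packages were built):

    # list the gluster daemons together with their command lines
    ps aux | grep '[g]luster'
    # glusterd is the management daemon, each glusterfsd is a brick process,
    # and glusterfs is used for client mounts, the NFS server and glustershd
    # (note the --volfile-id gluster/glustershd argument above)

    # check which on-disk binary a given PID is actually executing
    readlink /proc/497/exe
    ls -l /usr/sbin/glusterfs /usr/sbin/glusterfsd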
>>>>> Thanks,
>>>>> Xin
>>>>>
>>>>> At 2016-02-16 18:53:03, "Anuradha Talur" <atalur@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "songxin" <songxin_1980@xxxxxxx>
>>>>>>> To: gluster-users@xxxxxxxxxxx
>>>>>>> Sent: Tuesday, February 16, 2016 3:59:50 PM
>>>>>>> Subject: question about sync replicate volume after rebooting one node
>>>>>>>
>>>>>>> Hi,
>>>>>>> I have a question about how to sync a volume between two bricks after one node is rebooted.
>>>>>>>
>>>>>>> There are two nodes, A node and B node. A node's IP is 128.124.10.1 and B node's IP is 128.124.10.2.
>>>>>>>
>>>>>>> Operation steps on A node are as below:
>>>>>>> 1. gluster peer probe 128.124.10.2
>>>>>>> 2. mkdir -p /data/brick/gv0
>>>>>>> 3. gluster volume create gv0 replica 2 128.124.10.1:/data/brick/gv0 128.124.10.2:/data/brick/gv1 force
>>>>>>> 4. gluster volume start gv0
>>>>>>> 5. mount -t glusterfs 128.124.10.1:/gv0 gluster
>>>>>>>
>>>>>>> Operation steps on B node are as below:
>>>>>>> 1. mkdir -p /data/brick/gv0
>>>>>>> 2. mount -t glusterfs 128.124.10.1:/gv0 gluster
>>>>>>>
>>>>>>> After all the steps above, there are several gluster service processes, including glusterd, glusterfs and glusterfsd, running on both A and B nodes.
>>>>>>> I can see these services with the command ps aux | grep gluster and with gluster volume status.
>>>>>>>
>>>>>>> Now I reboot the B node. After B reboots, there are no gluster services running on B node.
>>>>>>> After I run systemctl start glusterd, there is just the glusterd service, but not glusterfs and glusterfsd, on B node.
>>>>>>> Because glusterfs and glusterfsd are not running, I can't run gluster volume heal gv0 full.
>>>>>>>
>>>>>>> I want to know why glusterd doesn't start glusterfs and glusterfsd.
>>>>>>
>>>>>> On starting glusterd, glusterfsd should have started by itself.
>>>>>> Could you share the glusterd and brick logs (on node B) so that we know why glusterfsd didn't start?
>>>>>>
>>>>>> Do you still see the glusterfsd service running on node A? You can try running "gluster v start <VOLNAME> force" on one of the nodes and check whether all the brick processes started.
>>>>>>
>>>>>> gluster volume status <VOLNAME> should be able to provide you with the gluster process status.
>>>>>>
>>>>>> On restarting the node, the glusterfs process for the mount won't start by itself. You will have to run step 2 on node B again for it.
>>>>>>
>>>>>>> How do I restart these services on B node?
>>>>>>> How do I sync the replicate volume after one node reboots?
>>>>>>
>>>>>> Once the glusterfsd process starts on node B too, glustershd -- the self-heal daemon -- for the replicate volume should start healing/syncing the files that need to be synced. This daemon does periodic syncing of files.
>>>>>>
>>>>>> If you want to trigger a heal explicitly, you can run gluster volume heal <VOLNAME> on one of the servers.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Xin
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users@xxxxxxxxxxx
>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>
>>>>>> --
>>>>>> Thanks,
>>>>>> Anuradha.
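
Pulling Anuradha's suggestions together, the recovery on B node looks roughly like this. The volume name gv0 and the mount syntax are taken from this thread; the mount point /mnt/gluster and the use of systemctl are assumptions, so adjust them for your setup.

    # on B node, after the reboot
    systemctl start glusterd            # or: service glusterd start
    gluster volume status gv0           # the brick on B should show Online = Y

    # if the brick process did not come up, force-start the volume
    gluster volume start gv0 force

    # the client mount does not come back on its own; remount it
    mount -t glusterfs 128.124.10.1:/gv0 /mnt/gluster

    # trigger a heal explicitly and watch its progress
    gluster volume heal gv0
    gluster volume heal gv0 info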