On 02/17/2016 12:23 PM, Atin Mukherjee wrote:
>
>
> On 02/17/2016 12:08 PM, songxin wrote:
>>
>> Hi,
>> But I also don't know why glusterfsd can't be started by glusterd after B
>> node rebooted. The versions of glusterfs on A node and B node are both
>> 3.7.6. Can you explain this for me please?
> It's because GlusterD has failed to start on Node B. I've already
> asked you in another mail to provide the delta of gv0's info file to
> get to the root cause.

Please ignore this mail as I didn't read your previous reply!

>>
>> Thanks,
>> Xin
>>
>>
>> At 2016-02-17 14:30:21, "Anuradha Talur" <atalur@xxxxxxxxxx> wrote:
>>>
>>>
>>> ----- Original Message -----
>>>> From: "songxin" <songxin_1980@xxxxxxx>
>>>> To: "Atin Mukherjee" <amukherj@xxxxxxxxxx>
>>>> Cc: "Anuradha Talur" <atalur@xxxxxxxxxx>, gluster-users@xxxxxxxxxxx
>>>> Sent: Wednesday, February 17, 2016 11:44:14 AM
>>>> Subject: Re: Re: question about sync replicate volume after rebooting one node
>>>>
>>>> Hi,
>>>> The versions of glusterfs on A node and B node are both 3.7.6.
>>>> The time on B node is the same after rebooting because B node has no RTC.
>>>> Does that cause the problem?
>>>>
>>>> If I run "gluster volume start gv0 force" the glusterfsd can be started,
>>>> but "gluster volume start gv0" doesn't work.
>>>>
>>> Yes, there is a difference between volume start and volume start force.
>>> When a volume is already in the "Started" state, gluster volume start gv0
>>> won't do anything (meaning it doesn't bring up the dead bricks). When you
>>> say start force, the status of the glusterfsds is checked and any
>>> glusterfsd that is not running is spawned, which is the case here in the
>>> setup you have.
>>>>
>>>> The file /var/lib/glusterd/vols/gv0/info on B node is as below.
>>>> ...
>>>> type=2
>>>> count=2
>>>> status=1
>>>> sub_count=2
>>>> stripe_count=1
>>>> replica_count=2
>>>> disperse_count=0
>>>> redundancy_count=0
>>>> version=2
>>>> transport-type=0
>>>> volume-id=c4197371-6d01-4477-8cb2-384cda569c27
>>>> username=62e009ea-47c4-46b4-8e74-47cd9c199d94
>>>> password=ef600dcd-42c5-48fc-8004-d13a3102616b
>>>> op-version=3
>>>> client-op-version=3
>>>> quota-version=0
>>>> parent_volname=N/A
>>>> restored_from_snap=00000000-0000-0000-0000-000000000000
>>>> snap-max-hard-limit=256
>>>> performance.readdir-ahead=on
>>>> brick-0=128.224.162.255:-data-brick-gv0
>>>> brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
>>>>
>>>> The file /var/lib/glusterd/vols/gv0/info on A node is as below.
>>>>
>>>> wrsadmin@pek-song1-d1:~/work/tmp$ sudo cat /var/lib/glusterd/vols/gv0/info
>>>> type=2
>>>> count=2
>>>> status=1
>>>> sub_count=2
>>>> stripe_count=1
>>>> replica_count=2
>>>> disperse_count=0
>>>> redundancy_count=0
>>>> version=2
>>>> transport-type=0
>>>> volume-id=c4197371-6d01-4477-8cb2-384cda569c27
>>>> username=62e009ea-47c4-46b4-8e74-47cd9c199d94
>>>> password=ef600dcd-42c5-48fc-8004-d13a3102616b
>>>> op-version=3
>>>> client-op-version=3
>>>> quota-version=0
>>>> parent_volname=N/A
>>>> restored_from_snap=00000000-0000-0000-0000-000000000000
>>>> snap-max-hard-limit=256
>>>> performance.readdir-ahead=on
>>>> brick-0=128.224.162.255:-data-brick-gv0
>>>> brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
>>>>
>>>> Thanks,
>>>> Xin
>>>>
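If it helps, here is a minimal sketch of how the two copies of the volume metadata can be compared directly, assuming the default /var/lib/glusterd layout shown above; <peer-ip> is a placeholder and password-less ssh between the nodes is an assumption, not part of the setup described here:

    # run on one node; compare glusterd's stored volume checksum on both sides
    cat /var/lib/glusterd/vols/gv0/cksum
    ssh <peer-ip> cat /var/lib/glusterd/vols/gv0/cksum

    # diff the info files byte for byte, in case a whitespace or ordering
    # difference is not visible when reading them side by side
    ssh <peer-ip> cat /var/lib/glusterd/vols/gv0/info | diff /var/lib/glusterd/vols/gv0/info -

A byte-for-byte diff is worth doing even when the files look identical, since the cksum comparison done during the peer handshake is strict.
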
>>>> At 2016-02-17 12:01:37, "Atin Mukherjee" <amukherj@xxxxxxxxxx> wrote:
>>>>>
>>>>>
>>>>> On 02/17/2016 08:23 AM, songxin wrote:
>>>>>> Hi,
>>>>>> Thank you for your immediate and detailed reply. And I have a few more
>>>>>> questions about glusterfs.
>>>>>> A node IP is 128.224.162.163.
>>>>>> B node IP is 128.224.162.250.
>>>>>> 1. After rebooting B node and starting the glusterd service, the glusterd log is as below.
>>>>>> ...
>>>>>> [2015-12-07 07:54:55.743966] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
>>>>>> [2015-12-07 07:54:55.744026] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>>>>>> [2015-12-07 07:54:55.744280] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30706
>>>>>> [2015-12-07 07:54:55.773606] I [MSGID: 106490] [glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
>>>>>> [2015-12-07 07:54:55.777994] E [MSGID: 101076] [common-utils.c:2954:gf_get_hostname_from_ip] 0-common-utils: Could not lookup hostname of 128.224.162.163 : Temporary failure in name resolution
>>>>>> [2015-12-07 07:54:55.778290] E [MSGID: 106010] [glusterd-utils.c:2717:glusterd_compare_friend_volume] 0-management: Version of Cksums gv0 differ. local cksum = 2492237955, remote cksum = 4087388312 on peer 128.224.162.163
>>>>> The above log entry is the reason for the rejection of the peer; most
>>>>> probably it is due to a compatibility issue. I believe the gluster
>>>>> versions are different on the two nodes (share the gluster versions from
>>>>> both nodes) and you might have hit a bug.
>>>>>
>>>>> Can you share the delta of the /var/lib/glusterd/vols/gv0/info file from
>>>>> both the nodes?
>>>>>
>>>>>
>>>>> ~Atin
>>>>>> [2015-12-07 07:54:55.778384] I [MSGID: 106493] [glusterd-handler.c:3780:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 128.224.162.163 (0), ret: 0
>>>>>> [2015-12-07 07:54:55.928774] I [MSGID: 106493] [glusterd-rpc-ops.c:480:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44, host: 128.224.162.163, port: 0
>>>>>> ...
>>>>>> When I run gluster peer status on B node it shows as below.
>>>>>> Number of Peers: 1
>>>>>>
>>>>>> Hostname: 128.224.162.163
>>>>>> Uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
>>>>>> State: Peer Rejected (Connected)
>>>>>>
>>>>>> When I run "gluster volume status" on A node it shows as below.
>>>>>>
>>>>>> Status of volume: gv0
>>>>>> Gluster process                                                TCP Port  RDMA Port  Online  Pid
>>>>>> ------------------------------------------------------------------------------
>>>>>> Brick 128.224.162.163:/home/wrsadmin/work/tmp/data/brick/gv0  49152     0          Y       13019
>>>>>> NFS Server on localhost                                        N/A       N/A        N       N/A
>>>>>> Self-heal Daemon on localhost                                  N/A       N/A        Y       13045
>>>>>>
>>>>>> Task Status of Volume gv0
>>>>>> ------------------------------------------------------------------------------
>>>>>> There are no active volume tasks
>>>>>>
>>>>>> It looks like the glusterfsd service is OK on A node.
>>>>>>
>>>>>> Is it because the peer state is Rejected that glusterd didn't start the
>>>>>> glusterfsd? What causes this problem?
>>>>>>
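Since the peer rejection is driven by the cksum comparison during the handshake, a quick first check is to confirm that both nodes really run the same gluster build and to look at the peer and volume view from each side. A minimal sketch, to be run on both A node and B node (gv0 is the volume from this thread):

    gluster --version
    gluster peer status
    gluster volume info gv0

If the versions match, the remaining difference is in the volume metadata itself, which is what the info/cksum comparison sketched earlier is meant to narrow down.
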
>>>>>> 2. Is glustershd (the self-heal daemon) the process as below?
>>>>>> root 497 0.8 0.0 432520 18104 ? Ssl 08:07 0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/gluster ..
>>>>>>
>>>>>> If it is, I want to know if glustershd is also the glusterfsd binary,
>>>>>> just like glusterd and glusterfs.
>>>>>>
>>>>>> Thanks,
>>>>>> Xin
>>>>>>
>>>>>>
>>>>>> At 2016-02-16 18:53:03, "Anuradha Talur" <atalur@xxxxxxxxxx> wrote:
>>>>>>>
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>> From: "songxin" <songxin_1980@xxxxxxx>
>>>>>>>> To: gluster-users@xxxxxxxxxxx
>>>>>>>> Sent: Tuesday, February 16, 2016 3:59:50 PM
>>>>>>>> Subject: question about sync replicate volume after rebooting one node
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I have a question about how to sync the volume between two bricks after
>>>>>>>> one node is rebooted.
>>>>>>>>
>>>>>>>> There are two nodes, A node and B node. A node's IP is 128.124.10.1 and
>>>>>>>> B node's IP is 128.124.10.2.
>>>>>>>>
>>>>>>>> Operation steps on A node as below:
>>>>>>>> 1. gluster peer probe 128.124.10.2
>>>>>>>> 2. mkdir -p /data/brick/gv0
>>>>>>>> 3. gluster volume create gv0 replica 2 128.124.10.1:/data/brick/gv0 128.124.10.2:/data/brick/gv1 force
>>>>>>>> 4. gluster volume start gv0
>>>>>>>> 5. mount -t glusterfs 128.124.10.1:/gv0 gluster
>>>>>>>>
>>>>>>>> Operation steps on B node as below:
>>>>>>>> 1. mkdir -p /data/brick/gv0
>>>>>>>> 2. mount -t glusterfs 128.124.10.1:/gv0 gluster
>>>>>>>>
>>>>>>>> After all the steps above, there are some gluster service processes,
>>>>>>>> including glusterd, glusterfs and glusterfsd, running on both A and B
>>>>>>>> nodes. I can see these services with ps aux | grep gluster and with
>>>>>>>> gluster volume status.
>>>>>>>>
>>>>>>>> Now reboot the B node. After B reboots, there are no gluster services
>>>>>>>> running on B node.
>>>>>>>> After I run systemctl start glusterd, there is just the glusterd service
>>>>>>>> but not glusterfs and glusterfsd on B node.
>>>>>>>> Because glusterfs and glusterfsd are not running, I can't run gluster
>>>>>>>> volume heal gv0 full.
>>>>>>>>
>>>>>>>> I want to know why glusterd doesn't start glusterfs and glusterfsd.
>>>>>>>
>>>>>>> On starting glusterd, glusterfsd should have started by itself.
>>>>>>> Could you share the glusterd and brick logs (on node B) so that we know
>>>>>>> why glusterfsd didn't start?
>>>>>>>
>>>>>>> Do you still see the glusterfsd service running on node A? You can try
>>>>>>> running "gluster v start <VOLNAME> force" on one of the nodes and check
>>>>>>> whether all the brick processes started.
>>>>>>>
>>>>>>> gluster volume status <VOLNAME> should be able to provide you with the
>>>>>>> gluster process status.
>>>>>>>
>>>>>>> On restarting the node, the glusterfs process for the mount won't start
>>>>>>> by itself. You will have to run step 2 on node B again for it.
>>>>>>>
>>>>>>>> How do I restart these services on B node?
>>>>>>>> How do I sync the replicate volume after one node reboots?
>>>>>>>
>>>>>>> Once the glusterfsd process starts on node B too, glustershd -- the
>>>>>>> self-heal daemon -- for the replicate volume should start
>>>>>>> healing/syncing files that need to be synced. This daemon does periodic
>>>>>>> syncing of files.
>>>>>>>
>>>>>>> If you want to trigger a heal explicitly, you can run gluster volume
>>>>>>> heal <VOLNAME> on one of the servers.
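For example, once the brick processes are up on both nodes, the heal can be triggered and monitored from either server with something like the following (gv0 is the volume name used in this thread):

    gluster volume heal gv0          # heal the entries already marked as needing heal
    gluster volume heal gv0 full     # crawl the whole volume, as attempted earlier in the thread
    gluster volume heal gv0 info     # list the entries still pending heal
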
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Xin
>>>>>>>
>>>>>>> --
>>>>>>> Thanks,
>>>>>>> Anuradha.
>>>
>>> --
>>> Thanks,
>>> Anuradha.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users