On 02/17/2016 11:44 AM, songxin wrote:
> Hi,
> The version of glusterfs on both A node and B node is 3.7.6.
> The time on B node is the same after rebooting because B node has no RTC.
> Could that cause the problem?
>
> If I run "gluster volume start gv0 force" the glusterfsd can be started,
> but "gluster volume start gv0" doesn't work.
>
> The file /var/lib/glusterd/vols/gv0/info on B node is as below.
> ...
> type=2
> count=2
> status=1
> sub_count=2
> stripe_count=1
> replica_count=2
> disperse_count=0
> redundancy_count=0
> version=2
> transport-type=0
> volume-id=c4197371-6d01-4477-8cb2-384cda569c27
> username=62e009ea-47c4-46b4-8e74-47cd9c199d94
> password=ef600dcd-42c5-48fc-8004-d13a3102616b
> op-version=3
> client-op-version=3
> quota-version=0
> parent_volname=N/A
> restored_from_snap=00000000-0000-0000-0000-000000000000
> snap-max-hard-limit=256
> performance.readdir-ahead=on
> brick-0=128.224.162.255:-data-brick-gv0
> brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
>
> The file /var/lib/glusterd/vols/gv0/info on A node is as below.
>
> wrsadmin@pek-song1-d1:~/work/tmp$ sudo cat /var/lib/glusterd/vols/gv0/info
> type=2
> count=2
> status=1
> sub_count=2
> stripe_count=1
> replica_count=2
> disperse_count=0
> redundancy_count=0
> version=2
> transport-type=0
> volume-id=c4197371-6d01-4477-8cb2-384cda569c27
> username=62e009ea-47c4-46b4-8e74-47cd9c199d94
> password=ef600dcd-42c5-48fc-8004-d13a3102616b
> op-version=3
> client-op-version=3
> quota-version=0
> parent_volname=N/A
> restored_from_snap=00000000-0000-0000-0000-000000000000
> snap-max-hard-limit=256
> performance.readdir-ahead=on
> brick-0=128.224.162.255:-data-brick-gv0
> brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0

The contents look similar. But the log says they differ, and that can't
happen. Are you sure they are the same? As a workaround, can you delete
this info file from the disk, restart the glusterd instance, and see
whether the problem persists? (Command sketches of this check and of the
workaround are appended after the quoted thread at the end of this mail.)

> Thanks,
> Xin
>
>
> At 2016-02-17 12:01:37, "Atin Mukherjee" <amukherj@xxxxxxxxxx> wrote:
>>
>>
>> On 02/17/2016 08:23 AM, songxin wrote:
>>> Hi,
>>> Thank you for your immediate and detailed reply. I have a few more
>>> questions about glusterfs.
>>> A node IP is 128.224.162.163.
>>> B node IP is 128.224.162.250.
>>> 1. After rebooting B node and starting the glusterd service, the
>>> glusterd log is as below.
>>> ...
>>> [2015-12-07 07:54:55.743966] I [MSGID: 101190]
>>> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>>> with index 2
>>> [2015-12-07 07:54:55.744026] I [MSGID: 101190]
>>> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>>> with index 1
>>> [2015-12-07 07:54:55.744280] I [MSGID: 106163]
>>> [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack]
>>> 0-management: using the op-version 30706
>>> [2015-12-07 07:54:55.773606] I [MSGID: 106490]
>>> [glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req]
>>> 0-glusterd: Received probe from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
>>> [2015-12-07 07:54:55.777994] E [MSGID: 101076]
>>> [common-utils.c:2954:gf_get_hostname_from_ip] 0-common-utils: Could not
>>> lookup hostname of 128.224.162.163 : Temporary failure in name resolution
>>> [2015-12-07 07:54:55.778290] E [MSGID: 106010]
>>> [glusterd-utils.c:2717:glusterd_compare_friend_volume] 0-management:
>>> Version of Cksums gv0 differ. local cksum = 2492237955, remote cksum =
>>> 4087388312 on peer 128.224.162.163
>> The above log entry is the reason for the rejection of the peer; most
>> probably it is due to a compatibility issue. I believe the gluster
>> versions are different on the two nodes (share the gluster versions from
>> both nodes) and you might have hit a bug.
>>
>> Can you share the delta of the /var/lib/glusterd/vols/gv0/info file from
>> both the nodes?
>>
>>
>> ~Atin
>>> [2015-12-07 07:54:55.778384] I [MSGID: 106493]
>>> [glusterd-handler.c:3780:glusterd_xfer_friend_add_resp] 0-glusterd:
>>> Responded to 128.224.162.163 (0), ret: 0
>>> [2015-12-07 07:54:55.928774] I [MSGID: 106493]
>>> [glusterd-rpc-ops.c:480:__glusterd_friend_add_cbk] 0-glusterd: Received
>>> RJT from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44, host:
>>> 128.224.162.163, port: 0
>>> ...
>>> When I run "gluster peer status" on B node it shows as below.
>>> Number of Peers: 1
>>>
>>> Hostname: 128.224.162.163
>>> Uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
>>> State: Peer Rejected (Connected)
>>>
>>> When I run "gluster volume status" on A node it shows as below.
>>>
>>> Status of volume: gv0
>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick 128.224.162.163:/home/wrsadmin/work/t
>>> mp/data/brick/gv0                           49152     0          Y       13019
>>> NFS Server on localhost                     N/A       N/A        N       N/A
>>> Self-heal Daemon on localhost               N/A       N/A        Y       13045
>>>
>>> Task Status of Volume gv0
>>> ------------------------------------------------------------------------------
>>> There are no active volume tasks
>>>
>>> It looks like the glusterfsd service is OK on A node.
>>>
>>> Is it because the peer state is Rejected that glusterd didn't start
>>> glusterfsd? What causes this problem?
>>>
>>>
>>> 2. Is glustershd (the self-heal daemon) the process shown below?
>>> root       497  0.8  0.0 432520 18104 ?  Ssl  08:07   0:00
>>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>>> /var/lib/glusterd/glustershd/run/gluster ..
>>>
>>> If it is, I want to know whether glustershd is also the glusterfsd
>>> binary, just like glusterd and glusterfs.
>>>
>>> Thanks,
>>> Xin
>>>
>>>
>>> At 2016-02-16 18:53:03, "Anuradha Talur" <atalur@xxxxxxxxxx> wrote:
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "songxin" <songxin_1980@xxxxxxx>
>>>>> To: gluster-users@xxxxxxxxxxx
>>>>> Sent: Tuesday, February 16, 2016 3:59:50 PM
>>>>> Subject: question about sync replicate volume after rebooting one node
>>>>>
>>>>> Hi,
>>>>> I have a question about how to sync a volume between two bricks after
>>>>> one node is rebooted.
>>>>>
>>>>> There are two nodes, A node and B node. A node's IP is 128.124.10.1 and
>>>>> B node's IP is 128.124.10.2.
>>>>>
>>>>> Operation steps on A node are as below:
>>>>> 1. gluster peer probe 128.124.10.2
>>>>> 2. mkdir -p /data/brick/gv0
>>>>> 3. gluster volume create gv0 replica 2 128.124.10.1:/data/brick/gv0
>>>>> 128.124.10.2:/data/brick/gv1 force
>>>>> 4. gluster volume start gv0
>>>>> 5. mount -t glusterfs 128.124.10.1:/gv0 gluster
>>>>>
>>>>> Operation steps on B node are as below:
>>>>> 1. mkdir -p /data/brick/gv0
>>>>> 2. mount -t glusterfs 128.124.10.1:/gv0 gluster
>>>>>
>>>>> After all the steps above, there are some gluster service processes,
>>>>> including glusterd, glusterfs and glusterfsd, running on both A and B
>>>>> nodes.
>>>>> I can see these services with "ps aux | grep gluster" and with
>>>>> "gluster volume status".
>>>>>
>>>>> Now reboot the B node. After B reboots, there are no gluster services
>>>>> running on B node.
>>>>> After I run "systemctl start glusterd", there is just the glusterd
>>>>> service but not glusterfs and glusterfsd on B node.
>>>>> Because glusterfs and glusterfsd are not running, I can't run "gluster
>>>>> volume heal gv0 full".
>>>>>
>>>>> I want to know why glusterd doesn't start glusterfs and glusterfsd.
>>>>
>>>> On starting glusterd, glusterfsd should have started by itself.
>>>> Could you share the glusterd and brick logs (on node B) so that we know
>>>> why glusterfsd didn't start?
>>>>
>>>> Do you still see the glusterfsd service running on node A? You can try
>>>> running "gluster v start <VOLNAME> force" on one of the nodes and check
>>>> whether all the brick processes started.
>>>>
>>>> "gluster volume status <VOLNAME>" should be able to provide you with
>>>> the gluster process status.
>>>>
>>>> On restarting the node, the glusterfs process for the mount won't start
>>>> by itself. You will have to run step 2 on node B again for it.
>>>>
>>>>> How do I restart these services on B node?
>>>>> How do I sync the replicate volume after one node reboots?
>>>>
>>>> Once the glusterfsd process starts on node B too, glustershd -- the
>>>> self-heal daemon -- for the replicate volume should start healing/
>>>> syncing the files that need to be synced. This daemon does periodic
>>>> syncing of files.
>>>>
>>>> If you want to trigger a heal explicitly, you can run "gluster volume
>>>> heal <VOLNAME>" on one of the servers. [A sketch of these post-reboot
>>>> steps is appended at the end of this mail.]
>>>>>
>>>>> Thanks,
>>>>> Xin
>>>>
>>>> --
>>>> Thanks,
>>>> Anuradha.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
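For reference, a minimal sketch of how the comparison asked for above could
be collected, assuming the default glusterd working directory
/var/lib/glusterd and the volume name gv0. The cksum file path is an
assumption based on glusterd's usual store layout, not something confirmed
in this thread:

    # Run on each node and diff the two outputs instead of eyeballing them.
    cat /var/lib/glusterd/vols/gv0/info

    # glusterd also keeps a checksum of the volume configuration next to the
    # info file (path assumed); if the info files really are identical,
    # comparing this value on both nodes may show where the "Cksums differ"
    # message comes from.
    cat /var/lib/glusterd/vols/gv0/cksum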
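Next, a minimal sketch of the workaround suggested above (deleting the
volume's info file on the rejected node and restarting glusterd), assuming
B node is the rejected peer and the volume is gv0. It removes glusterd
metadata, so treat it as a sketch of the suggestion rather than a
guaranteed recovery procedure, and back the file up first:

    # On B node (the rejected peer): back up the file before deleting it.
    cp -a /var/lib/glusterd/vols/gv0/info /root/gv0-info.bak

    # Remove the suspect info file and restart glusterd; the intent of the
    # workaround is that glusterd picks the volume configuration up again
    # from the other peer on the next handshake.
    rm /var/lib/glusterd/vols/gv0/info
    systemctl restart glusterd

    # Check whether the peer is still rejected.
    gluster peer status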
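Finally, a sketch of the post-reboot flow on B node collected from
Anuradha's reply, using the volume name, example IPs and the "gluster"
mount directory from the original steps; adjust them to your setup:

    # Start the management daemon; it should spawn the brick process itself.
    systemctl start glusterd

    # If the brick process (glusterfsd) did not come up, force-start the
    # volume from any node; this only starts bricks that are not running.
    gluster volume start gv0 force

    # Verify that both bricks and the self-heal daemon are online.
    gluster volume status gv0

    # The client mount does not come back by itself after a reboot; remount
    # it (step 2 of the original procedure on B node).
    mount -t glusterfs 128.124.10.1:/gv0 gluster

    # glustershd heals pending files periodically; to trigger a heal now:
    gluster volume heal gv0
    # or queue a full crawl of the bricks:
    gluster volume heal gv0 full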