Re: question about sync replicate volume after rebooting one node


 




Hi,
But I also don't know why glusterfsd isn't started by glusterd after the B node reboots. The version of glusterfs is 3.7.6 on both the A node and the B node. Can you explain this to me, please?
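
In case it is useful, this is roughly what I plan to run on the B node to collect more information after the next reboot (only a sketch; I am assuming the volume name gv0 and the default log directory /var/log/glusterfs):

  gluster peer status                              # is the peer still in Rejected state?
  gluster volume status gv0                        # which bricks does glusterd consider online?
  tail -n 100 /var/log/glusterfs/*glusterd*.log    # why glusterd did not spawn the brick process
  tail -n 100 /var/log/glusterfs/bricks/*.log      # brick-side errors, if the process started at all
  gluster volume start gv0 force                   # workaround that respawns the missing glusterfsd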

Thanks,
Xin





At 2016-02-17 14:30:21, "Anuradha Talur" <atalur@xxxxxxxxxx> wrote:
>
>
>----- Original Message -----
>> From: "songxin" <songxin_1980@xxxxxxx>
>> To: "Atin Mukherjee" <amukherj@xxxxxxxxxx>
>> Cc: "Anuradha Talur" <atalur@xxxxxxxxxx>, gluster-users@xxxxxxxxxxx
>> Sent: Wednesday, February 17, 2016 11:44:14 AM
>> Subject: Re:Re: [Gluster-users] question about sync replicate volume after rebooting one node
>>
>> Hi,
>> The version of glusterfs on A node and B node are both 3.7.6.
>> The time on B node is same after rebooting because B node hasn't RTC. Does it
>> cause the problem?
>>
>>
>> If I run " gluster volume start gv0 force " the glusterfsd can be started but
>> "gluster volume start gv0" don't work.
>>
>Yes, there is a difference between volume start and volume start force.
>When a volume is in "Started" state already, gluster volume start gv0 won't do
>anything (meaning it doesn't bring up the dead bricks). When you say start force,
>status of glusterfsd's is checked and the glusterfsd's not running are spawned.
>Which is the case here in the setup you have.
>>
>> The file /var/lib/glusterd/vols/gv0/info on B node as below.
>> ...
>> type=2
>> count=2
>> status=1
>> sub_count=2
>> stripe_count=1
>> replica_count=2
>> disperse_count=0
>> redundancy_count=0
>> version=2
>> transport-type=0
>> volume-id=c4197371-6d01-4477-8cb2-384cda569c27
>> username=62e009ea-47c4-46b4-8e74-47cd9c199d94
>> password=ef600dcd-42c5-48fc-8004-d13a3102616b
>> op-version=3
>> client-op-version=3
>> quota-version=0
>> parent_volname=N/A
>> restored_from_snap=00000000-0000-0000-0000-000000000000
>> snap-max-hard-limit=256
>> performance.readdir-ahead=on
>> brick-0=128.224.162.255:-data-brick-gv0
>> brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
>>
>>
>> The file /var/lib/glusterd/vols/gv0/info on A node as below.
>>
>>
>> wrsadmin@pek-song1-d1:~/work/tmp$ sudo cat /var/lib/glusterd/vols/gv0/info
>> type=2
>> count=2
>> status=1
>> sub_count=2
>> stripe_count=1
>> replica_count=2
>> disperse_count=0
>> redundancy_count=0
>> version=2
>> transport-type=0
>> volume-id=c4197371-6d01-4477-8cb2-384cda569c27
>> username=62e009ea-47c4-46b4-8e74-47cd9c199d94
>> password=ef600dcd-42c5-48fc-8004-d13a3102616b
>> op-version=3
>> client-op-version=3
>> quota-version=0
>> parent_volname=N/A
>> restored_from_snap=00000000-0000-0000-0000-000000000000
>> snap-max-hard-limit=256
>> performance.readdir-ahead=on
>> brick-0=128.224.162.255:-data-brick-gv0
>> brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
>>
>>
>> Thanks,
>> Xin
>>
>> At 2016-02-17 12:01:37, "Atin Mukherjee" <amukherj@xxxxxxxxxx> wrote:
>> >
>> >
>> >On 02/17/2016 08:23 AM, songxin wrote:
>> >> Hi,
>> >> Thank you for your immediate and detailed reply.And I have a few more
>> >> question about glusterfs.
>> >> A node IP is 128.224.162.163.
>> >> B node IP is 128.224.162.250.
>> >> 1.After reboot B node and start the glusterd service the glusterd log is
>> >> as blow.
>> >> ...
>> >> [2015-12-07 07:54:55.743966] I [MSGID: 101190]
>> >> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>> >> with index 2
>> >> [2015-12-07 07:54:55.744026] I [MSGID: 101190]
>> >> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>> >> with index 1
>> >> [2015-12-07 07:54:55.744280] I [MSGID: 106163]
>> >> [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack]
>> >> 0-management: using the op-version 30706
>> >> [2015-12-07 07:54:55.773606] I [MSGID: 106490]
>> >> [glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req]
>> >> 0-glusterd: Received probe from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
>> >> [2015-12-07 07:54:55.777994] E [MSGID: 101076]
>> >> [common-utils.c:2954:gf_get_hostname_from_ip] 0-common-utils: Could not
>> >> lookup hostname of 128.224.162.163 : Temporary failure in name resolution
>> >> [2015-12-07 07:54:55.778290] E [MSGID: 106010]
>> >> [glusterd-utils.c:2717:glusterd_compare_friend_volume] 0-management:
>> >> Version of Cksums gv0 differ. local cksum = 2492237955, remote cksum =
>> >> 4087388312 on peer 128.224.162.163
>> >The above log entry is the reason of the rejection of the peer, most
>> >probably its due to the compatibility issue. I believe the gluster
>> >versions are different (share gluster versions from both the nodes) in
>> >two nodes and you might have hit a bug.
>> >
>> >Can you share the delta of /var/lib/glusterd/vols/gv0/info file from
>> >both the nodes?
>> >
>> >
>> >~Atin
>> >> [2015-12-07 07:54:55.778384] I [MSGID: 106493]
>> >> [glusterd-handler.c:3780:glusterd_xfer_friend_add_resp] 0-glusterd:
>> >> Responded to 128.224.162.163 (0), ret: 0
>> >> [2015-12-07 07:54:55.928774] I [MSGID: 106493]
>> >> [glusterd-rpc-ops.c:480:__glusterd_friend_add_cbk] 0-glusterd: Received
>> >> RJT from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44, host:
>> >> 128.224.162.163, port: 0
>> >> ...
>> >> When I run gluster peer status on B node it show as below.
>> >> Number of Peers: 1
>> >>
>> >> Hostname: 128.224.162.163
>> >> Uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
>> >> State: Peer Rejected (Connected)
>> >>
>> >> When I run "gluster volume status" on A node it show as below.
>> >>
>> >> Status of volume: gv0
>> >> Gluster process                                                TCP Port  RDMA Port  Online  Pid
>> >> ------------------------------------------------------------------------------
>> >> Brick 128.224.162.163:/home/wrsadmin/work/tmp/data/brick/gv0   49152     0          Y       13019
>> >> NFS Server on localhost                                        N/A       N/A        N       N/A
>> >> Self-heal Daemon on localhost                                  N/A       N/A        Y       13045
>> >>
>> >> Task Status of Volume gv0
>> >> ------------------------------------------------------------------------------
>> >> There are no active volume tasks
>> >>
>> >> It looks like the glusterfsd service is ok on A node.
>> >>
>> >> If because the peer state is Rejected so gluterd didn't start the
>> >> glusterfsd?What causes this problem?
>> >>
>> >>
>> >> 2. Is glustershd(self-heal-daemon) the process as below?
>> >> root       497  0.8  0.0 432520 18104 ?   Ssl  08:07   0:00
>> >> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>> >> /var/lib/glusterd/glustershd/run/gluster ..
>> >>
>> >> If it is, I want to know if the glustershd is also the bin glusterfsd,
>> >> just like glusterd and glusterfs.
>> >>
>> >> Thanks,
>> >> Xin
>> >>
>> >>
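
(Adding a note inline: to produce the delta of /var/lib/glusterd/vols/gv0/info that Atin asked for, I would run something like the following from the A node. This is only a sketch and assumes root ssh access to the B node at 128.224.162.250.)

  # copy B's view of the volume metadata and compare it with the local one
  ssh root@128.224.162.250 cat /var/lib/glusterd/vols/gv0/info > /tmp/gv0-info-nodeB
  diff -u /var/lib/glusterd/vols/gv0/info /tmp/gv0-info-nodeB
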
>> >> At 2016-02-16 18:53:03, "Anuradha Talur" <atalur@xxxxxxxxxx> wrote:
>> >>>
>> >>>
>> >>>----- Original Message -----
>> >>>> From: "songxin" <songxin_1980@xxxxxxx>
>> >>>> To: gluster-users@xxxxxxxxxxx
>> >>>> Sent: Tuesday, February 16, 2016 3:59:50 PM
>> >>>> Subject: [Gluster-users] question about sync replicate volume after
>> >>>> rebooting one node
>> >>>>
>> >>>> Hi,
>> >>>> I have a question about how to sync volume between two bricks after one
>> >>>> node
>> >>>> is reboot.
>> >>>>
>> >>>> There are two node, A node and B node.A node ip is 128.124.10.1 and B
>> >>>> node ip
>> >>>> is 128.124.10.2.
>> >>>>
>> >>>> operation steps on A node as below
>> >>>> 1. gluster peer probe 128.124.10.2
>> >>>> 2. mkdir -p /data/brick/gv0
>> >>>> 3.gluster volume create gv0 replica 2 128.124.10.1 :/data/brick/gv0
>> >>>> 128.124.10.2 :/data/brick/gv1 force
>> >>>> 4. gluster volume start gv0
>> >>>> 5.mount -t glusterfs 128.124.10.1 :/gv0 gluster
>> >>>>
>> >>>> operation steps on B node as below
>> >>>> 1 . mkdir -p /data/brick/gv0
>> >>>> 2.mount -t glusterfs 128.124.10.1 :/gv0 gluster
>> >>>>
>> >>>> After all steps above , there a some gluster service process, including
>> >>>> glusterd, glusterfs and glusterfsd, running on both A and B node.
>> >>>> I can see these servic by command ps aux | grep gluster and command
>> >>>> gluster
>> >>>> volume status.
>> >>>>
>> >>>> Now reboot the B node.After B reboot , there are no gluster service
>> >>>> running
>> >>>> on B node.
>> >>>> After I systemctl start glusterd , there is just glusterd service but
>> >>>> not
>> >>>> glusterfs and glusterfsd on B node.
>> >>>> Because glusterfs and glusterfsd are not running so I can't gluster
>> >>>> volume
>> >>>> heal gv0 full.
>> >>>>
>> >>>> I want to know why glusterd don't start glusterfs and glusterfsd.
>> >>>
>> >>>On starting glusterd, glusterfsd should have started by itself.
>> >>>Could you share glusterd and brick log (on node B) so that we know why
>> >>>glusterfsd
>> >>>didn't start?
>> >>>
>> >>>Do you still see glusterfsd service running on node A? You can try running
>> >>>"gluster v start <VOLNAME> force"
>> >>>on one of the nodes and check if all the brick processes started.
>> >>>
>> >>>gluster volume status <VOLNAME> should be able to provide you with gluster
>> >>>process status.
>> >>>
>> >>>On restarting the node, glusterfs process for mount won't start by itself.
>> >>>You will have to run
>> >>>step 2 on node B again for it.
>> >>>
>> >>>> How do I restart these services on B node?
>> >>>> How do I sync the replicate volume after one node reboot?
>> >>>
>> >>>Once the glusterfsd process starts on node B too, glustershd --
>> >>>self-heal-daemon -- for replicate volume
>> >>>should start healing/syncing files that need to be synced. This deamon
>> >>>does periodic syncing of files.
>> >>>
>> >>>If you want to trigger a heal explicitly, you can run gluster volume heal
>> >>><VOLNAME> on one of the servers.
>> >>>>
>> >>>> Thanks,
>> >>>> Xin
>> >>>>
>> >>>> _______________________________________________
>> >>>> Gluster-users mailing list
>> >>>> Gluster-users@xxxxxxxxxxx
>> >>>> http://www.gluster.org/mailman/listinfo/gluster-users
>> >>>
>> >>>--
>> >>>Thanks,
>> >>>Anuradha.
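
(For my own reference, the recovery sequence described above comes down to roughly the following on the rebooted B node. A sketch only, assuming volume gv0 and the same mount directory as in the original steps.)

  systemctl start glusterd                        # brick processes (glusterfsd) should normally come up with glusterd
  gluster volume start gv0 force                  # respawn any brick process that did not start
  gluster volume status gv0                       # confirm bricks and the self-heal daemon are online
  mount -t glusterfs 128.124.10.1:/gv0 gluster    # the fuse client mount does not come back by itself
  gluster volume heal gv0                         # optionally trigger healing instead of waiting for glustershd
  gluster volume heal gv0 info                    # list entries that still need to be healed
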
>> >>
>> >>
>> >> _______________________________________________
>> >> Gluster-users mailing list
>> >> Gluster-users@xxxxxxxxxxx
>> >> http://www.gluster.org/mailman/listinfo/gluster-users
>> >>
>> >
>--
>Thanks,
>Anuradha.
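
(One more data point: this is how I checked that both nodes run the same release. The operating-version line in glusterd.info is my assumption about where the negotiated op-version is recorded, so please correct me if that is wrong.)

  # run on both the A node and the B node
  glusterfs --version                                       # both report 3.7.6 here
  grep operating-version /var/lib/glusterd/glusterd.info    # assumed location of the cluster op-version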


 

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
