Re: Duplicate UUID entries in "gluster peer status" command

On Mon, Nov 21, 2016 at 2:28 PM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:


On Mon, Nov 21, 2016 at 10:00 AM, ABHISHEK PALIWAL <abhishpaliwal@xxxxxxxxx> wrote:
Hi Atin,

The system is an embedded system, and these dates are from before the system's clock gets synchronized.

Yes, I have also seen these two files in the peers directory on the 002500 board, and I want to know why gluster creates the second file when an old file already exists. As you can see, the contents of the two files are identical.

If we end up in this situation, is it possible for gluster to take care of it automatically, instead of us performing the manual steps you mentioned above?

We shouldn't have any unwanted data in /var/lib/glusterd in the first place; that is a prerequisite of a gluster installation. Failing that, inconsistencies in the configuration data can't be handled automatically and need manual intervention.
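(For reference, a minimal sketch of such a pre-installation check; it assumes the default working directory /var/lib/glusterd and a systemd-managed service, so adjust the path and service commands for your setup:)

# stop glusterd before touching its working directory
systemctl stop glusterd
# inspect for leftovers from a previous installation
ls -la /var/lib/glusterd /var/lib/glusterd/peers
# if stale state from an old install is confirmed, clear it;
# note this discards all cluster configuration on this node
rm -rf /var/lib/glusterd/*
systemctl start glusterd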
 
Does that mean /var/lib/glusterd should always be empty before the gluster installation starts? Because in our case there was nothing unwanted there before installing glusterd.

I have some questions:

1. Based on the logs, can we find out the reason for having two peer files with the same contents?

No, we can't. The log file doesn't have any entry for 26ae19a6-b58f-446a-b079-411d4ee57450, which indicates that this entry is stale and has been there for a long time, while the log files are recent.
 
Agreed, the 26ae19a6-b58f-446a-b079-411d4ee57450 entry is not in the logs, but as we checked, that file is the newer one in peers, and 5be8603b-18d0-4333-8590-38f918a22857 is the older file.
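(One way to confirm which entry is stale is to compare the timestamps and contents of the two peer files directly; a rough check, assuming the peers directory under your working directory:)

cd /system/glusterd/peers
# modification time and name of each peer file
stat -c '%y  %n' *
# expected to print nothing if the contents really are identical
diff 5be8603b-18d0-4333-8590-38f918a22857 26ae19a6-b58f-446a-b079-411d4ee57450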

Also, below are some more logs from the etc-glusterfs-glusterd.log file on the 002500 board:

The message "I [MSGID: 106004] [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: Peer <10.32.0.48> (<5be8603b-18d0-4333-8590-38f918a22857>), in state <Peer in Cluster>, has disconnected from glusterd." repeated 3 times between [2016-11-17 22:01:23.542556] and [2016-11-17 22:01:36.993584]
The message "W [MSGID: 106118] [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management: Lock not released for c_glusterfs" repeated 3 times between [2016-11-17 22:01:23.542973] and [2016-11-17 22:01:36.993855]
[2016-11-17 22:01:48.860555] I [MSGID: 106487] [glusterd-handler.c:1411:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2016-11-17 22:01:49.137733] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30706
[2016-11-17 22:01:49.240986] I [MSGID: 106493] [glusterd-rpc-ops.c:694:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 5be8603b-18d0-4333-8590-38f918a22857
[2016-11-17 22:11:58.658884] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(glusterd mgmt) op(--(3)) xid = 0x15 sent = 2016-11-17 22:01:48.945424. timeout = 600 for 10.32.0.48:24007
[2016-11-17 22:11:58.658987] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on 10.32.0.48. Please check log file for details.
[2016-11-17 22:11:58.659243] I [socket.c:3382:socket_submit_reply] 0-socket.management: not connected (priv->connected = 255)
[2016-11-17 22:11:58.659265] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2016-11-17 22:11:58.659305] E [MSGID: 106430] [glusterd-utils.c:400:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2016-11-17 22:13:58.674343] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(glusterd mgmt) op(--(3)) xid = 0x11 sent = 2016-11-17 22:03:50.268751. timeout = 600 for 10.32.0.48:24007
[2016-11-17 22:13:58.674414] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on 10.32.0.48. Please check log file for details.
[2016-11-17 22:13:58.674604] I [socket.c:3382:socket_submit_reply] 0-socket.management: not connected (priv->connected = 255)
[2016-11-17 22:13:58.674627] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2016-11-17 22:13:58.674667] E [MSGID: 106430] [glusterd-utils.c:400:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2016-11-17 22:15:58.687737] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(glusterd mgmt) op(--(3)) xid = 0x17 sent = 2016-11-17 22:05:51.341614. timeout = 600 for 10.32.0.48:24007

Are these log errors causing the duplicate UUID, or is the duplicate UUID causing these errors?

2. Is there any way to handle this from the gluster code?

Ditto as above.
 

Regards,
Abhishek

On Mon, Nov 21, 2016 at 9:52 AM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
atin@dhcp35-96:~/Downloads/gluster_users/abhishek_dup_uuid/duplicate_uuid/glusterd_2500/peers$ ls -lrt
total 8
-rw-------. 1 atin wheel 71 Jan  1  1970 5be8603b-18d0-4333-8590-38f918a22857
-rw-------. 1 atin wheel 71 Nov 18 03:31 26ae19a6-b58f-446a-b079-411d4ee57450

On board 2500, look at the date on the file 5be8603b-18d0-4333-8590-38f918a22857 (the Jan 1 1970 timestamp above). I am not sure how you ended up with a file carrying such a timestamp; my guess is that the setup was not cleaned properly at the time of re-installation.

Here are the steps I'd recommend for now:

1. Rename 26ae19a6-b58f-446a-b079-411d4ee57450 to 5be8603b-18d0-4333-8590-38f918a22857, so that you have only one entry in the peers folder on board 2500.
2. Bring down both glusterd instances
3. Bring back one by one

And then restart glusterd to see if the issue persists (a rough sketch of these commands follows below).
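(A minimal sketch of those steps, assuming the default working directory and a systemd-managed glusterd; adjust the path and service commands for your environment:)

cd /var/lib/glusterd/peers
# step 1: keep a single peer file; mv replaces the existing file
mv 26ae19a6-b58f-446a-b079-411d4ee57450 5be8603b-18d0-4333-8590-38f918a22857
# steps 2 and 3: stop glusterd on both boards, then start them one at a time
systemctl stop glusterd      # run on each board
systemctl start glusterd     # run on each board, one at a time
gluster peer status          # verify a single entry per peer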



On Mon, Nov 21, 2016 at 9:34 AM, ABHISHEK PALIWAL <abhishpaliwal@xxxxxxxxx> wrote:
Hope you will see this in the attached logs.

On Mon, Nov 21, 2016 at 9:17 AM, ABHISHEK PALIWAL <abhishpaliwal@xxxxxxxxx> wrote:
Hi Atin,

It is not getting wiped off; we have changed the configuration path from /var/lib/glusterd to /system/glusterd.

So the contents remain the same as before.
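(For context, glusterd's working directory is set in /etc/glusterfs/glusterd.vol; the snippet below is a rough sketch of such a change, not our exact file:)

# cat /etc/glusterfs/glusterd.vol
volume management
    type mgmt/glusterd
    option working-directory /system/glusterd
    option transport-type socket,rdma
end-volume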

On Mon, Nov 21, 2016 at 9:15 AM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
Abhishek,

rebooting the board does wipe off the /var/lib/glusterd contents in your setup, right (as per my earlier conversation with you)? In that case, how are you ensuring that the same node gets back its older UUID? If you don't, then this is bound to happen.
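(The node's own identity comes from the glusterd.info file in the working directory; it looks roughly like this, values illustrative. If that file is regenerated after a reboot, the node comes up with a fresh UUID:)

# cat /var/lib/glusterd/glusterd.info
UUID=5be8603b-18d0-4333-8590-38f918a22857
operating-version=30706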

On Mon, Nov 21, 2016 at 9:11 AM, ABHISHEK PALIWAL <abhishpaliwal@xxxxxxxxx> wrote:
Hi Team,

Please look into this problem, as it is seen very widely in our system.

We have a replicated volume setup with two bricks, but after restarting the second board I am getting a duplicate entry in the "gluster peer status" output, as shown below:

# gluster peer status
Number of Peers: 2
 
Hostname: 10.32.0.48
Uuid: 5be8603b-18d0-4333-8590-38f918a22857
State: Peer in Cluster (Connected)
 
Hostname: 10.32.0.48
Uuid: 5be8603b-18d0-4333-8590-38f918a22857
State: Peer in Cluster (Connected)
#
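(Each entry above corresponds to one file under the peers directory of the glusterd working directory, named after the peer's UUID and roughly of this form, values illustrative:)

# cat /system/glusterd/peers/5be8603b-18d0-4333-8590-38f918a22857
uuid=5be8603b-18d0-4333-8590-38f918a22857
state=3
hostname1=10.32.0.48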


I am attaching all logs from both the boards and the command outputs as well.

So could you please check what leads to this situation, as it occurs very frequently across multiple cases.

Also, we are not replacing any board in the setup, just rebooting.

--

Regards
Abhishek Paliwal




--

Regards
Abhishek Paliwal
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
