Re: Machine becomes its own peer

Joe - Scott sent me a private email and I provided the workaround. For some (unknown) reason all the nodes ended up with two uuids for a particular peer, which caused the problem. I've asked for the log files to debug further.
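
For anyone wanting to check their own nodes for the same condition, here is a minimal read-only sketch. It assumes each file under /var/lib/glusterd/peers holds key=value lines including uuid= and one or more hostname lines (hostname1=, ...); that layout is an assumption here, not something confirmed in this thread, and the function name is made up for illustration.

```python
from collections import defaultdict
from pathlib import Path


def find_hosts_with_multiple_uuids(peers_dir="/var/lib/glusterd/peers"):
    """Return {hostname: [uuid, ...]} for hostnames listed under more than one uuid.

    Assumed (unverified) peer file layout: key=value lines such as
    uuid=..., state=..., hostname1=...
    """
    seen = defaultdict(set)
    for peer_file in Path(peers_dir).iterdir():
        if not peer_file.is_file():
            continue
        uuid = peer_file.name  # fall back to the file name if no uuid= line exists
        hostnames = []
        for line in peer_file.read_text().splitlines():
            key, sep, value = line.partition("=")
            if not sep:
                continue
            key = key.strip().lower()
            if key == "uuid":
                uuid = value.strip()
            elif key.startswith("hostname"):
                hostnames.append(value.strip())
        for hostname in hostnames:
            seen[hostname].add(uuid)
    # Keep only hostnames that map to two or more distinct uuids.
    return {host: sorted(uuids) for host, uuids in seen.items() if len(uuids) > 1}
```

Running it on every node and comparing the results would show which peer is recorded twice. It only reads the files; it does not change anything.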

On Fri, 17 Feb 2017 at 21:58, Joe Julian <joe@xxxxxxxxxxxxxxxx> wrote:
Does your repaired server have the correct uuid in /var/lib/glusterd/glusterd.info?

On February 16, 2017 9:49:56 PM PST, Scott Hazelhurst <Scott.Hazelhurst@xxxxxxxxxx> wrote:

Dear all

Last week I posted a query about a problem I had with a machine that had failed but the underlying hard disk with the gluster brick was good. I’ve made some progress in restoring. I now have the problem with my new restored machine where it becomes its own peer, which then breaks everything.

1. Gluster daemons are off on all peers, content of /var/lib/glusterd/peers looks good.
2. I start the gluster daemons on all peers. All looks good.
3. For about 2 minutes, there’s no obvious problem — if I do a gluster peer status on any machine it looks good, if I do a gluster volume status A01 on any machine it looks good.
4. Then at some point, the /var/lib/glusterd/peers directory of the new, restored machine gets an entry for itself and things start breaking. A typical error message is the understandable

: Unable to get lock for uuid: 4fb930f7-554e-462a-9204-4592591feeb8, lock held by: 4fb930f7-554e-462a-9204-4592591feeb8

5. This is repeatable: if I stop the daemons, remove the offending entry in /var/lib/glusterd/peers, and restart, the same behavior occurs. All is good for a minute or two and then something magically puts an entry back in /var/lib/glusterd/peers.
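
The check in steps 4 and 5 can be sketched as a small read-only script. It assumes glusterd.info contains a UUID= line and that each entry in the peers directory is a file named after the peer's uuid with a uuid= line inside; those layout details are assumptions, and the helper names are hypothetical.

```python
from pathlib import Path


def read_key_values(path):
    """Parse simple key=value lines into a dict with lowercased keys."""
    values = {}
    for line in Path(path).read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip().lower()] = value.strip()
    return values


def find_self_peer_entries(glusterd_dir="/var/lib/glusterd"):
    """Return the peer files whose uuid matches this machine's own uuid.

    Assumed (unverified) layout: glusterd.info has a UUID= line, and
    peers/ holds one key=value file per peer, named after its uuid.
    """
    root = Path(glusterd_dir)
    local_uuid = read_key_values(root / "glusterd.info").get("uuid")
    if local_uuid is None:
        raise ValueError("no UUID line found in glusterd.info")
    offenders = []
    for peer_file in sorted((root / "peers").iterdir()):
        if not peer_file.is_file():
            continue
        # Prefer the uuid= line; fall back to the file name if it is missing.
        peer_uuid = read_key_values(peer_file).get("uuid", peer_file.name)
        if local_uuid in (peer_uuid, peer_file.name):
            offenders.append(peer_file)
    return offenders
```

Calling find_self_peer_entries() every few seconds after starting the daemons would show roughly when the self entry appears. The script only reads files and does not remove anything.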

In a previous step in restoring my machine, I had a different error of mismatching cksums and what I did then may be the cause of the problem. In searching the list archives I found someone with a similar cksum problem, and the proposed solution was to copy the /var/lib/glusterd/vols/ from another of the peers to the new machine. This may not be the issue but this is the only thing I think I did that was unconventional.

I am running version 3.7.5-19 on Scientific Linux 6.8

If anyone can suggest a way forward I would be grateful

Many thanks

Scott





--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
--
- Atin (atinm)
