Well, I must say THAT is good to hear. That being the case, I'm not touching anything that seems to be working.

Thanks, Avati.

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Anand Avati
Sent: Thursday, January 27, 2011 5:45 PM
To: David Lloyd
Cc: gluster-users at gluster.org
Subject: Re: self heal errors on 3.1.1 clients

David,

The problem you are facing is something we are already investigating. We haven't root-caused it yet, but from what we have seen, this happens only on / and only for the metadata changelog. It shows up as annoying log messages, but it should not affect your functionality.

Avati

On Thu, Jan 27, 2011 at 2:03 PM, David Lloyd <david.lloyd at v-consultants.co.uk> wrote:
> Yes, it seemed really dangerous to me too. But with the lack of
> documentation, and lack of response from gluster (and the data is
> still on the old system too), I thought I'd give it a shot.
>
> Thanks for the explanation. The split-brain problem seems to come up
> fairly regularly, but I've not found any clear explanation of what to
> do in this situation. I'm starting to worry about what appears to be a
> rationing of information from gluster.com to the community at large.
>
> We're not in a position to purchase support, and I'm a sysadmin, not a
> developer. I hope to make a contribution in terms of testing and
> feedback and bug reports, but I'm seeing a lot of threads that seem to
> go nowhere, and it's getting a bit frustrating.
>
> David
>
> > This seems really dangerous to me. On a brick xxx, the
> > trusted.afr.yyy attribute consists of three unsigned 32-bit
> > counters, indicating how many uncommitted operations (data,
> > metadata, and namespace respectively) might exist at yyy. If xxx
> > shows uncommitted operations at yyy but not vice versa, then we know
> > that xxx is more up to date and it should be the source for
> > self-heal.
> > If two bricks show uncommitted operations at each other, then we're
> > in the infamous "split brain" scenario. Some client was unable to
> > clear the counter at xxx while another was unable to clear it at
> > yyy, or both xxx and yyy went down after the operation was complete
> > but before they could clear the counters for each other.
> >
> > In this case, it looks like a metadata operation (permission change)
> > was in this state. If the permissions are in fact the same in both
> > places, then it doesn't matter which way self-heal happens, or
> > whether it happens at all. In fact, it seems to me that AFR should
> > be able to detect this particular condition and not flag it as an
> > error. In any case, I think you're probably fine here, but in
> > general it's a very bad idea to clear these flags manually because
> > it can cause updates to be lost (if self-heal goes the wrong way) or
> > files to remain in an inconsistent state (if no self-heal occurs).
> >
> > The real thing I'd wonder about is why both servers are so
> > frequently becoming unavailable at the same instant (switch
> > problem?) and why permission changes on the root are apparently so
> > frequent that this often results in a split-brain.
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
> --
> David Lloyd
> V Consultants
> www.v-consultants.co.uk
> tel: +44 7983 816501
> skype: davidlloyd1243
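The rule described in the thread (each brick keeps a trusted.afr.<peer> attribute holding three counters; one-sided blame picks a self-heal source, mutual blame is split-brain) can be illustrated with a minimal read-only sketch. It assumes the attribute values are copied as hex strings of the form `0x...`, as printed by `getfattr -e hex`, that each value is 12 bytes, and that the counters are stored big-endian (network byte order). The byte order does not affect the zero/non-zero comparison used here. The helper names `decode_afr`, `classify` and `COUNTER_NAMES` are hypothetical illustrations, not part of GlusterFS, and this is a simplification of the two-brick case only, not AFR's actual self-heal logic.

```python
import struct
from typing import Dict

# Counter order as described in the thread: data, metadata, namespace.
COUNTER_NAMES = ("data", "metadata", "namespace")


def decode_afr(hex_value: str) -> Dict[str, int]:
    """Decode a trusted.afr.* value such as '0x000000000000000100000000'.

    Assumption: 12 bytes = three unsigned 32-bit big-endian counters.
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    counts = struct.unpack(">3I", raw)  # three unsigned 32-bit ints
    return dict(zip(COUNTER_NAMES, counts))


def classify(xxx_about_yyy: str, yyy_about_xxx: str) -> Dict[str, str]:
    """Compare two bricks' views of each other, one counter at a time.

    xxx_about_yyy: trusted.afr.yyy as read on brick xxx
    yyy_about_xxx: trusted.afr.xxx as read on brick yyy
    """
    a = decode_afr(xxx_about_yyy)
    b = decode_afr(yyy_about_xxx)
    verdict = {}
    for name in COUNTER_NAMES:
        xxx_blames_yyy = a[name] > 0  # xxx records pending ops at yyy
        yyy_blames_xxx = b[name] > 0  # yyy records pending ops at xxx
        if xxx_blames_yyy and yyy_blames_xxx:
            verdict[name] = "split-brain"
        elif xxx_blames_yyy:
            verdict[name] = "xxx is source"
        elif yyy_blames_xxx:
            verdict[name] = "yyy is source"
        else:
            verdict[name] = "clean"
    return verdict


if __name__ == "__main__":
    # Both bricks blame each other for a metadata operation, as in the
    # permission-change case discussed above.
    print(classify("0x000000000000000100000000",
                   "0x000000000000000200000000"))
```

Run on the example values, this reports a metadata split-brain with the data and namespace counters clean. The sketch only reads and interprets the flags; as the thread warns, clearing them by hand can lose updates or leave files inconsistent.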