self heal errors on 3.1.1 clients


 



Well, I must say THAT is good to hear. That being the case, I'm not touching anything that seems to be working.

Thanks Avati.

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Anand Avati
Sent: Thursday, January 27, 2011 5:45 PM
To: David Lloyd
Cc: gluster-users at gluster.org
Subject: Re: self heal errors on 3.1.1 clients

David,
  The problem you are facing is one we are already investigating.
We haven't root-caused it yet, but from what we have seen, it happens only on / (the volume root) and only for the metadata changelog. It shows up as annoying log messages, but it should not affect functionality.

Avati

On Thu, Jan 27, 2011 at 2:03 PM, David Lloyd <david.lloyd at v-consultants.co.uk> wrote:

> Yes, it seemed really dangerous to me too. But with the lack of 
> documentation, and lack of response from gluster (and the data is 
> still on the old system too), I thought I'd give it a shot.
>
> Thanks for the explanation. The split-brain problem seems to come up
> fairly regularly, but I've not found any clear explanation of what to
> do in this situation. I'm starting to worry about what appears to be a
> rationing of information from gluster.com to the community at large.
>
> We're not in a position to purchase support, and I'm a sysadmin, not a 
> developer. I hope to make a contribution in terms of testing and 
> feedback and bug reports, but I'm seeing a lot of threads that seem to 
> go nowhere, and it's getting a bit frustrating.
>
> David
>
>
>
> > This seems really dangerous to me.  On a brick xxx, the
> > trusted.afr.yyy attribute consists of three unsigned 32-bit
> > counters, indicating how many uncommitted operations (data,
> > metadata, and namespace respectively) might exist at yyy.  If xxx
> > shows uncommitted operations at yyy but not vice versa, then we know
> > that xxx is more up to date and it should be the source for
> > self-heal.  If two bricks show uncommitted operations at each
> > other, then we're in the infamous "split brain" scenario.  Some
> > client was unable to clear the counter at xxx while another was
> > unable to clear it at yyy, or both xxx and yyy went down after the
> > operation was complete but before they could clear the counters for
> > each other.
> >
> > In this case, it looks like a metadata operation (permission change)
> > was in this state.  If the permissions are in fact the same in both
> > places, then it doesn't matter which way self-heal happens, or whether
> > it happens at all.  In fact, it seems to me that AFR should be able to
> > detect this particular condition and not flag it as an error.  In any
> > case, I think you're probably fine here, but in general it's a very
> > bad idea to clear these flags manually, because it can cause updates
> > to be lost (if self-heal goes the wrong way) or files to remain in an
> > inconsistent state (if no self-heal occurs).
> >
> > The real thing I'd wonder about is why both servers are so
> > frequently becoming unavailable at the same instant (switch
> > problem?) and why permission changes on the root are apparently so
> > frequent that this often results in a split-brain.
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
> >
>
>
>
> --
> David Lloyd
> V Consultants
> www.v-consultants.co.uk
> tel: +44 7983 816501
> skype: davidlloyd1243
>
>
>
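The counter rule in the quoted explanation (each brick keeps a trusted.afr.<other-brick> value holding three unsigned 32-bit counters for data, metadata and namespace operations) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not GlusterFS's own AFR code: it assumes the 12-byte value stores the counters in big-endian (network) byte order in the order data, metadata, namespace, and the helpers `parse_afr`, `heal_direction` and `metadata_split_brain_is_benign` are hypothetical names. The last helper illustrates the suggestion that a metadata split-brain is harmless when the permissions already agree, and it assumes that comparing mode, uid and gid is sufficient.

```python
import os
import struct
from typing import NamedTuple

# Assumption: three unsigned 32-bit counters in network (big-endian)
# byte order, in the order data, metadata, namespace.
AFR_FORMAT = ">III"


class Pending(NamedTuple):
    """Uncommitted-operation counters one brick records against another."""
    data: int
    metadata: int
    namespace: int


def parse_afr(raw: bytes) -> Pending:
    """Decode a 12-byte trusted.afr.<brick> value (hypothetical helper)."""
    size = struct.calcsize(AFR_FORMAT)
    if len(raw) != size:
        raise ValueError(f"expected {size} bytes, got {len(raw)}")
    return Pending(*struct.unpack(AFR_FORMAT, raw))


def heal_direction(on_xxx: bytes, on_yyy: bytes, kind: str) -> str:
    """Apply the rule from the thread to one counter type.

    on_xxx: trusted.afr.yyy as stored on brick xxx (xxx's view of yyy)
    on_yyy: trusted.afr.xxx as stored on brick yyy (yyy's view of xxx)
    kind:   "data", "metadata" or "namespace"
    """
    xxx_blames_yyy = getattr(parse_afr(on_xxx), kind) > 0
    yyy_blames_xxx = getattr(parse_afr(on_yyy), kind) > 0
    if xxx_blames_yyy and yyy_blames_xxx:
        return "split-brain"   # each brick accuses the other
    if xxx_blames_yyy:
        return "source=xxx"    # xxx is more up to date, heal towards yyy
    if yyy_blames_xxx:
        return "source=yyy"    # yyy is more up to date, heal towards xxx
    return "clean"             # nothing pending in either direction


def metadata_split_brain_is_benign(st_x: os.stat_result,
                                   st_y: os.stat_result) -> bool:
    """True if the attributes a permission change touches already agree.

    Assumption: mode, uid and gid are the only attributes compared; real
    metadata covers more than this.
    """
    return (st_x.st_mode, st_x.st_uid, st_x.st_gid) == (
        st_y.st_mode, st_y.st_uid, st_y.st_gid)


if __name__ == "__main__":
    # Hypothetical value: one pending metadata operation, nothing else.
    pending = bytes.fromhex("000000000000000100000000")
    print(parse_afr(pending))                          # Pending(data=0, metadata=1, namespace=0)
    print(heal_direction(pending, pending, "metadata"))  # split-brain
```

On a real brick, the raw bytes could be read from the backend directory (not the client mount) with `os.getxattr(path, "trusted.afr.<brick>")` as root on Linux, or inspected with `getfattr -d -m trusted.afr -e hex <path>`; the brick names and paths here are placeholders. As the quoted reply warns, a result of "split-brain" is a reason to investigate, not permission to clear the counters by hand.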


DISCLAIMER: 
This e-mail, and any attachments thereto, is intended only for use by the addressee(s) named herein and may contain legally privileged and/or confidential information. If you are not the intended recipient of this e-mail, you are hereby notified that any dissemination, distribution or copying of this e-mail, and any attachments thereto, is strictly prohibited. If you have received this in error, please immediately notify me and permanently delete the original and any copy of any e-mail and any printout thereof. E-mail transmission cannot be guaranteed to be secure or error-free. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission. 
NOTICE REGARDING PRIVACY AND CONFIDENTIALITY Knight Capital Group may, at its discretion, monitor and review the content of all e-mail communications. http://www.knight.com

