strange error hangs hangs any access to gluster mount

amar at gluster.com (Amar Tumballi) · Thu, 31 Mar 2011 12:03:49 +0530

Hi James,

To fix this, go to the backend of *any one pair* and run the command below
on each directory where the layout has issues:

bash# setfattr -x trusted.glusterfs.dht <directory>

[ "pair backend" means the backend directories of one replica set in the volume ]

Then, from the client machine (i.e., where you have the mount point), run
the commands below:

 bash# echo 3 > /proc/sys/vm/drop_caches
 bash# stat <directory> # through the mount point.

During this step the layout is fixed again automatically, which should
solve the issue.

Regards,
Amar

On Tue, Mar 29, 2011 at 12:45 AM, Burnash, James <jburnash at knight.com> wrote:

> Thanks Jeff. That at least gives me a shot at figuring out some similar
> problems.
>
> It's possible that in the course of bringing up the mirrors initially I
> futzed something up. I'll have to check the read-write servers as well.
>
> James Burnash, Unix Engineering
>
> -----Original Message-----
> From: Jeff Darcy [mailto:jdarcy at redhat.com]
> Sent: Monday, March 28, 2011 3:09 PM
> To: Burnash, James
> Cc: gluster-users at gluster.org
> Subject: Re: strange error hangs hangs any access to
> gluster mount
>
> On 03/28/2011 02:29 PM, Burnash, James wrote:
> > Sorry - paste went awry.
> >
> > Updated here:
> >
> > http://pastebin.com/M74LAYej
>
> OK, that definitely shows a problem.  Here's the whole map of which
> nodes are claiming which ranges:
>
> 00000000 0ccccccb: g07 on gfs17/gfs18
> 0ccccccc 19999997: g08 on gfs17/gfs18
> 19999998 26666663: g09 on gfs17/gfs18
> 26666664 3333332f: g10 on gfs17/gfs18
> 33333330 3ffffffb: g01 on gfs17/gfs18
> 3ffffffc 4cccccc7: g02 on gfs17/gfs18
> 4cccccc8 59999993: g01 on gfs14/gfs14
> 59999994 6666665f: g02 on gfs14/gfs14
> 66666660 7333332b: g03 on gfs14/gfs14
> 7333332c 7ffffff7: g04 on gfs14/gfs14
> 7ffffff8 8cccccc3: g05 on gfs14/gfs14
> 8cccccc4 9999998f: g06 on gfs14/gfs14
> 99999990 a666665b: g07 on gfs14/gfs14
> a666665c b3333327: g08 on gfs14/gfs14
> b3333328 b333332e: g09 on gfs14/gfs14
> b333332f bffffff3: g09 on gfs14/gfs14
>                   *** AND g04 on gfs17/18
> bffffff4 ccccccbf: g10 on gfs14/gfs14
>                   *** AND g04 on gfs17/18
> ccccccc0 ccccccc7: g03 on gfs17/gfs18
>                   *** AND g04 on gfs17/18
> ccccccc8 d999998b: g03 on gfs17/gfs18
> d999998c e6666657: *** GAP ***
> e6666658 f3333323: g05 on gfs17/gfs18
> f3333324 ffffffff: g06 on gfs17/gfs18
>
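A map like the one above can be checked mechanically. The following is a
minimal sketch, not a Gluster tool: it takes one claimed range per brick
(transcribed from the map, with the g04 on gfs17/gfs18 range inferred from
where the "AND g04" overlaps begin and end) and sweeps them in order to
report gaps and overlaps. The names CLAIMS and find_problems are
hypothetical.

```python
# One claimed range per brick: (start, end, owner). Both ends are inclusive.
# Values are transcribed from the map above; the g04 gfs17/gfs18 range is
# inferred from where its overlaps start and stop.
CLAIMS = [
    (0x00000000, 0x0CCCCCCB, "g07 gfs17/gfs18"),
    (0x0CCCCCCC, 0x19999997, "g08 gfs17/gfs18"),
    (0x19999998, 0x26666663, "g09 gfs17/gfs18"),
    (0x26666664, 0x3333332F, "g10 gfs17/gfs18"),
    (0x33333330, 0x3FFFFFFB, "g01 gfs17/gfs18"),
    (0x3FFFFFFC, 0x4CCCCCC7, "g02 gfs17/gfs18"),
    (0x4CCCCCC8, 0x59999993, "g01 gfs14/gfs14"),
    (0x59999994, 0x6666665F, "g02 gfs14/gfs14"),
    (0x66666660, 0x7333332B, "g03 gfs14/gfs14"),
    (0x7333332C, 0x7FFFFFF7, "g04 gfs14/gfs14"),
    (0x7FFFFFF8, 0x8CCCCCC3, "g05 gfs14/gfs14"),
    (0x8CCCCCC4, 0x9999998F, "g06 gfs14/gfs14"),
    (0x99999990, 0xA666665B, "g07 gfs14/gfs14"),
    (0xA666665C, 0xB3333327, "g08 gfs14/gfs14"),
    (0xB3333328, 0xBFFFFFF3, "g09 gfs14/gfs14"),
    (0xBFFFFFF4, 0xCCCCCCBF, "g10 gfs14/gfs14"),
    (0xCCCCCCC0, 0xD999998B, "g03 gfs17/gfs18"),
    (0xB333332F, 0xCCCCCCC7, "g04 gfs17/gfs18"),  # inferred, see above
    (0xE6666658, 0xF3333323, "g05 gfs17/gfs18"),
    (0xF3333324, 0xFFFFFFFF, "g06 gfs17/gfs18"),
]


def find_problems(claims, space_end=0xFFFFFFFF):
    """Sweep claims in start order and return (gaps, overlaps).

    Each gap or overlap is an inclusive (start, end) pair. An overlap is
    the part of a claim that falls in space already covered by earlier
    claims in the sweep.
    """
    gaps, overlaps = [], []
    covered_to = -1  # highest hash value covered so far
    for start, end, _owner in sorted(claims):
        if start > covered_to + 1:
            gaps.append((covered_to + 1, start - 1))
        elif start <= covered_to:
            overlaps.append((start, min(end, covered_to)))
        covered_to = max(covered_to, end)
    if covered_to < space_end:
        gaps.append((covered_to + 1, space_end))
    return gaps, overlaps


gaps, overlaps = find_problems(CLAIMS)
for lo, hi in gaps:
    print(f"GAP     {lo:08x} {hi:08x}")
for lo, hi in overlaps:
    print(f"OVERLAP {lo:08x} {hi:08x}")
```

A healthy layout would produce no output: every hash value claimed exactly
once.
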
> I know this all seems like numerology, but bear with me.  Note that all
> of the problems seem to involve g04 on gfs17/gfs18 claiming the wrong
> range, and that the range it's claiming is almost exactly twice the size
> of all the other ranges.  In fact, it's the range it would have been
> assigned if there had been ten nodes instead of twenty.  For example, if
> that filesystem had been restored to an earlier state on gfs17/gfs18,
> and then self-healed in the wrong direction (self-mangled?) you would
> get exactly this set of symptoms.  I'm not saying that's what happened;
> it's just a way to illustrate what these values mean and the
> consequences of their being out of sync with each other.
>
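The "ten nodes instead of twenty" observation can be verified with a little
arithmetic. This sketch assumes each node's share is the 32-bit hash space
divided evenly by the node count and rounded down; the exact rounding
GlusterFS uses is an assumption here. The g04 range is the one inferred
from the overlaps in the map.

```python
HASH_SPACE = 2**32  # number of distinct 32-bit hash values


def width(start, end):
    """Number of hash values in an inclusive range."""
    return end - start + 1


share_20 = HASH_SPACE // 20  # expected share with twenty nodes
share_10 = HASH_SPACE // 10  # expected share with ten nodes

g07 = width(0x00000000, 0x0CCCCCCB)  # a normal range from the map
g04 = width(0xB333332F, 0xCCCCCCC7)  # the suspect g04 gfs17/gfs18 range

print(f"20-node share {share_20:#x}, g07 width {g07:#x}")
print(f"10-node share {share_10:#x}, g04 width {g04:#x}")
```

Under these assumptions the normal range matches the twenty-node share and
the g04 range matches the ten-node share, which is one value more than
twice the normal width.
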
> So, why only one client?  Since you're reporting values on the servers,
> I'd guess it's because only that client has remounted.  The others are
> probably still operating from cached (and apparently correct) layout
> information.  This is a very precarious state, I'd have to say.  You
> *might* be able to fix this by fixing the xattr values on that one
> filesystem, but I really can't recommend trying that without some input
> from Gluster themselves.
>
>
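Before removing or changing an xattr, it can help to record what each
backend directory currently claims, for example with
"getfattr -n trusted.glusterfs.dht -e hex <directory>" on the server. The
sketch below decodes such a hex value. It ASSUMES the value is 16 bytes
holding four big-endian 32-bit words, with the last two being the range
start and end; that on-disk format is an assumption and should be checked
against your GlusterFS version. The function name decode_dht_layout and
the sample value are made up for illustration and are not taken from the
pastebin.

```python
import struct


def decode_dht_layout(hex_value):
    """Decode a hex-encoded trusted.glusterfs.dht value.

    ASSUMPTION: 16 bytes, four big-endian uint32 words, where the third
    and fourth words are the inclusive start and end of the hash range.
    """
    text = hex_value.strip()
    if text.lower().startswith("0x"):
        text = text[2:]
    raw = bytes.fromhex(text)
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    word0, word1, start, end = struct.unpack(">4I", raw)
    return {"word0": word0, "word1": word1, "start": start, "end": end}


# Illustrative value only, constructed to match the suspect range above.
sample = "0x0000000100000000b333332fccccccc7"
layout = decode_dht_layout(sample)
print(f"{layout['start']:08x} {layout['end']:08x}")
```
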