[SPAM?] Re: strange error hangs any access to gluster mount

jburnash at knight.com (Burnash, James) · Tue, 5 Apr 2011 15:08:01 -0400

Hi Jeff.

Thanks again for the help - it's much appreciated.

I tried suggestion (1), and as you suspected - no joy.

As for suggestions (2) and (3): I haven't done (3) yet, because I don't have any empty bricks available to migrate the data from g04 onto. I have plenty of space on the existing bricks, but from the docs I don't think that is how this is meant to be done, and I'd hate to lose data, as I'm already somewhat in the dog house.

Can you think of any workaround for this problem? And what would be the effect of just implementing step (2)?

Thanks,

James Burnash, Unix Engineering

-----Original Message-----
From: Jeff Darcy [mailto:jdarcy at redhat.com]
Sent: Tuesday, April 05, 2011 9:08 AM
To: Burnash, James
Cc: 'amar at gluster.com'; 'gluster-users at gluster.org'
Subject: Re: [SPAM?] Re: strange error hangs any access to gluster mount

On 04/04/2011 03:02 PM, Burnash, James wrote:
> Sadly, this did not fix things. <sigh>
>
> My brick xattrs now look like this:
>
> http://pastebin.com/2p4iaZq3
>
> And here is the debug output from a client where I restarted the
> gluster client with diagnostics.client-log-level set to DEBUG:
>
> http://pastebin.com/5pjwxwsj
>
> I'm at somewhat of a loss. Any help would be greatly appreciated.

Now it looks like g04 on gfs17/gfs18 has no DHT xattrs at all, leaving a
hole in the hash range from d999998c to e6666657. The "background meta-data
self-heal" messages in the log are probably related to that, though the
failure messages about non-blocking inodelks (line 713) and possible split
brain (e.g. line 777) still seem a bit odd. There are also some messages
about timeouts (e.g. line 851) that are probably unrelated but might be
worth investigating. I can suggest a few possible courses of action:

(1) Do a "getfattr -n trusted.distribute.fix.layout" on the root (from
the client side) to force the layouts to be recalculated. This is the
same hook used by the first part of the rebalance code, but it performs
only that one step, on one directory. On the other hand, it's also the
same thing self-heal should have done, so I kind of expect it will fail
(harmlessly) in the same way.

(2) Manually set the xattr on gfs{17,18}:/.../g04 to the "correct"
value, like so:

        setfattr -n trusted.glusterfs.dht -v \
        0x0000000100000000d999998ce6666657 g04

(3) Migrate the data off that volume to others, remove/nuke/rebuild it,
then add it back in a pristine state and rebalance.
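To confirm where the hole is, and to see what the value in (2) encodes, one could collect each brick's layout xattr (for example with "getfattr -n trusted.glusterfs.dht -e hex" on the brick directory) and check the ranges for gaps. The bash below is a minimal sketch, not a GlusterFS tool. It assumes each trusted.glusterfs.dht value is four big-endian 32-bit words (count, type, start, stop), which is how the value in (2) reads; the function names and the "brick value" input format are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch (not a GlusterFS tool). ASSUMPTION: each trusted.glusterfs.dht
# value is four big-endian 32-bit words: count, type, start, stop.

# Print the xattr value for one hash range (count=1, type=0 assumed).
# Arguments are bare hex without a 0x prefix, e.g. d999998c e6666657.
encode_layout() {
  printf '0x%08x%08x%08x%08x\n' 1 0 "$((16#$1))" "$((16#$2))"
}

# Read hypothetical "brick 0x<32 hex digits>" lines; emit "start stop brick"
# with start/stop as decimal integers. Bricks with no value are skipped,
# so a brick missing its xattr (like g04 here) shows up as a hole.
decode_layouts() {
  local brick value hex
  while read -r brick value; do
    [[ -n $value ]] || continue
    hex=${value#0x}
    printf '%d %d %s\n' "$((16#${hex:16:8}))" "$((16#${hex:24:8}))" "$brick"
  done
}

# Read "start stop brick" lines sorted by start; report gaps and overlaps
# across the full 32-bit hash space 00000000-ffffffff.
find_holes() {
  local next=0 start stop brick
  while read -r start stop brick; do
    if (( start > next )); then
      printf 'hole: %08x-%08x\n' "$next" "$((start - 1))"
    elif (( start < next )); then
      printf 'overlap: %s starts at %08x\n' "$brick" "$start"
    fi
    # Track the highest covered hash so far.
    if (( stop + 1 > next )); then next=$((stop + 1)); fi
  done
  if (( next <= 0xffffffff )); then
    printf 'hole: %08x-ffffffff\n' "$next"
  fi
}

# Decode, sort numerically by start of range, then check coverage.
report() {
  decode_layouts | sort -n -k1,1 | find_holes
}
```

For example, feeding report two hypothetical bricks covering 00000000-d999998b and e6666658-ffffffff prints "hole: d999998c-e6666657", the same gap described above, and "encode_layout d999998c e6666657" prints the value used in (2).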
