Thanks so much Jeff.

I'll do step 2 and then see when I can get to step 3, and I'll give you an update on the outcome so that this thread will be useful to others as well as us.

James Burnash, Unix Engineering

-----Original Message-----
From: Jeff Darcy [mailto:jdarcy at redhat.com]
Sent: Tuesday, April 05, 2011 4:38 PM
To: Burnash, James
Cc: 'amar at gluster.com'; 'gluster-users at gluster.org'
Subject: Re: [SPAM?] Re: strange error hangs any access to gluster mount

On 04/05/2011 03:08 PM, Burnash, James wrote:
> Hi Jeff.
>
> Thanks again for the help - it's much appreciated.
>
> I tried suggestion (1), and as you suspected - no joy.
>
> For suggestions (2) and (3) - I haven't done (2) yet, because I don't
> have any available empty bricks to migrate the data from g04. I have
> plenty of space on existing bricks, but I don't think (from the docs)
> that is the intention of this activity, and I'd hate to lose data, as
> I'm already somewhat in the dog house.

You mean you haven't done (3) yet, right? (2) shouldn't require any extra space.

> Can you think of any work around to this problem? And what would be
> the effect of just implementing step (2)?

The effect should be to create a hash space that has no gaps or overlaps, which will keep DHT in its normal state instead of trying (and apparently failing) to self-heal. In the normal state, if DHT fails to find a file where it expects based on hashing, it will find it elsewhere and create a linkfile on a file-by-file basis. This might be slightly inefficient until a rebalance is done, but with the top-level xattrs manually repaired the rebalance might actually succeed. Alternatively, the setfattr might fail, in which case we have something much simpler to diagnose on those local filesystems.

I can't say there's no risk of data loss with any of these approaches, I'm afraid, since we're in such a weird state and don't know how we got there.
What I can say is that, from my knowledge of the code, setting that xattr manually shouldn't carry any greater risk of data loss than removing them did. The worst outcome that seems likely is that it doesn't help and we end up exactly where we are already.

> -----Original Message-----
> From: Jeff Darcy
>
> Now it looks like g04 on gfs17/gfs18 has no DHT xattrs at all,
> leaving a hole from d999998c to e6666657. From the log, the
> "background meta-data self-heal" messages are probably related to
> that, though the failure messages about non-blocking inodelks (line
> 713) and possible split brain (e.g. line 777) still seem a bit odd.
> There are also some messages about timeouts (e.g. line 851) that are
> probably unrelated but might be worth investigating. I can suggest a
> few possible courses of action:
>
> (1) Do a "getfattr -n trusted.distribute.fix.layout" on the root
> (from the client side) to force the layouts to be recalculated. This
> is the same hook that's used by the first part of the rebalance code,
> but only does this one part on one directory. OTOH, it's also the
> same thing the self-heal should have done, so I kind of expect it
> will fail (harmlessly) in the same way.
>
> (2) Manually set the xattr on gfs{17,18}:/.../g04 to the "correct"
> value, like so:
>
> setfattr -n trusted.glusterfs.dht -v \
> 0x0000000100000000d999998ce6666657 g04
>
> (3) Migrate the data off that volume to others, remove/nuke/rebuild
> it, then add it back in a pristine state and rebalance.
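
For anyone following the thread later, the "no gaps or overlaps" condition discussed in the reply can be checked offline. The following is only a sketch, not GlusterFS code. It assumes the 16-byte trusted.glusterfs.dht value is four big-endian 32-bit fields whose last two are the start and stop of the hash range, which matches the value in suggestion (2) and the hole from d999998c to e6666657. The names given to the first two fields, the helper names, and the two neighbouring ranges are hypothetical, chosen only to illustrate the check.

```python
import struct

HASH_MAX = 0xFFFFFFFF  # assumed top of the 32-bit hash space


def parse_dht_xattr(hex_value):
    """Split a trusted.glusterfs.dht hex value into four 32-bit fields.

    Assumption: big-endian layout; "count" and "type" are illustrative
    names for the first two fields, start/stop are the hash range.
    """
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(raw))
    count, layout_type, start, stop = struct.unpack(">IIII", raw)
    return {"count": count, "type": layout_type, "start": start, "stop": stop}


def find_problems(ranges):
    """Return (gaps, overlaps) for inclusive (start, stop) hash ranges."""
    gaps, overlaps = [], []
    expected = 0  # next hash value that still needs an owner
    for start, stop in sorted(ranges):
        if start > expected:
            gaps.append((expected, start - 1))
        elif start < expected:
            overlaps.append((start, min(stop, expected - 1)))
        expected = max(expected, stop + 1)
    if expected <= HASH_MAX:
        gaps.append((expected, HASH_MAX))
    return gaps, overlaps


# Value taken from suggestion (2) in the thread.
g04 = parse_dht_xattr("0x0000000100000000d999998ce6666657")

# Hypothetical neighbours that together cover everything except g04's range.
others = [(0x00000000, 0xD999998B), (0xE6666658, 0xFFFFFFFF)]

gaps_before, _ = find_problems(others)
gaps_after, overlaps_after = find_problems(others + [(g04["start"], g04["stop"])])

print("gaps without g04:", [(hex(a), hex(b)) for a, b in gaps_before])
print("gaps with g04:", gaps_after, "overlaps:", overlaps_after)
```

On a real volume the ranges would come from reading the xattr on the same directory of every brick (for example with getfattr in hex mode, run on the servers), not from hard-coded values as here.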