On 04/04/2011 03:02 PM, Burnash, James wrote:
> Sadly, this did not fix things. <sigh>
>
> My brick xattrs now look like this:
>
> http://pastebin.com/2p4iaZq3
>
> And here is the debug output from a client where I restarted the
> gluster client while the diagnostics.client-log-level DEBUG was set
>
> http://pastebin.com/5pjwxwsj
>
> I'm at somewhat of a loss. Any help would be greatly appreciated.

Now it looks like g04 on gfs17/gfs18 has no DHT xattrs at all, leaving a
hole from d999998c to e6666657.  From the log, the "background meta-data
self-heal" messages are probably related to that, though the failure
messages about non-blocking inodelks (line 713) and possible split brain
(e.g. line 777) still seem a bit odd.  There are also some messages about
timeouts (e.g. line 851) that are probably unrelated but might be worth
investigating.

I can suggest a few possible courses of action:

(1) Do a "getfattr -n trusted.distribute.fix.layout" on the root (from
the client side) to force the layouts to be recalculated.  This is the
same hook that's used by the first part of the rebalance code, but it
does only this one part, on one directory.  OTOH, it's also the same
thing the self-heal should have done, so I kind of expect it will fail
(harmlessly) in the same way.

(2) Manually set the xattr on gfs{17,18}:/.../g04 to the "correct"
value, like so:

	setfattr -n trusted.glusterfs.dht -v \
		0x0000000100000000d999998ce6666657 g04

(3) Migrate the data off that volume to others, remove/nuke/rebuild it,
then add it back in a pristine state and rebalance.
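In case it helps to see where that 16-byte value comes from, here is a
small sketch of the layout arithmetic.  The assumptions are mine, not
read from your volume: an even split of the 32-bit hash space across 20
DHT subvolumes, with g04 holding slot index 17.  I inferred those
numbers only because they reproduce the d999998c-e6666657 hole
boundaries above; double-check them against your own brick count before
setting anything.

```python
def dht_layout(slot, nslots):
    """(start, stop) of one slot in an evenly divided DHT hash range.

    Assumption: the hash space 0x00000000-0xffffffff is cut into nslots
    equal chunks; the last slot absorbs the rounding remainder.
    """
    chunk = 0xffffffff // nslots
    start = slot * chunk
    stop = 0xffffffff if slot == nslots - 1 else start + chunk - 1
    return start, stop

def dht_xattr(start, stop, cnt=1, htype=0):
    """Pack cnt / hash-type / start / stop as the hex blob setfattr takes."""
    return "0x%08x%08x%08x%08x" % (cnt, htype, start, stop)

# Hypothetical: 20 subvolumes, g04 at slot 17 (inferred, see above).
start, stop = dht_layout(17, 20)
print(dht_xattr(start, stop))  # -> 0x0000000100000000d999998ce6666657
```

If your subvolume count differs, the same helper gives you the range to
plug into the setfattr command instead.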