On Thu, Apr 12, 2012 at 3:49 PM, Jeff Darcy wrote:

> (1) To a first approximation, it should be safe to "merge" directory contents
> despite there being a split-brain problem, by healing any file that exists on
> only one brick from there to its peer(s).

I am not sure if I got this right, but if I did, this should be the two-way
scenario depicted at the end of this message.

> (3) The reason you continue to get I/O errors is probably that the xattrs on
> the *parent directory* still indicate pending operations on both sides. You
> can verify this with the following command on each brick:
>
>     getfattr -d -e hex -n trusted.glusterfs.dht /a

Unfortunately:

    getfattr: /a: Input/output error

And when I run it on any working instance, it says:

    trusted.glusterfs.dht: No such attribute

> If the result is non-zero (most likely in the last four-byte integer indicating
> a directory-entry operation) then that confirms our theory. It should be safe
> for the self-heal code to clear these counts if (and only if) the directories
> are checked and found identical. In fact, I think we already do this. Thus,
> manual copying of files followed by self-heal on the parent directory should
> make the errors go away. I encourage you to try that while I go look at
> the code.

OK, I thought of two ways to manually copy the files and make gluster think
the directories are identical.

BTW, I found out that if I disrupt connectivity between the nodes again, I am
able to do operations on the mountpoint (/a).

1st way - node1 (10.0.2.14)

    scp /local/howareyou 10.0.2.15:/local
    scp 10.0.2.15:/local/hello /local
    ls /a
    ls: cannot access /a: Input/output error
    iptables -A INPUT -s 10.0.2.15 -j DROP    (so I can access the mountpoint)
    ls -lh /a
    ????????????? ? ? ? ? ? hello
    -rw-r--r-- 1 root root 0 Apr 6 01:48 howareyou

2nd way - node1 (10.0.2.14) (from scratch)

    iptables -A INPUT -p tcp -s 10.0.2.15 -j DROP    (so I can access the mountpoint)
    - allow ssh -
    scp 10.0.2.15:/a/hello /a
    scp /a/howareyou 10.0.2.14:/a
    - now they are in sync -
    iptables -F INPUT
    ls /a    (works briefly, but after a while:)
    ls: cannot access /a: Input/output error

As per the documentation, triggering a self-heal is done with

    find <gluster-mount> -noleaf -print0 | xargs --null stat

(where <gluster-mount> is /a), but again, /a cannot be accessed.
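
For reference, a minimal sketch of how the "last four-byte integer" mentioned in the quoted text could be read off a hex xattr dump. It assumes the replication changelog xattr is 12 bytes holding three big-endian four-byte counters (data, metadata, directory-entry), and that such xattrs are named along the lines of trusted.afr.<volume>-client-<N>; neither the name nor the layout is stated in this thread. The helper decode_afr is hypothetical, and the brick path /local in the comment is inferred from the scp commands above.

```shell
# Assumption: on each node the brick directory is /local (not the mount /a).
# As root, all xattrs on the brick directory could be dumped with:
#   getfattr -d -m . -e hex /local

# Hypothetical helper: split a 12-byte hex value (as printed by
# "getfattr -e hex") into three 4-byte counters and print them in decimal.
decode_afr() {
    hex=${1#0x}                       # drop the leading "0x" if present
    if [ "${#hex}" -ne 24 ]; then     # 12 bytes = 24 hex digits
        echo "expected 24 hex digits, got ${#hex}" >&2
        return 1
    fi
    data=$(printf '%s' "$hex" | cut -c1-8)
    meta=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    echo "data=$((0x$data)) metadata=$((0x$meta)) entry=$((0x$entry))"
}
```

For example, decode_afr 0x000000000000000000000002 prints "data=0 metadata=0 entry=2". A non-zero entry count on both bricks would match the pending directory-entry operations described in the quoted text.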