On 08/15/2012 11:27 AM, Emmanuel Dreyfus wrote:
> Attributes:
> trusted.glusterfs.dht 00 00 00 01 00 00 00 00 7f ff ff ff ff ff ff ff
> trusted.afr.gfs33-client-1 00 00 00 00 00 00 00 02 00 00 00 00
> trusted.afr.gfs33-client-0 00 00 00 00 00 00 00 00 00 00 00 00
> trusted.gfid 29 d1 70 bb 63 91 40 ed b4 c6 27 d8 ca a7 2a 64
>
> On the other bricks:
> trusted.glusterfs.dht 00 00 00 01 00 00 00 00 00 00 00 00 7f ff ff fe
> trusted.afr.gfs33-client-2 00 00 00 00 00 00 00 00 00 00 00 00
> trusted.afr.gfs33-client-3 00 00 00 00 00 00 00 00 00 00 00 00
> trusted.gfid 29 d1 70 bb 63 91 40 ed b4 c6 27 d8 ca a7 2a 64
>
> trusted.glusterfs.dht 00 00 00 01 00 00 00 00 7f ff ff ff ff ff ff ff
> trusted.afr.gfs33-client-1 00 00 00 00 00 00 00 00 00 00 00 00
> trusted.afr.gfs33-client-3 00 00 00 00 00 00 00 00 00 00 00 00
> trusted.gfid 29 d1 70 bb 63 91 40 ed b4 c6 27 d8 ca a7 2a 64
>
> trusted.glusterfs.dht 00 00 00 01 00 00 00 00 00 00 00 00 7f ff ff fe
> trusted.afr.gfs33-client-2 00 00 00 00 00 00 00 01 00 00 00 00
> trusted.afr.gfs33-client-3 00 00 00 00 00 00 00 00 00 00 00 00
> trusted.gfid 29 d1 70 bb 63 91 40 ed b4 c6 27 d8 ca a7 2a 64
>
> I tried to understand the code here. It reads trusted.afr.gfs33-client-*
> and builds a matrix, which looks like this:
> pending_matrix: [ 0 1 ]
> pending_matrix: [ 2 0 ]
>
> Then afr_sh_wise_nodes_conflict() decides that nsources = -1.
>
> Is there some documentation explaining how this works? Can someone tell
> me why it decides it is split brain?

I really hope the above contains a typo or copy/paste error, because if it doesn't then ICK. Without seeing the volfile I have to guess a little, but it looks as though the first and third bricks above should be client-0 and client-1 (check the matching values of trusted.glusterfs.dht), while the second and fourth should be client-2 and client-3. In the first place, it's odd that the file even exists in both replica sets. Is one a linkfile? In any case, I think the second and fourth bricks shown above (client-2 and client-3) are irrelevant.

The next anomaly is the 2 in the pending matrix. Its position indicates that it's the second volume in the AFR definition accusing the first, and the first must be client-1 based on the xattr name, so your volume definition must be backwards - "subvolumes client-1 client-0" in the volfile. That's how we get to [0 0][2 0].

Where does the counter-accusation come from? One clue might be that client-1 (the third brick shown above) has xattrs for itself and *client-3*. Because it's missing an xattr for client-0, it's considered ignorant, and therefore we bump up other bricks' pending-operation counts for it. However, because of the reversed brick order, that should be client-0 (second row) accusing client-1 (first column), getting us to [0 0][3 0], and that's fully resolvable.

In fact, I tried this xattr configuration, in both directions, on a simple two-brick AFR volume myself, and it healed correctly both times. The only thing I can think of is that there's some further confusion or inconsistency in how your volumes are defined, so that either the handling of ignorant nodes is being done the wrong way or the pending-operation count from the fourth brick shown above is being brought in even though it should be irrelevant.
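To make the matrix accounting concrete, here's a rough sketch of the decision logic - NOT the actual afr self-heal code, which tracks separate data/metadata/entry counters per xattr and handles ignorant nodes as described above. The brick count and matrix values are taken straight from your report:

    /* Sketch only: how a pending matrix built from the
     * trusted.afr.<volume>-client-N xattrs can lead to a split-brain
     * verdict.  pending[i][j] is what brick i's xattrs say is still
     * pending on brick j. */
    #include <stdio.h>

    #define CHILDREN 2

    int main(void)
    {
        int pending[CHILDREN][CHILDREN] = {
            { 0, 1 },   /* brick 0 accuses brick 1 of 1 unfinished op */
            { 2, 0 },   /* brick 1 accuses brick 0 of 2 unfinished ops */
        };
        int i, j, nsources = 0;

        /* Mutual accusation: neither brick can be trusted as a source,
         * which is the split-brain case (nsources = -1 in afr terms). */
        for (i = 0; i < CHILDREN; i++)
            for (j = i + 1; j < CHILDREN; j++)
                if (pending[i][j] > 0 && pending[j][i] > 0) {
                    printf("split brain: bricks %d and %d "
                           "accuse each other\n", i, j);
                    return 1;
                }

        /* Otherwise any brick that nobody accuses is a valid source. */
        for (j = 0; j < CHILDREN; j++) {
            int accused = 0;
            for (i = 0; i < CHILDREN; i++)
                if (i != j && pending[i][j] > 0)
                    accused = 1;
            if (!accused) {
                printf("brick %d can be a source\n", j);
                nsources++;
            }
        }
        printf("nsources = %d\n", nsources);
        return 0;
    }

With [0 1][2 0] this reports split brain, matching what you saw. With [0 0][3 0] - the matrix you should have ended up with after the ignorance bump - there's no mutual accusation, the unaccused brick remains a valid source, and the heal can proceed.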
If I were you I'd double-check that the volfiles look the same everywhere, that the same brick names refer to the same physical locations everywhere (including checking /etc/hosts or DNS for inconsistencies), and that the xattr values really are as reported above. I don't think this combination of conditions can occur without some kind of inconsistency in one of those places.
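For the xattr comparison, something like the following makes it easy to diff the values side by side. This is a minimal sketch assuming Linux's <sys/xattr.h> (the extended-attribute API differs on NetBSD and elsewhere), and "getfattr -d -m trusted -e hex" does the same job from the shell; either way, run it as root against the file's path on each brick directly, not through the mount point:

    /* Dump all trusted.* xattrs of a file in hex so the values on
     * different bricks can be compared directly. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/xattr.h>

    int main(int argc, char **argv)
    {
        char names[1024], value[256];
        ssize_t nlen, vlen, off, i;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <file-on-brick>\n", argv[0]);
            return 1;
        }
        nlen = listxattr(argv[1], names, sizeof(names));
        if (nlen < 0) {
            perror("listxattr");
            return 1;
        }
        /* names holds a sequence of NUL-terminated attribute names */
        for (off = 0; off < nlen; off += strlen(names + off) + 1) {
            if (strncmp(names + off, "trusted.", 8) != 0)
                continue;   /* only glusterfs's trusted.* attributes */
            vlen = getxattr(argv[1], names + off, value, sizeof(value));
            if (vlen < 0) {
                perror(names + off);
                continue;
            }
            printf("%s =", names + off);
            for (i = 0; i < vlen; i++)
                printf(" %02x", (unsigned char)value[i]);
            printf("\n");
        }
        return 0;
    }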