Hello,

I recently upgraded my infrastructure from a 1.3.12 server-based AFR cluster to a 1.4rc3 client-based AFR cluster. Among other things, I have noticed one very obvious change in the behaviour of self-healing between the two setups.

The scenario is basic: one of the server nodes becomes inaccessible, and as a result, changes to a given file are not replicated. When the downed node returns and the file is accessed, the self-heal feature is triggered, thus ensuring the integrity of the data across all server nodes.

So far so good; however, between the previous setup and the current one, "something" has resulted in differing behaviour vis-à-vis the availability of said file.

In the previous 1.3 server-based AFR setup, if a client attempted to write to the file, it was able to do so, with the change being replicated to the newly-returned node as part of the self-heal process. Perfect.

However, in the current 1.4 client-based AFR setup, if a client attempts to write to the file, instead of Gluster accepting the write and propagating the change during the self-heal, the file becomes momentarily inaccessible. The self-heal process is then triggered, and the file (without the current attempted write) is replicated. Subsequent accesses are successful (and replicate as expected), but that "triggering write" still fails the first time.

Furthermore, the log entry related to this particular process is confusing (log excerpt below). It follows the form:

1. Self-heal triggered
2. Unable to resolve conflicting data
3. Self-heal completed
4. File not found

The reported conflict does not, in fact, appear to affect the self-heal, in that the file is replicated as expected. Is the error itself erroneous, or is there actually a problem?

Furthermore, even though the file clearly exists, and has in fact just been replicated, Gluster then reports an error on OPEN. This can't possibly be the expected behaviour.
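To illustrate the "first write fails, later ones succeed" pattern described above, here is a minimal sketch of how a client could tolerate it in the meantime. This is not our actual Tomcat code: it is Python, the name `append_with_retry` and its parameters are hypothetical, and it assumes the failure surfaces to the application as EIO (Input/output error), as the logs suggest.

```python
import errno
import time


def append_with_retry(path, data, retries=1, delay=0.5, opener=open):
    """Append data to path, retrying if the attempt fails with EIO.

    Returns the number of attempts that were needed.
    `opener` is injectable only so the behaviour can be exercised
    without a real GlusterFS mount (an assumption of this sketch).
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            with opener(path, "a") as handle:
                handle.write(data)
            return attempt
        except OSError as exc:
            # Only the Input/output error seen in the logs is retried;
            # any other error, or running out of retries, is re-raised.
            if exc.errno != errno.EIO or attempt > retries:
                raise
            # Give the self-heal a moment to complete before retrying.
            time.sleep(delay)
```

This only hides the symptom on the client side; it does not explain why the triggering write is rejected in the first place.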
What within the underlying infrastructure has changed? How can it be fixed?

Some log snippets:

Tomcat (on client)
---------
[Thread-25]09:32:09,398 ERROR: Error in copyfile.
java.io.FileNotFoundException: /glusterfs/some/directory/somefile.txt (Input/output error)
---------

glusterfs.log (on client)
---------
2008-12-17 09:32:09 W [afr-self-heal-common.c:1005:afr_self_heal] nasdash-afr: performing self heal on /glusterfs/some/directory/somefile.txt (metadata=0 data=1 entry=0)
2008-12-17 09:32:09 E [afr-self-heal-data.c:777:afr_sh_data_fix] nasdash-afr: Unable to resolve conflicting data of /glusterfs/some/directory/somefile.txt. Please resolve manually by deleting the file /glusterfs/some/directory/somefile.txt from all but the preferred subvolume
2008-12-17 09:32:09 W [afr-self-heal-data.c:70:afr_sh_data_done] nasdash-afr: self heal of /glusterfs/some/directory/somefile.txt completed
2008-12-17 09:32:09 E [fuse-bridge.c:662:fuse_fd_cbk] glusterfs-fuse: 189804: OPEN() /glusterfs/some/directory/somefile.txt => -1 (Input/output error)
---------

Comments?

--
Daniel Maher <dma+gluster AT witbe DOT net>