Hi Kevan,

It should have worked fine in your case. What version of glusterfs are you using?

Just before you do the second read (or open, rather) which should have triggered self-heal, can you do getfattr -n trusted.afr.version <> on the partial file and also on the full file in the backend and give the output?

Thanks
Krishna

On 10/4/07, Kevan Benson <kbenson@xxxxxxxxxxxxxxx> wrote:
>
> Is self heal supposed to work with partial files? I have an issue where
> self-heal isn't happening on some servers with AFR and unify in a HA
> setup I developed. Two servers, two clients, all AFR and unify done on
> the client side.
>
> If I kill a connection while a large file is being written, the
> glusterfs mount waits the appropriate timeout period (10 seconds in my
> case) and then finishes writing the file to the still-active server.
> This results in a full file on one server and a partial file on the
> other (the one I stopped traffic to temporarily to simulate a
> crash/network problem). If I then re-enable the disabled server and read
> data from the problematic file, it doesn't self-heal and copy
> the full file to the server with the partial file.
>
> Anything written entirely while a server is offline (i.e. the offline
> server has no knowledge of it) is correctly created on read from the
> file, so the problem seems to be related to files that are partially
> written to one server.
>
> Can someone comment on the particular conditions that trigger a self
> heal? Is there something I can do to force it to self-heal at this
> point? (I repeat that reading data from the file does not work.) I know
> I can use rsync and some foo to fix this, but that becomes less and less
> feasible as the mount size grows and the time for rsync to compare sides
> lengthens.
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
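
As a concrete sketch of the getfattr check asked for above, assuming a hypothetical backend export path of /data/export and a hypothetical file name bigfile (substitute the real backend path of the affected file on each server):

  # on the server holding the full file (run against the backend export
  # directory, not the glusterfs mount)
  getfattr -n trusted.afr.version /data/export/bigfile

  # on the server holding the partial file
  getfattr -n trusted.afr.version /data/export/bigfile

Comparing the two outputs shows whether the trusted.afr.version values on the two copies differ.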