Hi Kevan,

This particular case fails with older kernel+fuse versions, which send
mknod+open for the create call. I will fix the issue and let you know.

Thanks
Krishna

On 10/4/07, Kevan Benson <kbenson@xxxxxxxxxxxxxxx> wrote:
> Krishna Srinivas wrote:
> > On 10/4/07, Kevan Benson <kbenson@xxxxxxxxxxxxxxx> wrote:
> >
> >> Is self-heal supposed to work with partial files? I have an issue where
> >> self-heal isn't happening on some servers with AFR and unify in an HA
> >> setup I developed. Two servers, two clients, all AFR and unify done on
> >> the client side.
> >>
> >> If I kill a connection while a large file is being written, the
> >> glusterfs mount waits the appropriate timeout period (10 seconds in my
> >> case) and then finishes writing the file to the still-active server.
> >> This results in a full file on one server and a partial file on the
> >> other (the one I stopped traffic to temporarily, to simulate a
> >> crash/network problem). If I then re-enable the disabled server and read
> >> data from the problematic file, it doesn't self-heal and copy
> >> the full file to the server with the partial file.
> >>
> >> Anything written entirely while a server is offline (i.e. the offline
> >> server has no knowledge of it) is correctly created on a read of the
> >> file, so the problem seems to be related to files that are partially
> >> written to one server.
> >>
> >> Can someone comment on the particular conditions that trigger a
> >> self-heal? Is there something I can do to force a self-heal at this
> >> point? (I repeat that reading data from the file does not work.) I know
> >> I can use rsync and some foo to fix this, but that becomes less and less
> >> feasible as the mount size grows and the time for rsync to compare sides
> >> lengthens.
> >>
> >>
> >> _______________________________________________
> >> Gluster-devel mailing list
> >> Gluster-devel@xxxxxxxxxx
> >> http://lists.nongnu.org/mailman/listinfo/gluster-devel
> >>
> >>
> >
> > Hi Kevan,
> >
> > It should have worked fine in your case. What version of glusterfs are
> > you using? Just before you do the second read (or open, rather) that
> > should have triggered self-heal, can you run
> > getfattr -n trusted.afr.version <>
> > on both the partial file and the full file on the backend and give the
> > output?
> >
> > Thanks
> > Krishna
> >
>
> Glusterfs TLA 504, fuse-2.7.0-gfs4.
>
> The trusted.afr.version attribute doesn't exist on the partial file; it
> does exist on the complete file (with value "1"). From what I just
> tested, it doesn't look like it's set until the file operation is
> complete (it doesn't exist during writing). Are files without this
> attribute assumed to have a value of "0" or something, to ensure that
> they participate in self-heal correctly?
>
> It doesn't look like it: if I append data to the file, the partial
> copy gets assigned trusted.afr.version=1, while the complete file's
> trusted.afr.version is incremented to 2. Self-heal now works for that
> file, and on a read of file data the partial file is updated with all
> the data, and its trusted.afr.version is set to 2.
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
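
Kevan's last observation suggests a manual workaround: seed the missing
trusted.afr.version xattr on the partial copy so the two copies report
different versions, which appears to be what triggers self-heal on the
next open. A minimal sketch, assuming a hypothetical backend export path
of /data/export and that the attribute holds a plain string value as
shown by getfattr; this is inferred from the behaviour described in the
thread, not a documented recovery procedure:

    # On the server holding the COMPLETE file -- check its version.
    # Kevan's output suggests something like: trusted.afr.version="1"
    getfattr -n trusted.afr.version /data/export/bigfile

    # On the server holding the PARTIAL file -- seed a lower version so
    # the copies mismatch. The value "0" is an assumption, mirroring the
    # "assumed to have a value of 0" question above.
    setfattr -n trusted.afr.version -v "0" /data/export/bigfile

    # From a client, read through the glusterfs mount; the version
    # mismatch should now trigger self-heal of the partial copy.
    dd if=/mnt/glusterfs/bigfile of=/dev/null bs=1 count=1

If it works, this has the same effect as Kevan's append trick (creating
a version mismatch between the copies) without actually changing the
file's contents.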