Hi Kevan,

This particular case fails with older kernel+fuse versions, which send
mknod+open for the create call. I will fix the issue and let you know.

Thanks
Krishna

On 10/4/07, Kevan Benson <kbenson@xxxxxxxxxxxxxxx> wrote:
> Krishna Srinivas wrote:
> > On 10/4/07, Kevan Benson <kbenson@xxxxxxxxxxxxxxx> wrote:
> >
> >> Is self-heal supposed to work with partial files? I have an issue where
> >> self-heal isn't happening on some servers with AFR and unify in an HA
> >> setup I developed. Two servers, two clients, all AFR and unify done on
> >> the client side.
> >>
> >> If I kill a connection while a large file is being written, the
> >> glusterfs mount waits the appropriate timeout period (10 seconds in my
> >> case) and then finishes writing the file to the still-active server.
> >> This results in a full file on one server and a partial file on the
> >> other (the one I stopped traffic to temporarily, to simulate a
> >> crash/network problem). If I then re-enable the disabled server and read
> >> data from the problematic file, it doesn't self-heal and copy
> >> the full file to the server with the partial file.
> >>
> >> Anything written entirely while a server is offline (i.e. the offline
> >> server has no knowledge of it) is correctly created on a read of the
> >> file, so the problem seems to be related to files that are partially
> >> written to one server.
> >>
> >> Can someone comment on the particular conditions that trigger a
> >> self-heal? Is there something I can do to force a self-heal at this
> >> point? (I repeat that reading data from the file does not work.) I know
> >> I can use rsync and some foo to fix this, but that becomes less and less
> >> feasible as the mount size grows and the time for rsync to compare sides
> >> lengthens.
> >>
> >>
> >> _______________________________________________
> >> Gluster-devel mailing list
> >> Gluster-devel@xxxxxxxxxx
> >> http://lists.nongnu.org/mailman/listinfo/gluster-devel
> >>
> >>
> >
> > Hi Kevan,
> >
> > It should have worked fine in your case. What version of glusterfs are
> > you using? Just before you do the second read (or open, rather) that
> > should have triggered self-heal, can you run
> > getfattr -n trusted.afr.version <>
> > on both the partial file and the full file on the backend and give the
> > output?
> >
> > Thanks
> > Krishna
> >
>
> Glusterfs TLA 504, fuse-2.7.0-gfs4.
>
> The trusted.afr.version attribute doesn't exist on the partial file; it
> does exist on the complete file (with value "1"). From what I just
> tested, it doesn't look like it's set until the file operation is
> complete (it doesn't exist during writing). Are files without this
> attribute assumed to have a value of "0" or something, to ensure that
> they participate in self-heal correctly?
>
> It doesn't look like it: if I append data to the file, the partial
> copy gets assigned trusted.afr.version=1, while the complete file's
> trusted.afr.version is incremented to 2. Self-heal now works for that
> file, and on a read of file data the partial file is updated with all
> the data, and its trusted.afr.version is set to 2.
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
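
Kevan's last observation suggests a manual workaround: seed the missing
trusted.afr.version xattr on the partial copy so the two copies report
different versions, which appears to be what triggers self-heal on the
next open. A minimal sketch, assuming a hypothetical backend export path
of /data/export and that the attribute holds a plain string value as
shown by getfattr; this is inferred from the behaviour described in the
thread, not a documented recovery procedure:

    # On the server holding the COMPLETE file -- check its version.
    # Kevan's output suggests something like: trusted.afr.version="1"
    getfattr -n trusted.afr.version /data/export/bigfile

    # On the server holding the PARTIAL file -- seed a lower version so
    # the copies mismatch. The value "0" is an assumption, mirroring the
    # "assumed to have a value of 0" question above.
    setfattr -n trusted.afr.version -v "0" /data/export/bigfile

    # From a client, read through the glusterfs mount; the version
    # mismatch should now trigger self-heal of the partial copy.
    dd if=/mnt/glusterfs/bigfile of=/dev/null bs=1 count=1

If it works, this has the same effect as Kevan's append trick (creating
a version mismatch between the copies) without actually changing the
file's contents.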