Re: Wrong behavior on fsync of md-cache ?

Xavier Hernandez <xhernandez@xxxxxxxxxx> · Tue, 25 Nov 2014 09:35:25 +0100

On 11/25/2014 07:38 AM, Raghavendra Gowdappa wrote:
----- Original Message -----
From: "Xavier Hernandez" <xhernandez@xxxxxxxxxx>
To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "Emmanuel Dreyfus" <manu@xxxxxxxxxx>
Sent: Tuesday, November 25, 2014 12:49:03 AM
Subject: Re: Wrong behavior on fsync of md-cache ?

I think the problem is here: the first thing wb_fsync()
checks is whether there's an error in the fd (wb_fd_err()). If that's
the case, the call is immediately unwound with that error. The error
seems to be set in wb_fulfill_cbk(). I don't know the internals of the
write-behind xlator, but this seems to be the problem.

Yes, your analysis is correct. Once the error is hit, fsync is not
queued behind unfulfilled writes. Whether this can be considered a bug
is debatable. Since there is already an error in one of the writes that
was written behind, fsync should return the error. I am not sure whether
it should wait till we try to flush _all_ the writes that were written
behind. Any suggestions on what the expected behaviour is here?

I think that it should wait for all pending writes. In the test case I
used, all pending writes will fail the same way as the first one, but
in other situations it's possible for one write to fail (for example
due to a damaged block on disk) and for the following writes to succeed.
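
To make the two behaviours concrete, here is a minimal model in C. It is
not the write-behind code: struct pending_write, fsync_drain_all() and
fsync_fail_fast() are hypothetical names invented for this sketch, and a
"flush" is just a flag being set. It only contrasts returning as soon as
the fd carries an error with attempting every queued write first.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one queued (written-behind) write. */
struct pending_write {
    int result;   /* 0 on success, or an errno value such as EIO */
    int flushed;  /* set to 1 once the write has been attempted */
};

/* Attempt every queued write and report the first error seen.
 * fd_err is an error already recorded on the fd (0 if none). */
int fsync_drain_all(struct pending_write *q, size_t n, int fd_err)
{
    int first_err = fd_err;

    for (size_t i = 0; i < n; i++) {
        q[i].flushed = 1;
        if (q[i].result != 0 && first_err == 0)
            first_err = q[i].result;
    }
    return first_err;
}

/* Return at once if the fd already carries an error, leaving the
 * queue untouched; otherwise behave like fsync_drain_all(). */
int fsync_fail_fast(struct pending_write *q, size_t n, int fd_err)
{
    if (fd_err != 0)
        return fd_err;
    return fsync_drain_all(q, n, 0);
}
```

Both variants return the same error to the caller. The difference is
that after fsync_fail_fast() returns, writes that might have succeeded
have not even been attempted.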

From the man page of fsync:

    fsync() transfers ("flushes") all modified in-core data of (i.e.,
    modified buffer cache pages for) the file referred to by the file
    descriptor fd to the disk device (or other permanent storage
    device) so that all changed information can be retrieved even after
    the system crashed or was rebooted. This includes writing through
    or flushing a disk cache if present. The call blocks until the
    device reports that the transfer has completed. It also flushes
    metadata information associated with the file (see stat(2)).

As I understand it, when fsync is received, all queued writes must be
sent to the device (regardless of whether a previous write has failed).
It also says that the call blocks until the device has finished all the
operations.

However, it's not clear to me how to control file consistency, because
this allows some writes to succeed after a failed one. I assume that
controlling this is the responsibility of the calling application, which
should issue fsyncs at critical points to guarantee consistency.
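
On the application side, this could look like the sketch below. It uses
only standard POSIX calls (open, write, fsync, close); the helper name
write_and_sync() is hypothetical. The point is that with a write-behind
cache a write() can report success and the error may only surface
later, so the results of fsync() and close() have to be checked too.
Calling fsync() even after a failed write() is a choice made for this
sketch, not something the man page requires.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and force them to
 * stable storage. Returns 0 on success or a negated errno value. */
int write_and_sync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    const char *p = buf;
    int err = 0;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            err = errno;        /* remember the first error */
            break;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Errors from cached writes may only be reported here. */
    if (fsync(fd) < 0 && err == 0)
        err = errno;

    /* close() can also report a deferred write error. */
    if (close(fd) < 0 && err == 0)
        err = errno;

    return -err;
}
```

A caller would treat any non-zero return as "the data may not be on
disk" and decide for itself how to recover, for example by rewriting the
affected range.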

Anyway, it seems that there's a difference between Linux and NetBSD,
because this test only fails on NetBSD. Is it possible that Linux's FUSE
implementation delays the fsync request until all pending writes have
been answered? This would explain why this problem has not manifested
until now. NetBSD seems to send fsync (probably as the first step of a
close() call) when the first write fails.

Xavi
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-devel