On 11/25/2014 12:59 PM, Raghavendra Gowdappa wrote:
----- Original Message -----
From: "Xavier Hernandez" <xhernandez@xxxxxxxxxx>
To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "Emmanuel Dreyfus" <manu@xxxxxxxxxx>
Sent: Tuesday, November 25, 2014 2:05:25 PM
Subject: Re: Wrong behavior on fsync of md-cache ?
On 11/25/2014 07:38 AM, Raghavendra Gowdappa wrote:
----- Original Message -----
From: "Xavier Hernandez" <xhernandez@xxxxxxxxxx>
To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "Emmanuel Dreyfus" <manu@xxxxxxxxxx>
Sent: Tuesday, November 25, 2014 12:49:03 AM
Subject: Re: Wrong behavior on fsync of md-cache ?
I think the problem is here: the first thing wb_fsync()
checks is whether there's an error in the fd (wb_fd_err()). If that's the
case, the call is immediately unwound with that error. The error seems
to be set in wb_fulfill_cbk(). I don't know the internals of the write-back
xlator, but this seems to be the problem.
Yes, your analysis is correct. Once the error is hit, fsync is not
queued behind unfulfilled writes. Whether this can be considered a bug
is debatable. Since there is already an error in one of the writes that
were written behind, fsync should return the error. I am not sure whether
it should wait until we try to flush _all_ the writes that were written
behind. Any suggestions on what the expected behaviour is here?
I think that it should wait for all pending writes. In the test case I
used, all pending writes will fail the same way as the first one, but
in other situations it's possible to have one write fail (for example
due to a damaged block on disk) and the following writes succeed.
From the man page of fsync:
fsync() transfers ("flushes") all modified in-core data of (i.e.,
modified buffer cache pages for) the file referred to by the file
descriptor fd to the disk device (or other permanent storage
device) so that all changed information can be retrieved even after
the system crashed or was rebooted. This includes writing through
or flushing a disk cache if present. The call blocks until the
device reports that the transfer has completed. It also flushes
metadata information associated with the file (see stat(2)).
As I understand it, when fsync is received all queued writes must be
sent to the device (regardless of whether a previous write has failed).
It also says that the call blocks until the device has finished all the
operations.
However, it's not clear to me how to control file consistency, because
this allows some writes to succeed after a failed one.
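
To make the two fsync behaviours under discussion concrete, here is a minimal sketch written as a toy Python model. It is not GlusterFS code: the class `WriteBehindFd`, its methods, and the helpers `make_backend` and `run_scenario` are all hypothetical names invented for illustration, and the real write-behind xlator is asynchronous C code. The model only assumes what the thread states: writes are acknowledged before they are flushed, the first flush error is remembered on the fd, and fsync either returns as soon as it sees that error or waits for every queued write first.

```python
import errno


class WriteBehindFd:
    """Toy model of a write-behind queue for one fd (hypothetical, not GlusterFS)."""

    def __init__(self, backend_write):
        self.backend_write = backend_write  # callable(data) -> 0 or an errno value
        self.pending = []                   # writes acknowledged but not yet flushed
        self.error = 0                      # first error seen while flushing (sticky)

    def write(self, data):
        # Write-behind: acknowledge immediately, flush later.
        self.pending.append(data)
        return len(data)

    def flush_one(self):
        # Send the oldest queued write to the backend and remember the first error.
        data = self.pending.pop(0)
        err = self.backend_write(data)
        if err and not self.error:
            self.error = err

    def fsync_early_return(self):
        # Behaviour described in the thread: an already recorded error is
        # returned at once, without waiting for the remaining queued writes.
        if self.error:
            return self.error
        return self.fsync_wait_all()

    def fsync_wait_all(self):
        # Proposed behaviour: drain every queued write, then report the error.
        while self.pending:
            self.flush_one()
        return self.error


def make_backend(fail_calls, stored):
    """Backend that fails with EIO on the given 1-based call numbers."""
    calls = 0

    def backend_write(data):
        nonlocal calls
        calls += 1
        if calls in fail_calls:
            return errno.EIO
        stored.append(data)
        return 0

    return backend_write


def run_scenario(use_wait_all):
    stored = []
    fd = WriteBehindFd(make_backend({1}, stored))
    for chunk in (b"a", b"b", b"c"):
        fd.write(chunk)
    fd.flush_one()  # the first queued write fails before fsync arrives
    if use_wait_all:
        result = fd.fsync_wait_all()
    else:
        result = fd.fsync_early_return()
    return result, len(fd.pending), stored


if __name__ == "__main__":
    print(run_scenario(use_wait_all=False))
    print(run_scenario(use_wait_all=True))
```

In this model the early-return variant hands the error back while two writes are still queued, whereas the wait-all variant returns the same error only after the queue is empty. The second run also shows the consistency concern raised above: the writes following the failed one reach the backend successfully.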
Though fsync doesn't wait on queued writes after a failure, the queued writes are flushed to disk even in the existing codebase. Can you file a bug to make fsync wait for completion of queued writes irrespective of whether flushing any of them failed? I'll send a patch to fix the issue.
I filed bug #1167793
Just to prioritise this, how important is the fix?
It seems to fail only on NetBSD. I'm not sure what priority it has.
Emmanuel is trying to create a regression test for new patches that
runs all tests in tests/basic, and tests/basic/ec/quota.t hits this issue.
An alternative would be to temporarily remove or change this test to
avoid the problem.
Xavi