----- Original Message -----
> From: "Xavier Hernandez" <xhernandez@xxxxxxxxxx>
> To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "Emmanuel Dreyfus" <manu@xxxxxxxxxx>
> Sent: Tuesday, November 25, 2014 2:05:25 PM
> Subject: Re: Wrong behavior on fsync of md-cache ?
>
> On 11/25/2014 07:38 AM, Raghavendra Gowdappa wrote:
> > ----- Original Message -----
> >> From: "Xavier Hernandez" <xhernandez@xxxxxxxxxx>
> >> To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> >> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "Emmanuel Dreyfus"
> >> <manu@xxxxxxxxxx>
> >> Sent: Tuesday, November 25, 2014 12:49:03 AM
> >> Subject: Re: Wrong behavior on fsync of md-cache ?
> >>
> >> I think the problem is here: the first thing wb_fsync()
> >> checks is if there's an error in the fd (wd_fd_err()). If that's the
> >> case, the call is immediately unwinded with that error. The error seems
> >> to be set in wb_fulfill_cbk(). I don't know the internals of write-back
> >> xlator, but this seems to be the problem.
> >
> > Yes, your analysis is correct. Once the error is hit, fsync is not
> > queued behind unfulfilled writes. Whether it can be considered as a bug
> > is debatable. Since there is already an error in one of the writes which
> > was written-behind fsync should return the error. I am not sure whether
> > it should wait till we try to flush _all_ the writes that were written
> > behind. Any suggestions on what is the expected behaviour here?
> >
>
> I think that it should wait for all pending writes. In the test case I
> used, all pending writes will fail the same way that the first one, but
> in other situations it's possible to have a write failing (for example
> due to a damaged block in disk) and following writes succeeding.
>
> From the man page of fsync:
>
>      fsync() transfers ("flushes") all modified in-core data of (i.e.,
>      modified buffer cache pages for) the file referred to by the file
>      descriptor fd to the disk device (or other permanent storage
>      device) so that all changed information can be retrieved even after
>      the system crashed or was rebooted. This includes writing through
>      or flushing a disk cache if present. The call blocks until the
>      device reports that the transfer has completed. It also flushes
>      metadata information associated with the file (see stat(2)).
>
> As I understand it, when fsync is received all queued writes must be
> sent to the device (regardless if a previous write has failed or not).
> It also says that the call blocks until the device has finished all the
> operations.
>
> However it's not clear to me how to control file consistency because
> this allows some writes to succeed after a failed one.

Although fsync doesn't wait on queued writes after a failure, the queued
writes are still flushed to disk in the existing codebase. Can you file a
bug asking that fsync wait for completion of all queued writes,
irrespective of whether flushing any of them failed? I'll send a patch to
fix the issue. So that I can prioritise this, how important is the fix?

> I assume that
> controlling this is the responsibility of the calling application that
> should issue fsyncs on critical points to guarantee consistency.
>
> Anyway it seems that there's a difference between linux and NetBSD
> because this test only fails on NetBSD. Is it possible that linux's fuse
> implementation delays the fsync request until all pending writes have
> been answered ? this would explain why this problem has not manifested
> till now. NetBSD seems to send fsync (probably as the first step of a
> close() call) when the first write fails.
>
> Xavi
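
To make the two fsync behaviours discussed above concrete, here is a small toy model in C. It is only a sketch and is not the GlusterFS write-behind code: struct toy_fd and every toy_* function are hypothetical names invented for this illustration, and each queued write simply carries the result (0 or an errno) it will produce when flushed. toy_fsync_failfast() stands in for an fsync that returns as soon as an error is recorded on the fd, while toy_fsync_drain() stands in for the behaviour requested in the thread: attempt every queued write, then report the first error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in GlusterFS. */
#define TOY_MAX_PENDING 8

struct toy_fd {
    int    pending[TOY_MAX_PENDING]; /* result each queued write will produce: 0 or an errno */
    size_t head;                     /* index of the next write to flush */
    size_t count;                    /* total number of writes queued so far */
    int    err;                      /* first error seen while flushing, 0 if none */
    size_t flushed_ok;               /* number of writes flushed successfully */
};

/* Queue a write that will produce `result` when flushed; -1 if the queue is full. */
static int toy_queue_write(struct toy_fd *fd, int result)
{
    assert(fd != NULL);
    if (fd->count == TOY_MAX_PENDING)
        return -1;
    fd->pending[fd->count++] = result;
    return 0;
}

/* Number of queued writes that have not been attempted yet. */
static size_t toy_pending(const struct toy_fd *fd)
{
    assert(fd != NULL);
    return fd->count - fd->head;
}

/* Flush one queued write. Returns 1 if a write was attempted, 0 if none were left. */
static int toy_flush_one(struct toy_fd *fd)
{
    assert(fd != NULL);
    if (fd->head == fd->count)
        return 0;
    int result = fd->pending[fd->head++];
    if (result == 0)
        fd->flushed_ok++;
    else if (fd->err == 0)
        fd->err = result;            /* only the first error is remembered */
    return 1;
}

/* Fail-fast variant: stop and return as soon as the fd carries an error. */
static int toy_fsync_failfast(struct toy_fd *fd)
{
    while (fd->err == 0 && toy_flush_one(fd))
        ;
    return fd->err;
}

/* Drain variant: attempt every queued write, then report the first error. */
static int toy_fsync_drain(struct toy_fd *fd)
{
    while (toy_flush_one(fd))
        ;
    return fd->err;
}
```

With three queued writes whose results are success, EIO, success, the fail-fast variant returns EIO while one write is still unattempted. The drain variant also returns EIO, but only after the third write has been attempted and has succeeded. That second outcome is the "some writes succeed after a failed one" case raised above, which the calling application has to handle by checking the fsync result at its own consistency points.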