> > > As far as 2 goes, the application can checkpoint by doing fsync and, on
> > > write failures, roll back to the last checkpoint and replay writes from
> > > that checkpoint. Or, glusterfs can retry the writes on behalf of the
> > > application. However, glusterfs retrying writes cannot be a complete
> > > solution, as the error condition we've run into might never get resolved
> > > (for example, running out of space). So, glusterfs has to give up after
> > > some time.

The application should not be expected to replay writes; glusterfs must
retry the failed write. In gluster-swift, we had hit a case where the
application would get EIO but the write had actually failed because of
ENOSPC. https://bugzilla.redhat.com/show_bug.cgi?id=986812

Regards,
-Prashanth Pai

----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> To: "Vijay Bellur" <vbellur@xxxxxxxxxx>
> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "Ben Turner" <bturner@xxxxxxxxxx>, "Ira Cooper" <icooper@xxxxxxxxxx>
> Sent: Tuesday, September 29, 2015 4:56:33 PM
> Subject: Re: Handling Failed flushes in write-behind
>
> + gluster-devel
>
> > On Tuesday 29 September 2015 04:45 PM, Raghavendra Gowdappa wrote:
> > > Hi All,
> > >
> > > Currently, on failure to flush the writeback cache, we mark the fd bad.
> > > The rationale behind this is that since the application doesn't know
> > > which of the cached writes failed, the fd is in a bad state and cannot
> > > possibly do a meaningful/correct read. However, this approach (though
> > > POSIX-compliant) is not acceptable for long-standing applications like
> > > QEMU [1]. So, a two-part solution was decided:
> > >
> > > 1. No longer mark the fd bad on failures while flushing data to the
> > >    backend from the write-behind cache.
> > > 2. Retry the writes.
> > >
> > > As far as 2 goes, the application can checkpoint by doing fsync and, on
> > > write failures, roll back to the last checkpoint and replay writes from
> > > that checkpoint. Or, glusterfs can retry the writes on behalf of the
> > > application. However, glusterfs retrying writes cannot be a complete
> > > solution, as the error condition we've run into might never get resolved
> > > (for example, running out of space). So, glusterfs has to give up after
> > > some time.
> > >
> > > It would be helpful if you could give your inputs on how other writeback
> > > systems (e.g., the kernel page cache, NFS, Samba, Ceph, Lustre) behave
> > > in this scenario and what would be a sane policy for glusterfs.
> > >
> > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1200862
> > >
> > > regards,
> > > Raghavendra
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-devel
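P.S. The application-side checkpoint-and-replay scheme discussed above could
be sketched roughly as below. This is only an illustration of the idea, not
glusterfs or QEMU code; the class name, the in-memory replay log, and the
retry count are assumptions made for the sketch.

```python
import os

class CheckpointedWriter:
    """Sketch: fsync establishes a checkpoint; on a failed write, seek
    back to the checkpoint offset and replay every write issued since."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        self.ckpt_offset = 0   # offset at the last successful fsync
        self.pending = []      # writes issued since the last checkpoint

    def write(self, data, retries=3):
        self.pending.append(data)
        for _ in range(retries):
            try:
                os.write(self.fd, data)
                return
            except OSError:
                self._rollback_and_replay()
        # Give up after a bounded number of retries, as the error
        # (e.g. ENOSPC) might never get resolved.
        raise IOError("write failed after %d retries" % retries)

    def _rollback_and_replay(self):
        # Roll back to the last checkpoint and replay every earlier
        # pending write; the caller then retries the current one.
        os.lseek(self.fd, self.ckpt_offset, os.SEEK_SET)
        for data in self.pending[:-1]:
            os.write(self.fd, data)

    def checkpoint(self):
        # If fsync succeeds, the data is durable and the replay log
        # can be discarded.
        os.fsync(self.fd)
        self.ckpt_offset = os.lseek(self.fd, 0, os.SEEK_CUR)
        self.pending = []

    def close(self):
        self.checkpoint()
        os.close(self.fd)
```

The point of the sketch is the division of labour: fsync is the only
durability barrier the application relies on, so anything after the last
successful fsync is considered replayable by the application rather than
guaranteed by the cache.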