For ease of access, I am posting the summary from the commit message below:

1. When a sync fails, the cached write is still preserved unless there is a flush/fsync waiting on it.

2. When a sync fails and there is a flush/fsync waiting on the cached write, the cache is thrown away and no further retries are made. In other words, flush/fsync act as barriers for all previous writes: every previous write is either successfully synced to the backend or forgotten in case of an error. Without such a barrier fop (especially flush, which is issued prior to a close), we end up retrying forever, even after the fd is closed.

3. If a fop is waiting on a cached write and syncing to the backend fails, the waiting fop is failed.

4. Sync failures when no fop is waiting are ignored and are not propagated to the application.

5. The effect of repeated sync failures is that there will be no cache for future writes, and they cannot be written behind.

The above algorithm handles transient errors (EDQUOT, ENOSPC, ENOTCONN). Handling of non-transient errors is slightly different:

1. Throw away the write buffer, so that the cache is freed. This means no retries are made for non-transient errors. Also, since the cache is freed, future writes can be written behind.

2. Retain the request till an fsync or flush. This means all future operations on the failed regions will fail till an fsync/flush. This is conservative error handling: it forces the application to learn that a written-behind write has failed and to take remedial action, such as rolling back to the last fsync and retrying all the writes from that point.
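To make the rules above easier to check against each other, here is a minimal sketch of the decision logic as a small Python model. It is not the write-behind code itself: the names Waiter, Outcome and on_sync_failure are hypothetical and exist only for illustration. It also assumes that any errno outside the three listed above counts as non-transient, and that rule 3 (the waiting fop is failed) applies to non-transient errors as well.

```python
import errno
from dataclasses import dataclass
from enum import Enum

# Errors the summary lists as transient. Treating every other errno as
# non-transient is an assumption made for this sketch.
TRANSIENT_ERRNOS = {errno.EDQUOT, errno.ENOSPC, errno.ENOTCONN}


class Waiter(Enum):
    """Which kind of fop, if any, is waiting on the cached write (hypothetical)."""
    NONE = "none"        # nothing is waiting
    BARRIER = "barrier"  # a flush or fsync is waiting
    OTHER = "other"      # some other fop is waiting


@dataclass(frozen=True)
class Outcome:
    """What happens to a cached write after a failed sync (hypothetical)."""
    keep_buffer: bool   # the cached data stays in memory
    keep_request: bool  # the failed request is still tracked
    retry: bool         # the sync to the backend will be retried
    fail_waiter: bool   # the waiting fop, if any, gets the error


def on_sync_failure(err: int, waiter: Waiter) -> Outcome:
    barrier = waiter is Waiter.BARRIER
    waiting = waiter is not Waiter.NONE

    if err in TRANSIENT_ERRNOS:
        # Transient rules 1 and 2: keep the cached write and keep retrying,
        # unless a flush/fsync barrier is waiting, in which case forget it.
        keep = not barrier
        return Outcome(keep_buffer=keep, keep_request=keep,
                       retry=keep, fail_waiter=waiting)

    # Non-transient rules 1 and 2: free the buffer at once and never retry,
    # but keep the request as a failure marker until a flush/fsync arrives.
    return Outcome(keep_buffer=False, keep_request=not barrier,
                   retry=False, fail_waiter=waiting)
```

For example, on_sync_failure(errno.ENOSPC, Waiter.NONE) keeps the buffer and retries without reporting anything, while on_sync_failure(errno.EIO, Waiter.NONE) frees the buffer but keeps the request, so later operations on that region can be failed until the next fsync/flush.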