On Thu, Apr 12, 2018 at 11:08:50AM -0400, Jeff Layton wrote: > On Thu, 2018-04-12 at 22:01 +1000, Dave Chinner wrote: > > On Thu, Apr 12, 2018 at 07:09:14AM -0400, Jeff Layton wrote: > > > When there is a writeback error, what should be done with the dirty > > > page(s)? Right now, we usually just mark them clean and carry on. Is > > > that the right thing to do? > > > > There isn't a right thing. Whatever we do will be wrong for someone. > > > > > One possibility would be to invalidate the range that failed to be > > > written (or the whole file) and force the pages to be faulted in again > > > on the next access. It could be surprising for some applications to not > > > see the results of their writes on a subsequent read after such an > > > event. > > > > Not to mention a POSIX IO ordering violation. Seeing stale data > > after a "successful" write is simply not allowed. > > I'm not so sure here, given that we're dealing with an error condition. > Are we really obligated not to allow any changes to pages that we can't > write back? Posix says this about write(): After a write() to a regular file has successfully returned: Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified. IOWs, even if there is a later error, we told the user the write was successful, and so according to POSIX we are not allowed to wind back the data to what it was before the write() occurred. > Given that the pages are clean after these failures, we aren't doing > this even today: > > Suppose we're unable to do writes but can do reads vs. the backing > store. After a wb failure, the page has the dirty bit cleared. If it > gets kicked out of the cache before the read occurs, it'll have to be > faulted back in. Poof -- your write just disappeared. Yes - I was pointing out what the specification we supposedly conform to says about this behaviour, not that our current behaviour conforms to the spec. Indeed, have you even noticed xfs_aops_discard_page() and it's surrounding context on page writeback submission errors? To save you looking, XFS will trash the page contents completely on a filesystem level ->writepage error. It doesn't mark them "clean", doesn't attempt to redirty and rewrite them - it clears the uptodate state and may invalidate it completely. IOWs, the data written "sucessfully" to the cached page is now gone. It will be re-read from disk on the next read() call, in direct violation of the above POSIX requirements. This is my point: we've done that in XFS knowing that we violate POSIX specifications in this specific corner case - it's the lesser of many evils we have to chose between. Hence if we chose to encode that behaviour as the general writeback IO error handling algorithm, then it needs to done with the knowledge it is a specification violation. Not to mention be documented as a POSIX violation in the various relevant man pages and that this is how all filesystems will behave on async writeback error..... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx