On Mon, Sep 15, 2014 at 09:18:22AM -0400, Brian Foster wrote: > On Mon, Sep 15, 2014 at 11:46:54AM +1000, Dave Chinner wrote: > > xfs: ensure WB_SYNC_ALL writeback handles partial pages correctly > > > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > XFS has been having trouble with stray delayed allocation extents > > beyond EOF for a long time. Recent changes to the collapse range > > code has triggered erroneous EBUSY errors on page invalidtion for > > block size smaller than page size filesystems. These > > have been caused by dirty buffers beyond EOF on a partial page which > > do not get written to disk during a sync. > > > > The issue is that write-ahead in xfs_cluster_write() finds such a > > partial page and handles it by leaving the page dirty but pushing it > > into a writeback state. This used to work just fine, as the > > write_cache_pages() code would then find the dirty partial page in > > the next mapping tree lookup as the dirty tag is still set. > > > > Unfortunately, when we moved to a mark and sweep approach to > > writeback to fix other writeback sync issues, we broken this. THe > > act of marking the page as under writeback now clears the TOWRITE > > tag in the radix tree, even though the page is still dirty. This > > causes the TOWRITE tag to be cleared, and hence the next lookup on > > the mapping tree does not find the dirty partial page and so doesn't > > try to write it again. > > > > This same writeback bug was found recently in ext4 and fixed in > > commit 1c8349a ("ext4: fix data integrity sync in ordered mode") > > without communication to the wider filesystem community. We can use > > exactly the same fix here so the TOWRITE flag is not cleared on > > partial page writes. > > > > cc: stable@xxxxxxxxxxxxxxx # dependent on 1c8349a17137b93f0a83f276c764a6df1b9a116e > > Root-cause-found-by: Brian Foster <bfoster@xxxxxxxxxx> > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > > --- > > Looks good and fixes the collapse failure in my test. > > Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx> > > I suppose we should prepend the collapse rework series with this patch > to avoid the regression as it pertains to collapse (obviously the > failure to retain towrite goes further back). Agreed, I will do that. > I'll continue testing with this. Are you still seeing an increase in > such failures with the xfs_free_file_space() patch or has this quieted > those down? To early to sayi for sure, but signs are good - I've had xfstests actually complete without any stray delalloc asserts occurring on 1k block size filesystems for the first time in a couple of weeks. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs