On Mon, 5 Nov 2018 11:36:13 -0500 Brian Foster <bfoster@xxxxxxxxxx> wrote: > write_cache_pages() currently breaks out of the writepage loop in > the event of a ->writepage() error. This causes problems for > integrity writeback on XFS For the uninitiated, please define the term "integrity writeback". Quite carefully ;) I'm not sure what it actually means. grepping fs/xfs for "integrity" doesn't reveal anything. <reads the code> OK, it appears the term means "to sync data to disk" as opposed to "periodic dirty memory cleaning". I guess we don't have particularly well-established terms for the two concepts. > in the event of a persistent error as XFS > expects to process every dirty+delalloc page such that it can > discard delalloc blocks when real block allocation fails. Failure > to handle all delalloc pages leaves the filesystem in an > inconsistent state if the integrity writeback happens to be due to > an unmount, for example. > > Update write_cache_pages() to continue processing pages for > integrity writeback regardless of ->writepage() errors. Save the > first encountered error and return it once complete. This > facilitates XFS or any other fs that expects integrity writeback to > process the entire set of dirty pages regardless of errors. > Background writeback continues to exit on the first error > encountered. > > ... > > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -2156,6 +2156,7 @@ int write_cache_pages(struct address_space *mapping, > { > int ret = 0; > int done = 0; > + int error; > struct pagevec pvec; > int nr_pages; > pgoff_t uninitialized_var(writeback_index); > @@ -2236,25 +2237,29 @@ int write_cache_pages(struct address_space *mapping, > goto continue_unlock; > > trace_wbc_writepage(wbc, inode_to_bdi(mapping->host)); > - ret = (*writepage)(page, wbc, data); > - if (unlikely(ret)) { > - if (ret == AOP_WRITEPAGE_ACTIVATE) { > + error = (*writepage)(page, wbc, data); > + if (unlikely(error)) { > + if (error == AOP_WRITEPAGE_ACTIVATE) { > unlock_page(page); > - ret = 0; > - } else { > + error = 0; > + } else if (wbc->sync_mode != WB_SYNC_ALL && > + !wbc->for_sync) { And here we're determining that it is not a sync-data-to-disk operation, hence it must be a clean-dirty-pages operation. This isn't very well-controlled, is it? It's an inference which was put together by examining current callers, I assume? It would be good if we could force callers to be explicit about their intent here. But I'm not sure that adding a new writeback_sync_mode is the way to do this. At a minimum it would be good to have careful comments in here explaining what is going on, justifying the above inference, explaining the xfs requirement (hopefully in a way which isn't xfs-specific). > /* > - * done_index is set past this page, > - * so media errors will not choke > + * done_index is set past this page, so > + * media errors will not choke > * background writeout for the entire > * file. This has consequences for > * range_cyclic semantics (ie. it may > * not be suitable for data integrity > * writeout). > */