On Wed, Jan 31, 2024 at 08:14:37AM +0100, Christoph Hellwig wrote: > On Tue, Jan 30, 2024 at 10:50:16PM +0100, Jan Kara wrote: > > Well, batch release needs to be only here because if writeback_get_folio() > > returns NULL, the batch has been already released by it. > > Indeed. > > > So what would be > > duplicated is only the error assignment. But I'm fine with the version in > > the following email and actually somewhat prefer it compared the yet > > another variant you've sent. > > So how about another variant, this is closer to your original suggestion. > But I've switched around the ordered of the folio or not branches > from my original patch, and completely reworked and (IMHO) improved the > comments. it replaces patch 19 instead of being incremental to be > readable: > Not a thorough review but at first glance I like it. I think the direct function call makes things easier to follow. Just one quick thought... ... > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index 0763c4353a676a..eefcb00cb7b227 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c ... > @@ -2458,60 +2433,107 @@ static struct folio *writeback_get_folio(struct address_space *mapping, > return folio; > } > ... > +struct folio *writeback_iter(struct address_space *mapping, > + struct writeback_control *wbc, struct folio *folio, int *error) > { > - if (wbc->range_cyclic) > - wbc->index = mapping->writeback_index; /* prev offset */ > - else > - wbc->index = wbc->range_start >> PAGE_SHIFT; > - > - if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) > - tag_pages_for_writeback(mapping, wbc->index, wbc_end(wbc)); > - > - wbc->err = 0; > - folio_batch_init(&wbc->fbatch); > - return writeback_get_folio(mapping, wbc); > -} > + if (!folio) { > + folio_batch_init(&wbc->fbatch); > + wbc->err = 0; The implied field initialization via !folio feels a little wonky to me just because it's not clear from the client code that both fields must be initialized. Even though the interface is simpler, I wonder if it's still worth having a dumb/macro type init function that at least does the batch and error field initialization. Or on second thought maybe having writeback_iter() reset *error as well on folio == NULL might be a little cleaner without changing the interface.... Brian > > -struct folio *writeback_iter_next(struct address_space *mapping, > - struct writeback_control *wbc, struct folio *folio, int error) > -{ > - unsigned long nr = folio_nr_pages(folio); > + /* > + * For range cyclic writeback we remember where we stopped so > + * that we can continue where we stopped. > + * > + * For non-cyclic writeback we always start at the beginning of > + * the passed in range. > + */ > + if (wbc->range_cyclic) > + wbc->index = mapping->writeback_index; > + else > + wbc->index = wbc->range_start >> PAGE_SHIFT; > > - wbc->nr_to_write -= nr; > + /* > + * To avoid livelocks when other processes dirty new pages, we > + * first tag pages which should be written back and only then > + * start writing them. > + * > + * For data-integrity sync we have to be careful so that we do > + * not miss some pages (e.g., because some other process has > + * cleared the TOWRITE tag we set). The rule we follow is that > + * TOWRITE tag can be cleared only by the process clearing the > + * DIRTY tag (and submitting the page for I/O). > + */ > + if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) > + tag_pages_for_writeback(mapping, wbc->index, > + wbc_end(wbc)); > + } else { > + wbc->nr_to_write -= folio_nr_pages(folio); > > - /* > - * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value. > - * Eventually all instances should just unlock the folio themselves and > - * return 0; > - */ > - if (error == AOP_WRITEPAGE_ACTIVATE) { > - folio_unlock(folio); > - error = 0; > + /* > + * For integrity writeback we have to keep going until we have > + * written all the folios we tagged for writeback prior to > + * entering the writeback loop, even if we run past > + * wbc->nr_to_write or encounter errors, and just stash away > + * the first error we encounter in wbc->err so that it can > + * be retrieved on return. > + * > + * This is because the file system may still have state to clear > + * for each folio. We'll eventually return the first error > + * encountered. > + */ > + if (wbc->sync_mode == WB_SYNC_ALL) { > + if (*error && !wbc->err) > + wbc->err = *error; > + } else { > + if (*error || wbc->nr_to_write <= 0) > + goto done; > + } > } > > - if (error && !wbc->err) > - wbc->err = error; > + folio = writeback_get_folio(mapping, wbc); > + if (!folio) { > + /* > + * For range cyclic writeback not finding another folios means > + * that we are at the end of the file. In that case go back > + * to the start of the file for the next call. > + */ > + if (wbc->range_cyclic) > + mapping->writeback_index = 0; > > - /* > - * For integrity sync we have to keep going until we have written all > - * the folios we tagged for writeback prior to entering the writeback > - * loop, even if we run past wbc->nr_to_write or encounter errors. > - * This is because the file system may still have state to clear for > - * each folio. We'll eventually return the first error encountered. > - * > - * For background writeback just push done_index past this folio so that > - * we can just restart where we left off and media errors won't choke > - * writeout for the entire file. > - */ > - if (wbc->sync_mode == WB_SYNC_NONE && > - (wbc->err || wbc->nr_to_write <= 0)) { > - writeback_finish(mapping, wbc, folio->index + nr); > - return NULL; > + /* > + * Return the first error we encountered (if there was any) to > + * the caller now that we are done. > + */ > + *error = wbc->err; > } > + return folio; > > - return writeback_get_folio(mapping, wbc); > +done: > + if (wbc->range_cyclic) > + mapping->writeback_index = folio->index + folio_nr_pages(folio); > + folio_batch_release(&wbc->fbatch); > + return NULL; > } > > /** > @@ -2549,13 +2571,18 @@ int write_cache_pages(struct address_space *mapping, > struct writeback_control *wbc, writepage_t writepage, > void *data) > { > - struct folio *folio; > - int error; > + struct folio *folio = NULL; > + int error = 0; > > - for_each_writeback_folio(mapping, wbc, folio, error) > + while ((folio = writeback_iter(mapping, wbc, folio, &error))) { > error = writepage(folio, wbc, data); > + if (error == AOP_WRITEPAGE_ACTIVATE) { > + folio_unlock(folio); > + error = 0; > + } > + } > > - return wbc->err; > + return error; > } > EXPORT_SYMBOL(write_cache_pages); > > @@ -2563,13 +2590,17 @@ static int writeback_use_writepage(struct address_space *mapping, > struct writeback_control *wbc) > { > struct blk_plug plug; > - struct folio *folio; > - int err; > + struct folio *folio = NULL; > + int err = 0; > > blk_start_plug(&plug); > - for_each_writeback_folio(mapping, wbc, folio, err) { > + while ((folio = writeback_iter(mapping, wbc, folio, &err))) { > err = mapping->a_ops->writepage(&folio->page, wbc); > mapping_set_error(mapping, err); > + if (err == AOP_WRITEPAGE_ACTIVATE) { > + folio_unlock(folio); > + err = 0; > + } > } > blk_finish_plug(&plug); > >