On Thu, Jan 26, 2023 at 08:15:04PM -0800, Eric Biggers wrote:
> On Thu, Jan 26, 2023 at 08:24:08PM +0000, Matthew Wilcox (Oracle) wrote:
> >  int ext4_mpage_readpages(struct inode *inode,
> > -		struct readahead_control *rac, struct page *page)
> > +		struct readahead_control *rac, struct folio *folio)
> >  {
> >  	struct bio *bio = NULL;
> >  	sector_t last_block_in_bio = 0;
> > @@ -247,16 +247,15 @@ int ext4_mpage_readpages(struct inode *inode,
> >  		int fully_mapped = 1;
> >  		unsigned first_hole = blocks_per_page;
> > 
> > -		if (rac) {
> > -			page = readahead_page(rac);
> > -			prefetchw(&page->flags);
> > -		}
> > +		if (rac)
> > +			folio = readahead_folio(rac);
> > +		prefetchw(&folio->flags);
> 
> Unlike readahead_page(), readahead_folio() puts the folio immediately.
> Is that really safe?

It's safe until we unlock the page.  The page cache holds a refcount,
and truncation has to lock the page before it can remove it from the
page cache.  Putting the refcount in readahead_folio() is a
transitional step; once all filesystems are converted to use
readahead_folio(), I'll hoist the refcount put to the caller.  Having
->readahead() and ->read_folio() with different rules for who puts the
folio is a long-standing mistake.

> > @@ -299,11 +298,11 @@ int ext4_mpage_readpages(struct inode *inode,
> > 
> > 			if (ext4_map_blocks(NULL, inode, &map, 0) < 0) {
> > 			set_error_page:
> > -				SetPageError(page);
> > -				zero_user_segment(page, 0,
> > -						  PAGE_SIZE);
> > -				unlock_page(page);
> > -				goto next_page;
> > +				folio_set_error(folio);
> > +				folio_zero_segment(folio, 0,
> > +						  folio_size(folio));
> > +				folio_unlock(folio);
> > +				continue;
> 
> This is 'continuing' the inner loop, not the outer loop as it should.

Oops.  Will fix.  I didn't get any extra failures from xfstests with
this bug, although I suspect I wasn't testing with block size < page
size, which is probably needed to make a difference.
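
For reference on the first point: readahead_folio() itself drops the
reference the readahead machinery took, so from that moment the folio is
kept alive only by the page cache's own reference.  Roughly (this is a
paraphrase of the helper in include/linux/pagemap.h, not an exact quote
of any particular kernel version):

	/* Rough paraphrase of readahead_folio(); not an exact quote. */
	static inline struct folio *readahead_folio(struct readahead_control *ractl)
	{
		struct folio *folio = __readahead_folio(ractl);

		/*
		 * Drop the readahead reference here.  The page cache still
		 * holds its own reference, and truncation must lock the
		 * folio before it can remove it from the page cache, so the
		 * folio stays alive at least until the filesystem unlocks it.
		 */
		if (folio)
			folio_put(folio);
		return folio;
	}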
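
And to make the inner/outer loop point concrete, here is a tiny
standalone program (nothing to do with the real ext4 code; the names and
numbers are made up) showing why 'continue' is not a drop-in replacement
for the old 'goto next_page':

	#include <stdio.h>

	int main(void)
	{
		/* Outer loop: one iteration per page in the readahead batch. */
		for (int page = 0; page < 2; page++) {
			/* Inner loop: one iteration per block within the page. */
			for (int block = 0; block < 4; block++) {
				if (block == 1) {
					/*
					 * Error path.  "continue" resumes the
					 * *block* loop, so we keep processing
					 * blocks of a page we already gave up
					 * on.  The old "goto next_page" jumped
					 * past the rest of the inner loop to
					 * the end of the outer loop body.
					 */
					printf("error: page %d block %d\n", page, block);
					continue;
				}
				printf("mapped: page %d block %d\n", page, block);
			}
			/* "next_page:" would live here in the real function. */
		}
		return 0;
	}

With "continue", blocks 2 and 3 of the failing page still get "mapped",
which is the bug; the goto skipped them.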