On Thu, Jan 26, 2023 at 08:15:04PM -0800, Eric Biggers wrote:
> On Thu, Jan 26, 2023 at 08:24:08PM +0000, Matthew Wilcox (Oracle) wrote:
> >  int ext4_mpage_readpages(struct inode *inode,
> > -		struct readahead_control *rac, struct page *page)
> > +		struct readahead_control *rac, struct folio *folio)
> >  {
> >  	struct bio *bio = NULL;
> >  	sector_t last_block_in_bio = 0;
> > @@ -247,16 +247,15 @@ int ext4_mpage_readpages(struct inode *inode,
> >  		int fully_mapped = 1;
> >  		unsigned first_hole = blocks_per_page;
> > 
> > -		if (rac) {
> > -			page = readahead_page(rac);
> > -			prefetchw(&page->flags);
> > -		}
> > +		if (rac)
> > +			folio = readahead_folio(rac);
> > +		prefetchw(&folio->flags);
> 
> Unlike readahead_page(), readahead_folio() puts the folio immediately.
> Is that really safe?

It's safe until we unlock the page.  The page cache holds a refcount,
and truncation has to lock the page before it can remove it from the
page cache.  Putting the refcount in readahead_folio() is a
transitional step; once all filesystems are converted to use
readahead_folio(), I'll hoist the refcount put to the caller.  Having
->readahead() and ->read_folio() with different rules for who puts the
folio is a long-standing mistake.

> > @@ -299,11 +298,11 @@ int ext4_mpage_readpages(struct inode *inode,
> > 
> > 			if (ext4_map_blocks(NULL, inode, &map, 0) < 0) {
> > 			set_error_page:
> > -				SetPageError(page);
> > -				zero_user_segment(page, 0,
> > -						  PAGE_SIZE);
> > -				unlock_page(page);
> > -				goto next_page;
> > +				folio_set_error(folio);
> > +				folio_zero_segment(folio, 0,
> > +						  folio_size(folio));
> > +				folio_unlock(folio);
> > +				continue;
> 
> This is 'continuing' the inner loop, not the outer loop as it should.

Oops.  Will fix.  I didn't get any extra failures from xfstests with
this bug, although I suspect I wasn't testing with block size < page
size, which is probably needed to make a difference.
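
For reference on the first point: readahead_folio() itself drops the
reference the readahead machinery took, so from that moment the folio is
kept alive only by the page cache's own reference.  Roughly (this is a
paraphrase of the helper in include/linux/pagemap.h, not an exact quote
of any particular kernel version):

	/* Rough paraphrase of readahead_folio(); not an exact quote. */
	static inline struct folio *readahead_folio(struct readahead_control *ractl)
	{
		struct folio *folio = __readahead_folio(ractl);

		/*
		 * Drop the readahead reference here.  The page cache still
		 * holds its own reference, and truncation must lock the
		 * folio before it can remove it from the page cache, so the
		 * folio stays alive at least until the filesystem unlocks it.
		 */
		if (folio)
			folio_put(folio);
		return folio;
	}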
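
And to make the inner/outer loop point concrete, here is a tiny
standalone program (nothing to do with the real ext4 code; the names and
numbers are made up) showing why 'continue' is not a drop-in replacement
for the old 'goto next_page':

	#include <stdio.h>

	int main(void)
	{
		/* Outer loop: one iteration per page in the readahead batch. */
		for (int page = 0; page < 2; page++) {
			/* Inner loop: one iteration per block within the page. */
			for (int block = 0; block < 4; block++) {
				if (block == 1) {
					/*
					 * Error path.  "continue" resumes the
					 * *block* loop, so we keep processing
					 * blocks of a page we already gave up
					 * on.  The old "goto next_page" jumped
					 * past the rest of the inner loop to
					 * the end of the outer loop body.
					 */
					printf("error: page %d block %d\n", page, block);
					continue;
				}
				printf("mapped: page %d block %d\n", page, block);
			}
			/* "next_page:" would live here in the real function. */
		}
		return 0;
	}

With "continue", blocks 2 and 3 of the failing page still get "mapped",
which is the bug; the goto skipped them.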