Re: [PATCH] zonefs: move super block reading from page to folio

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Thu, 30 May 2024 18:16:16 -0700

On Thu, May 23, 2024 at 02:41:51AM +0100, Matthew Wilcox wrote:
> On Tue, May 14, 2024 at 05:22:08PM +0200, Johannes Thumshirn wrote:
> > From: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
> > 
> > Move reading of the on-disk superblock from page to kmalloc()ed memory.
> 
> No, this is wrong.
> 
> > +	super = kzalloc(ZONEFS_SUPER_SIZE, GFP_KERNEL);
> > +	if (!super)
> >  		return -ENOMEM;
> >  
> > +	folio = virt_to_folio(super);
> 
> This will stop working at some point.  It'll return NULL once we get
> to the memdesc future (because the memdesc will be a slab, not a folio).

Hmmm, xfs_buf.c plays a similar trick here for sub-page buffers.  I'm
assuming that will get ported to ... whatever the memdesc future holds?

> >  	bio_init(&bio, sb->s_bdev, &bio_vec, 1, REQ_OP_READ);
> >  	bio.bi_iter.bi_sector = 0;
> > -	__bio_add_page(&bio, page, PAGE_SIZE, 0);
> > +	bio_add_folio_nofail(&bio, folio, ZONEFS_SUPER_SIZE,
> > +			     offset_in_folio(folio, super));
> 
> It also doesn't solve the problem of trying to read 4KiB from a device
> with 16KiB sectors.  We'll have to fail the bio because there isn't
> enough memory in the bio to store one block.
> 
> I think the right way to handle this is to call read_mapping_folio().
> That will allocate a folio in the page cache for you (obeying the
> minimum folio size).  Then you can examine the contents.  It should
> actually remove code from zonefs.  Don't forget to call folio_put()
> when you're done with it (either at unmount or at the end of mount if
> you copy what you need elsewhere).

The downside of using bd_mapping is that userspace can scribble all over
the folio contents.  For zonefs that's less of a big deal because it
only reads it once, but for everyone else (e.g. ext4) it's been a huge
problem.  I guess you could always do max(ZONEFS_SUPER_SIZE,
block_size(sb->s_bdev)) if you don't want to use the pagecache.

<more mumbling about buffer caches>

--D