Re: [PATCH] zonefs: move super block reading from page to folio

Damien Le Moal <dlemoal@xxxxxxxxxx> · Mon, 3 Jun 2024 09:30:30 +0900

On 6/2/24 02:51, Matthew Wilcox wrote:
> On Fri, May 31, 2024 at 10:28:50AM +0900, Damien Le Moal wrote:
>>>> This will stop working at some point.  It'll return NULL once we get
>>>> to the memdesc future (because the memdesc will be a slab, not a folio).
>>>
>>> Hmmm, xfs_buf.c plays a similar trick here for sub-page buffers.  I'm
>>> assuming that will get ported to ... whatever the memdesc future holds?
> 
> I don't think it does, exactly?  Are you referring to kmem_to_page()?
> That will continue to work.  You're not trying to get a folio from a
> slab allocation; that will start to fail.
> 
>>>> I think the right way to handle this is to call read_mapping_folio().
>>>> That will allocate a folio in the page cache for you (obeying the
>>>> minimum folio size).  Then you can examine the contents.  It should
>>>> actually remove code from zonefs.  Don't forget to call folio_put()
>>>> when you're done with it (either at unmount or at the end of mount if
>>>> you copy what you need elsewhere).
>>>
>>> The downside of using bd_mapping is that userspace can scribble all over
>>> the folio contents.  For zonefs that's less of a big deal because it
>>> only reads it once, but for everyone else (e.g. ext4) it's been a huge
>>
>> Yes, and zonefs super block is read-only, we never update it after formatting.
>>
>>> problem.  I guess you could always do max(ZONEFS_SUPER_SIZE,
>>> block_size(sb->s_bdev)) if you don't want to use the pagecache.
>>
>> Good point. ZONEFS_SUPER_SIZE is 4K and given that I only know of 512e and 4K
>> zoned block devices, this is not an issue yet. But better safe than sorry, so
>> doing the max() thing you propose is better. Will patch that.
> 
> I think you should use read_mapping_folio() for now instead of
> complicating zonefs.  Once there's a grand new buffer cache, switch to
> that, but I don't think you're introducing a significant vulnerability
> by using the block device's page cache.

I was not really thinking about vulnerabilities here, but rather about
compatibility with devices that have a block size larger than 4K. Given that
such devices are rare (at best), making the ZONEFS_SUPER_SIZE handling smarter
is not urgent, and it is not hard to do anyway.
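
For reference, the max(ZONEFS_SUPER_SIZE, block_size(sb->s_bdev)) idea quoted
above boils down to something like the userspace sketch below. This is only an
illustration and not the actual patch: zonefs_sb_read_size() is a hypothetical
helper name, and the block size is passed in as a plain number instead of being
obtained from the block device.

```c
#include <assert.h>
#include <stddef.h>

/* The zonefs super block size is 4K, as stated in this thread. */
#define ZONEFS_SUPER_SIZE 4096u

/*
 * Hypothetical helper (not an existing zonefs function): return the number
 * of bytes to read for the super block so that the read always covers at
 * least one full device block.
 */
static size_t zonefs_sb_read_size(size_t bdev_block_size)
{
	/* A zero block size is a caller bug in this sketch. */
	assert(bdev_block_size != 0);

	return bdev_block_size > ZONEFS_SUPER_SIZE ?
		bdev_block_size : ZONEFS_SUPER_SIZE;
}
```

With 512e and 4K devices this stays at 4096 bytes, so nothing changes for the
devices I know of. A device with a 16K block size would get a 16384-byte read.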

-- 
Damien Le Moal
Western Digital Research