On Fri, May 31, 2024 at 10:28:50AM +0900, Damien Le Moal wrote:
> >> This will stop working at some point.  It'll return NULL once we get
> >> to the memdesc future (because the memdesc will be a slab, not a folio).
> >
> > Hmmm, xfs_buf.c plays a similar trick here for sub-page buffers.  I'm
> > assuming that will get ported to ... whatever the memdesc future holds?

I don't think it does, exactly?  Are you referring to kmem_to_page()?
That will continue to work.  You're not trying to get a folio from a
slab allocation; that will start to fail.

> >> I think the right way to handle this is to call read_mapping_folio().
> >> That will allocate a folio in the page cache for you (obeying the
> >> minimum folio size).  Then you can examine the contents.  It should
> >> actually remove code from zonefs.  Don't forget to call folio_put()
> >> when you're done with it (either at unmount or at the end of mount if
> >> you copy what you need elsewhere).
> >
> > The downside of using bd_mapping is that userspace can scribble all over
> > the folio contents.  For zonefs that's less of a big deal because it
> > only reads it once, but for everyone else (e.g. ext4) it's been a huge
>
> Yes, and zonefs super block is read-only, we never update it after formatting.
>
> > problem.  I guess you could always do max(ZONEFS_SUPER_SIZE,
> > block_size(sb->s_bdev)) if you don't want to use the pagecache.
>
> Good point. ZONEFS_SUPER_SIZE is 4K and given that I only know of 512e and 4K
> zoned block devices, this is not an issue yet. But better safe than sorry, so
> doing the max() thing you propose is better. Will patch that.

I think you should use read_mapping_folio() for now instead of
complicating zonefs.  Once there's a grand new buffer cache, switch to
that, but I don't think you're introducing a significant vulnerability
by using the block device's page cache.
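
For reference, the max(ZONEFS_SUPER_SIZE, block_size(sb->s_bdev)) sizing
discussed above amounts to roughly the following.  This is only a userspace
sketch of the arithmetic, not kernel code: zonefs_super_read_size() is a
hypothetical name, the block size is passed in as a plain argument instead
of coming from block_size(), and ZONEFS_SUPER_SIZE is redefined locally
using the 4K value mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* 4K, as stated in the thread; redefined here so the sketch stands alone. */
#define ZONEFS_SUPER_SIZE 4096u

/*
 * Hypothetical helper (not an existing zonefs function): pick the size of
 * the buffer used to read the super block so that it is never smaller than
 * one device block.  In the kernel, the argument would be the result of
 * block_size(sb->s_bdev).
 */
static size_t zonefs_super_read_size(size_t bdev_block_size)
{
	/* A zero block size would indicate a caller bug. */
	assert(bdev_block_size > 0);

	return bdev_block_size > ZONEFS_SUPER_SIZE ?
		bdev_block_size : ZONEFS_SUPER_SIZE;
}
```

With the 512e and 4K devices mentioned above this stays at 4096; it only
grows if a device with a larger block size shows up, e.g. a 16K block size
would give 16384.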