On 5/31/24 10:16 AM, Darrick J. Wong wrote: > On Thu, May 23, 2024 at 02:41:51AM +0100, Matthew Wilcox wrote: >> On Tue, May 14, 2024 at 05:22:08PM +0200, Johannes Thumshirn wrote: >>> From: Johannes Thumshirn <johannes.thumshirn@xxxxxxx> >>> >>> Move reading of the on-disk superblock from page to kmalloc()ed memory. >> >> No, this is wrong. >> >>> + super = kzalloc(ZONEFS_SUPER_SIZE, GFP_KERNEL); >>> + if (!super) >>> return -ENOMEM; >>> >>> + folio = virt_to_folio(super); >> >> This will stop working at some point. It'll return NULL once we get >> to the memdesc future (because the memdesc will be a slab, not a folio). > > Hmmm, xfs_buf.c plays a similar trick here for sub-page buffers. I'm > assuming that will get ported to ... whatever the memdesc future holds? > >>> bio_init(&bio, sb->s_bdev, &bio_vec, 1, REQ_OP_READ); >>> bio.bi_iter.bi_sector = 0; >>> - __bio_add_page(&bio, page, PAGE_SIZE, 0); >>> + bio_add_folio_nofail(&bio, folio, ZONEFS_SUPER_SIZE, >>> + offset_in_folio(folio, super)); >> >> It also doesn't solve the problem of trying to read 4KiB from a device >> with 16KiB sectors. We'll have to fail the bio because there isn't >> enough memory in the bio to store one block. >> >> I think the right way to handle this is to call read_mapping_folio(). >> That will allocate a folio in the page cache for you (obeying the >> minimum folio size). Then you can examine the contents. It should >> actually remove code from zonefs. Don't forget to call folio_put() >> when you're done with it (either at unmount or at the end of mount if >> you copy what you need elsewhere). > > The downside of using bd_mapping is that userspace can scribble all over > the folio contents. For zonefs that's less of a big deal because it > only reads it once, but for everyone else (e.g. ext4) it's been a huge Yes, and zonefs super block is read-only, we never update it after formatting. > problem. I guess you could always do max(ZONEFS_SUPER_SIZE, > block_size(sb->s_bdev)) if you don't want to use the pagecache. Good point. ZONEFS_SUPER_SIZE is 4K and given that I only know of 512e and 4K zoned block devices, this is not an issue yet. But better safe than sorry, so doing the max() thing you propose is better. Will patch that. -- Damien Le Moal Western Digital Research