Re: [RFC 0/4] convert create_page_buffers to create_folio_buffers

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Sun, 16 Apr 2023 15:07:33 +0100

On Sat, Apr 15, 2023 at 10:26:42PM -0700, Luis Chamberlain wrote:
> On Sun, Apr 16, 2023 at 04:40:06AM +0100, Matthew Wilcox wrote:
> > I don't think we
> > should be overriding the aops, and if we narrow the scope of large folio
> > support in blockdev t only supporting folio_size == LBA size, it becomes
> > much more feasible.
> 
> I'm trying to think of the possible use cases where folio_size != LBA size
> and I cannot immediately think of some. Yes there are cases where a
> filesystem may use a different block for say meta data than data, but that
> I believe is side issue, ie, read/writes for small metadata would have
> to be accepted. At least for NVMe we have metadata size as part of the
> LBA format, but from what I understand no Linux filesystem yet uses that.

NVMe metadata is per-block metadata -- a CRC or similar.  Filesystem
metadata is things like directories, inode tables, free space bitmaps,
etc.

> struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,   
>                 bool retry)                                                     
> { 
[...]
>         head = NULL;  
>         offset = PAGE_SIZE;                                                     
>         while ((offset -= size) >= 0) {                                         
> 
> I see now what you say about the buffer head being of the block size
> bh->b_size = size above.

Yes, just changing that to 'offset = page_size(page);' will do the trick.

> > sb_bread() is used by most filesystems, and the buffer cache aliases
> > into the page cache.
> 
> I see thanks. I checked what xfs does and its xfs_readsb() uses its own
> xfs_buf_read_uncached(). It ends up calling xfs_buf_submit() and
> xfs_buf_ioapply_map() does it's own submit_bio(). So I'm curious why
> they did that.

IRIX didn't have an sb_bread() ;-)

> > In userspace, if I run 'dd if=blah of=/dev/sda1 bs=512 count=1 seek=N',
> > I can overwrite the superblock.  Do we want filesystems to see that
> > kind of vandalism, or do we want the mounted filesystem to have its
> > own copy of the data and overwrite what userspace wrote the next time it
> > updates the superblock?
> 
> Oh, what happens today?

Depends on the filesystem, I think?  Not really sure, to be honest.