Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 18, 2024 at 03:19:03PM +0800, Yu Kuai wrote:
> Hi, Christoph!
> 
> 在 2024/03/18 9:51, Yu Kuai 写道:
> > Hi,
> > 
> > 在 2024/03/18 9:32, Christoph Hellwig 写道:
> > > On Mon, Mar 18, 2024 at 09:26:48AM +0800, Yu Kuai wrote:
> > > > Because there is a real filesystem(devtmpfs) used for raw block devcie
> > > > file operations, open syscall to devtmpfs:
> > > > 
> > > > blkdev_open
> > > >   bdev = blkdev_get_no_open
> > > >   bdev_open -> pass in file is from devtmpfs
> > > >   -> in this case, file inode is from devtmpfs,
> > > 
> > > But file->f_mapping->host should still point to the bdevfs inode,
> > > and file->f_mapping->host is what everything in the I/O path should
> > > be using.

I mentioned this in
https://lore.kernel.org/r/20240118-gemustert-aalen-ee71d0c69826@brauner

"[...] if we want to have all code pass a file and we have code in
fs/buffer.c like iomap_to_bh():

iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
        loff_t offset = block << inode->i_blkbits;

        bh->b_bdev = iomap->bdev;
+       bh->f_b_bdev = iomap->f_bdev;

While that works for every single filesystem that uses block devices
because they stash them somewhere (like s_bdev_file) it doesn't work for
the bdev filesystem itself. So if the bdev filesystem calls into helpers
that expect e.g., buffer_head->s_f_bdev to have been initialized from
iomap->f_bdev this wouldn't work.

So if we want to remove b_bdev from struct buffer_head and fully rely on
f_b_bdev - and similar in iomap - then we need a story for the bdev fs
itself. And I wasn't clear on what that would be."

> > > 
> > > > Then later, in blkdev_iomap_begin(), bd_inode is passed in and there is
> > > > no access to the devtmpfs file, we can't use s_bdev_file() as other
> > > > filesystems here.
> > > 
> > > We can just pass the file down in iomap_iter.private
> > 
> > I can do this for blkdev_read_folio(), however, for other ops like
> > blkdev_writepages(), I can't find a way to pass the file to
> > iomap_iter.private yet.
> > 
> > Any suggestions?
> 
> I come up with an ideal:
> 
> While opening the block_device the first time, store the generated new
> file in "bd_inode->i_private". And release it after the last opener
> close the block_device.
> 
> The advantages are:
>  - multiple openers can share the same bdev_file;

You mean use the file stashed in bdev_inode->i_private only to retrieve
the inode/mapping in the block layer ops.

>  - raw block device ops can use the bdev_file as well, and there is no
> need to distinguish iomap/buffer_head for raw block_device;
> 
> Please let me know what do you think?

It's equally ugly but probably slightly less error prone than the union
approach. But please make that separate patches on top of the series.

This is somewhat reminiscent of the approach that Dave suggested in the
thread that I linked above. I only wonder whether we run into issue with
multiple block device openers when the original opener opened the block
device exclusively. So there might be some corner-cases.




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux