Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 18, 2024 at 11:07:49AM +0100, Christian Brauner wrote:
> On Mon, Mar 18, 2024 at 03:19:03PM +0800, Yu Kuai wrote:
> > Hi, Christoph!
> > 
> > 在 2024/03/18 9:51, Yu Kuai 写道:
> > > Hi,
> > > 
> > > 在 2024/03/18 9:32, Christoph Hellwig 写道:
> > > > On Mon, Mar 18, 2024 at 09:26:48AM +0800, Yu Kuai wrote:
> > > > > Because there is a real filesystem(devtmpfs) used for raw block devcie
> > > > > file operations, open syscall to devtmpfs:

Don't forget:

mknod /my/xfs/file/system b 8 0

which means you're not opening it via devtmpfs but via xfs. IOW, the
inode for that file is from xfs.

> > > > > 
> > > > > blkdev_open
> > > > >   bdev = blkdev_get_no_open
> > > > >   bdev_open -> pass in file is from devtmpfs
> > > > >   -> in this case, file inode is from devtmpfs,
> > > > 
> > > > But file->f_mapping->host should still point to the bdevfs inode,
> > > > and file->f_mapping->host is what everything in the I/O path should
> > > > be using.
> 
> I mentioned this in
> https://lore.kernel.org/r/20240118-gemustert-aalen-ee71d0c69826@brauner
> 
> "[...] if we want to have all code pass a file and we have code in
> fs/buffer.c like iomap_to_bh():
> 
> iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
>         loff_t offset = block << inode->i_blkbits;
> 
>         bh->b_bdev = iomap->bdev;
> +       bh->f_b_bdev = iomap->f_bdev;
> 
> While that works for every single filesystem that uses block devices
> because they stash them somewhere (like s_bdev_file) it doesn't work for
> the bdev filesystem itself. So if the bdev filesystem calls into helpers
> that expect e.g., buffer_head->s_f_bdev to have been initialized from
> iomap->f_bdev this wouldn't work.
> 
> So if we want to remove b_bdev from struct buffer_head and fully rely on
> f_b_bdev - and similar in iomap - then we need a story for the bdev fs
> itself. And I wasn't clear on what that would be."
> 
> > > > 
> > > > > Then later, in blkdev_iomap_begin(), bd_inode is passed in and there is
> > > > > no access to the devtmpfs file, we can't use s_bdev_file() as other
> > > > > filesystems here.
> > > > 
> > > > We can just pass the file down in iomap_iter.private
> > > 
> > > I can do this for blkdev_read_folio(), however, for other ops like
> > > blkdev_writepages(), I can't find a way to pass the file to
> > > iomap_iter.private yet.
> > > 
> > > Any suggestions?
> > 
> > I come up with an ideal:
> > 
> > While opening the block_device the first time, store the generated new
> > file in "bd_inode->i_private". And release it after the last opener
> > close the block_device.
> > 
> > The advantages are:
> >  - multiple openers can share the same bdev_file;
> 
> You mean use the file stashed in bdev_inode->i_private only to retrieve
> the inode/mapping in the block layer ops.
> 
> >  - raw block device ops can use the bdev_file as well, and there is no
> > need to distinguish iomap/buffer_head for raw block_device;
> > 
> > Please let me know what do you think?
> 
> It's equally ugly but probably slightly less error prone than the union
> approach. But please make that separate patches on top of the series.
> 
> This is somewhat reminiscent of the approach that Dave suggested in the
> thread that I linked above. I only wonder whether we run into issue with
> multiple block device openers when the original opener opened the block
> device exclusively. So there might be some corner-cases.




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux