On Mon, Mar 18, 2024 at 03:19:03PM +0800, Yu Kuai wrote: > Hi, Christoph! > > 在 2024/03/18 9:51, Yu Kuai 写道: > > Hi, > > > > 在 2024/03/18 9:32, Christoph Hellwig 写道: > > > On Mon, Mar 18, 2024 at 09:26:48AM +0800, Yu Kuai wrote: > > > > Because there is a real filesystem(devtmpfs) used for raw block devcie > > > > file operations, open syscall to devtmpfs: > > > > > > > > blkdev_open > > > > bdev = blkdev_get_no_open > > > > bdev_open -> pass in file is from devtmpfs > > > > -> in this case, file inode is from devtmpfs, > > > > > > But file->f_mapping->host should still point to the bdevfs inode, > > > and file->f_mapping->host is what everything in the I/O path should > > > be using. I mentioned this in https://lore.kernel.org/r/20240118-gemustert-aalen-ee71d0c69826@brauner "[...] if we want to have all code pass a file and we have code in fs/buffer.c like iomap_to_bh(): iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh, loff_t offset = block << inode->i_blkbits; bh->b_bdev = iomap->bdev; + bh->f_b_bdev = iomap->f_bdev; While that works for every single filesystem that uses block devices because they stash them somewhere (like s_bdev_file) it doesn't work for the bdev filesystem itself. So if the bdev filesystem calls into helpers that expect e.g., buffer_head->s_f_bdev to have been initialized from iomap->f_bdev this wouldn't work. So if we want to remove b_bdev from struct buffer_head and fully rely on f_b_bdev - and similar in iomap - then we need a story for the bdev fs itself. And I wasn't clear on what that would be." > > > > > > > Then later, in blkdev_iomap_begin(), bd_inode is passed in and there is > > > > no access to the devtmpfs file, we can't use s_bdev_file() as other > > > > filesystems here. > > > > > > We can just pass the file down in iomap_iter.private > > > > I can do this for blkdev_read_folio(), however, for other ops like > > blkdev_writepages(), I can't find a way to pass the file to > > iomap_iter.private yet. > > > > Any suggestions? > > I come up with an ideal: > > While opening the block_device the first time, store the generated new > file in "bd_inode->i_private". And release it after the last opener > close the block_device. > > The advantages are: > - multiple openers can share the same bdev_file; You mean use the file stashed in bdev_inode->i_private only to retrieve the inode/mapping in the block layer ops. > - raw block device ops can use the bdev_file as well, and there is no > need to distinguish iomap/buffer_head for raw block_device; > > Please let me know what do you think? It's equally ugly but probably slightly less error prone than the union approach. But please make that separate patches on top of the series. This is somewhat reminiscent of the approach that Dave suggested in the thread that I linked above. I only wonder whether we run into issue with multiple block device openers when the original opener opened the block device exclusively. So there might be some corner-cases.