Hello! On Tue 19-03-24 16:26:19, Yu Kuai wrote: > 在 2024/03/19 7:22, Christoph Hellwig 写道: > > On Mon, Mar 18, 2024 at 03:19:03PM +0800, Yu Kuai wrote: > > > I come up with an ideal: > > > > > > While opening the block_device the first time, store the generated new > > > file in "bd_inode->i_private". And release it after the last opener > > > close the block_device. > > > > > > The advantages are: > > > - multiple openers can share the same bdev_file; > > > - raw block device ops can use the bdev_file as well, and there is no > > > need to distinguish iomap/buffer_head for raw block_device; > > > > > > Please let me know what do you think? > > > > That does sound very reasonable to me. > > > I just implement the ideal with following patch(not fully tested, just > boot and some blktests) So I was looking into this and I'm not sure I 100% understand the problem. I understand that the inode you get e.g. in blkdev_get_block(), blkdev_iomap_begin() etc. may be an arbitrary filesystem block device inode. But why can't you use I_BDEV(inode->i_mapping->host) to get to the block device instead of your file_bdev(inode->i_private)? I don't see any advantage in stashing away that special bdev_file into inode->i_private but perhaps I'm missing something... Honza > diff --git a/block/fops.c b/block/fops.c > index 4037ae72a919..059f6c7d3c09 100644 > --- a/block/fops.c > +++ b/block/fops.c > @@ -382,7 +382,7 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, > struct iov_iter *iter) > static int blkdev_iomap_begin(struct inode *inode, loff_t offset, loff_t > length, > unsigned int flags, struct iomap *iomap, struct iomap > *srcmap) > { > - struct block_device *bdev = I_BDEV(inode); > + struct block_device *bdev = file_bdev(inode->i_private); > loff_t isize = i_size_read(inode); > > iomap->bdev = bdev; > @@ -404,7 +404,7 @@ static const struct iomap_ops blkdev_iomap_ops = { > static int blkdev_get_block(struct inode *inode, sector_t iblock, > struct buffer_head *bh, int create) > { > - bh->b_bdev = I_BDEV(inode); > + bh->b_bdev = file_bdev(inode->i_private); > bh->b_blocknr = iblock; > set_buffer_mapped(bh); > return 0; > @@ -598,6 +598,7 @@ blk_mode_t file_to_blk_mode(struct file *file) > > static int blkdev_open(struct inode *inode, struct file *filp) > { > + struct file *bdev_file; > struct block_device *bdev; > blk_mode_t mode; > int ret; > @@ -614,9 +615,28 @@ static int blkdev_open(struct inode *inode, struct file > *filp) > if (!bdev) > return -ENXIO; > > + bdev_file = alloc_and_init_bdev_file(bdev, > + BLK_OPEN_READ | BLK_OPEN_WRITE, NULL); > + if (IS_ERR(bdev_file)) { > + blkdev_put_no_open(bdev); > + return PTR_ERR(bdev_file); > + } > + > + bdev_file->private_data = ERR_PTR(-EINVAL); > + get_bdev_file(bdev, bdev_file); > ret = bdev_open(bdev, mode, filp->private_data, NULL, filp); > - if (ret) > + if (ret) { > + put_bdev_file(bdev); > blkdev_put_no_open(bdev); > + } else { > + filp->f_flags |= O_LARGEFILE; > + filp->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT; > + if (bdev_nowait(bdev)) > + filp->f_mode |= FMODE_NOWAIT; > + filp->f_mapping = bdev_mapping(bdev); > + filp->f_wb_err = > filemap_sample_wb_err(bdev_file->f_mapping); > + } > + > return ret; > } > > > . > > > -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR