On Thu, Feb 01, 2024 at 12:34:24PM +0100, Jan Kara wrote: > On Tue 23-01-24 14:26:51, Christian Brauner wrote: > > (1) Instead of bdev->bd_inode->i_mapping we do f_bdev->f_mapping > > (2) Instead of bdev->bd_inode we could do f_bdev->f_inode > > > > I mention this explicitly because (1) is dependent on how the block > > device is opened while (2) isn't. Consider: > > > > mount -t tmpfs tmpfs /mnt > > mknod /mnt/foo b <minor> <major> > > open("/mnt/foo", O_RDWR); > > > > then (1) doesn't work because f_bdev->f_inode is a tmpfs inode _not_ the > > actual bdev filesystem inode. But (2) is still the bd_inode->i_mapping > > as that's set up during bdev_open(). > > > > IOW, I'm explicitly _not_ going via f_bdev->f_inode but via > > f_bdev->f_mapping->host aka bdev_file_inode(f_bdev). Currently this > > isn't a problem because sb->s_bdev_file stashes the a block device file > > opened via bdev_open_by_*() which is always a file on the bdev > > filesystem. > > > > _If_ we ever wanted to allow userspace to pass a block device file > > descriptor during superblock creation. Say: > > > > fsconfig(fs_fd, FSCONFIG_CMD_CREATE_EXCL, "source", bdev_fd); > > > > then using f_bdev->f_inode would be very wrong. Another thing to keep in > > mind there would be that this would implicitly pin another filesystem. > > Say: > > > > mount -t ext4 /dev/sda /mnt > > mknod /mnt/foo b <minor> <major> > > bdev_fd = open("/mnt/foo", O_RDWR); > > > > fd_fs = fsopen("xfs") > > fsconfig(fd_fs, FSCONFIG_CMD_CREATE, "source", bdev_fd); > > fd_mnt = fsmount(fd_fs); > > move_mount(fd_mnt, "/mnt2"); > > > > umount /mnt # EBUSY > > > > Because the xfs filesystem now pins the ext4 filesystem via the > > bdev_file we're keeping. In other words, this is probably a bad idea and > > if we allow userspace to do this then we should only use the provided fd > > to lookup the block device and open our own handle to it. > > > > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx> > > I suppose this is more or less a sample how to get rid of sb->s_bdev / > bd_inode dereferences AFAICT? Because otherwise I'm not sure why it was Yes, correct. That was just a way to show that you don't need that anymore. > included in this series... Yes, that would likely go separately. > > > @@ -5576,7 +5576,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb) > > * used to detect the metadata async write error. > > */ > > spin_lock_init(&sbi->s_bdev_wb_lock); > > - errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err, > > + errseq_check_and_advance(&sb->s_bdev_file->f_mapping->wb_err, > > &sbi->s_bdev_wb_err); > > So when we have struct file, it would be actually nicer to drop > EXT4_SB(sb)->s_bdev_wb_err completely and instead use > file_check_and_advance_wb_err(sb->s_bdev_file). But that's a separate > cleanup I suppose. Yep. I forgot about that helper.