On Tue 23-01-24 14:26:51, Christian Brauner wrote: > (1) Instead of bdev->bd_inode->i_mapping we do f_bdev->f_mapping > (2) Instead of bdev->bd_inode we could do f_bdev->f_inode > > I mention this explicitly because (1) is dependent on how the block > device is opened while (2) isn't. Consider: > > mount -t tmpfs tmpfs /mnt > mknod /mnt/foo b <minor> <major> > open("/mnt/foo", O_RDWR); > > then (1) doesn't work because f_bdev->f_inode is a tmpfs inode _not_ the > actual bdev filesystem inode. But (2) is still the bd_inode->i_mapping > as that's set up during bdev_open(). > > IOW, I'm explicitly _not_ going via f_bdev->f_inode but via > f_bdev->f_mapping->host aka bdev_file_inode(f_bdev). Currently this > isn't a problem because sb->s_bdev_file stashes the a block device file > opened via bdev_open_by_*() which is always a file on the bdev > filesystem. > > _If_ we ever wanted to allow userspace to pass a block device file > descriptor during superblock creation. Say: > > fsconfig(fs_fd, FSCONFIG_CMD_CREATE_EXCL, "source", bdev_fd); > > then using f_bdev->f_inode would be very wrong. Another thing to keep in > mind there would be that this would implicitly pin another filesystem. > Say: > > mount -t ext4 /dev/sda /mnt > mknod /mnt/foo b <minor> <major> > bdev_fd = open("/mnt/foo", O_RDWR); > > fd_fs = fsopen("xfs") > fsconfig(fd_fs, FSCONFIG_CMD_CREATE, "source", bdev_fd); > fd_mnt = fsmount(fd_fs); > move_mount(fd_mnt, "/mnt2"); > > umount /mnt # EBUSY > > Because the xfs filesystem now pins the ext4 filesystem via the > bdev_file we're keeping. In other words, this is probably a bad idea and > if we allow userspace to do this then we should only use the provided fd > to lookup the block device and open our own handle to it. > > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx> I suppose this is more or less a sample how to get rid of sb->s_bdev / bd_inode dereferences AFAICT? Because otherwise I'm not sure why it was included in this series... > @@ -5576,7 +5576,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb) > * used to detect the metadata async write error. > */ > spin_lock_init(&sbi->s_bdev_wb_lock); > - errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err, > + errseq_check_and_advance(&sb->s_bdev_file->f_mapping->wb_err, > &sbi->s_bdev_wb_err); So when we have struct file, it would be actually nicer to drop EXT4_SB(sb)->s_bdev_wb_err completely and instead use file_check_and_advance_wb_err(sb->s_bdev_file). But that's a separate cleanup I suppose. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR