Re: [PATCH v2 34/34] ext4: rely on sb->f_bdev only

Jan Kara <jack@xxxxxxx> · Thu, 1 Feb 2024 12:34:24 +0100

On Tue 23-01-24 14:26:51, Christian Brauner wrote:
> (1) Instead of bdev->bd_inode->i_mapping we do f_bdev->f_mapping
> (2) Instead of bdev->bd_inode we could do f_bdev->f_inode
> 
> I mention this explicitly because (1) is dependent on how the block
> device is opened while (2) isn't. Consider:
> 
> mount -t tmpfs tmpfs /mnt
> mknod /mnt/foo b <minor> <major>
> open("/mnt/foo", O_RDWR);
> 
> then (1) doesn't work because f_bdev->f_inode is a tmpfs inode _not_ the
> actual bdev filesystem inode. But (2) is still the bd_inode->i_mapping
> as that's set up during bdev_open().
> 
> IOW, I'm explicitly _not_ going via f_bdev->f_inode but via
> f_bdev->f_mapping->host aka bdev_file_inode(f_bdev). Currently this
> isn't a problem because sb->s_bdev_file stashes the a block device file
> opened via bdev_open_by_*() which is always a file on the bdev
> filesystem.
> 
> _If_ we ever wanted to allow userspace to pass a block device file
> descriptor during superblock creation. Say:
> 
> fsconfig(fs_fd, FSCONFIG_CMD_CREATE_EXCL, "source", bdev_fd);
> 
> then using f_bdev->f_inode would be very wrong. Another thing to keep in
> mind there would be that this would implicitly pin another filesystem.
> Say:
> 
> mount -t ext4 /dev/sda /mnt
> mknod /mnt/foo b <minor> <major>
> bdev_fd = open("/mnt/foo", O_RDWR);
> 
> fd_fs = fsopen("xfs")
> fsconfig(fd_fs, FSCONFIG_CMD_CREATE, "source", bdev_fd);
> fd_mnt = fsmount(fd_fs);
> move_mount(fd_mnt, "/mnt2");
> 
> umount /mnt # EBUSY
> 
> Because the xfs filesystem now pins the ext4 filesystem via the
> bdev_file we're keeping. In other words, this is probably a bad idea and
> if we allow userspace to do this then we should only use the provided fd
> to lookup the block device and open our own handle to it.
> 
> Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>

I suppose this is more or less a sample how to get rid of sb->s_bdev /
bd_inode dereferences AFAICT? Because otherwise I'm not sure why it was
included in this series...

> @@ -5576,7 +5576,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
>  	 * used to detect the metadata async write error.
>  	 */
>  	spin_lock_init(&sbi->s_bdev_wb_lock);
> -	errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
> +	errseq_check_and_advance(&sb->s_bdev_file->f_mapping->wb_err,
>  				 &sbi->s_bdev_wb_err);

So when we have struct file, it would be actually nicer to drop
EXT4_SB(sb)->s_bdev_wb_err completely and instead use
file_check_and_advance_wb_err(sb->s_bdev_file). But that's a separate
cleanup I suppose.

								Honza

-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR