In the last kernel release we introduced CONFIG_BLK_DEV_WRITE_MOUNTED.
By default this option is set. When it is set, the long-standing
behavior of being able to write to mounted block devices is enabled.

But in order to guard against unintended corruption by writing to the
block device buffer cache, CONFIG_BLK_DEV_WRITE_MOUNTED can be turned
off. In that case it isn't possible to write to mounted block devices
anymore.

A filesystem may open its block devices with BLK_OPEN_RESTRICT_WRITES,
which disallows concurrent BLK_OPEN_WRITE access. When we still had the
bdev handle around we could recognize BLK_OPEN_RESTRICT_WRITES because
the mode was passed around. Since we managed to get rid of the bdev
handle we changed that logic to recognize BLK_OPEN_RESTRICT_WRITES based
on whether the file was opened writable and writes to that block device
are blocked. That logic doesn't work because we do allow
BLK_OPEN_RESTRICT_WRITES to be specified without BLK_OPEN_WRITE. For
such a file the FMODE_WRITE check fails, so writes are never unblocked
when write access is yielded.

So fix the detection logic. Use O_EXCL as an indicator that
BLK_OPEN_RESTRICT_WRITES has been requested. We do the exact same thing
for pidfds, where O_EXCL means that this is a pidfd that refers to a
thread. For userspace open paths O_EXCL will never be retained, but for
internal opens, where we open files that are never installed into a file
descriptor table, this is fine.

Note that BLK_OPEN_RESTRICT_WRITES is an internal-only flag that cannot
directly be raised by userspace. It is implicitly raised during
mounting.

Passes xfstests and blktests with CONFIG_BLK_DEV_WRITE_MOUNTED set and
unset.
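
To illustrate the failure mode, here is a minimal userspace sketch of
the old and new yield logic. It is not kernel code: every MODEL_* name,
both structs, and the convention that writers == -1 means "writes
blocked" are assumptions made for this model only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel flags; values are illustrative. */
#define MODEL_OPEN_WRITE           (1u << 0)
#define MODEL_OPEN_RESTRICT_WRITES (1u << 1)
#define MODEL_FMODE_WRITE          (1u << 0)
#define MODEL_O_EXCL               (1u << 1)

/* Model assumption: writers == -1 means writes are blocked. */
struct model_bdev {
	int writers;
};

struct model_file {
	unsigned int f_mode;
	unsigned int f_flags;
};

/* Translate an open mode into file state and claim access on the device. */
static struct model_file model_open(struct model_bdev *bdev, unsigned int mode)
{
	struct model_file f = { 0, 0 };

	if (mode & MODEL_OPEN_WRITE)
		f.f_mode |= MODEL_FMODE_WRITE;

	if (mode & MODEL_OPEN_RESTRICT_WRITES) {
		/* Model precondition: no writer currently holds the device. */
		assert(bdev->writers == 0);
		f.f_flags |= MODEL_O_EXCL; /* the marker the fix relies on */
		bdev->writers = -1;        /* block writes */
	} else if (mode & MODEL_OPEN_WRITE) {
		bdev->writers++;
	}
	return f;
}

/* Old detection: infer a restricting open from "writable and blocked". */
static void model_yield_old(struct model_bdev *bdev, const struct model_file *f)
{
	if (f->f_mode & MODEL_FMODE_WRITE) {
		if (bdev->writers == -1)
			bdev->writers = 0;
		else
			bdev->writers--;
	}
}

/* New detection: O_EXCL alone identifies a restricting open. */
static void model_yield_new(struct model_bdev *bdev, const struct model_file *f)
{
	if (f->f_flags & MODEL_O_EXCL)
		bdev->writers = 0;
	else if (f->f_mode & MODEL_FMODE_WRITE)
		bdev->writers--;
}

/* Open with @mode, yield access again, and return the final writer state. */
int model_cycle(unsigned int mode, bool use_new)
{
	struct model_bdev bdev = { 0 };
	struct model_file f = model_open(&bdev, mode);

	if (use_new)
		model_yield_new(&bdev, &f);
	else
		model_yield_old(&bdev, &f);
	return bdev.writers;
}
```

In this model, model_cycle(MODEL_OPEN_RESTRICT_WRITES, false) leaves the
device at -1, i.e. writes stay blocked after the restricting opener is
gone, whereas the same call with use_new set to true returns 0.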

Fixes: 321de651fa56 ("block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access")
Reported-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Link: https://lore.kernel.org/r/ZfyyEwu9Uq5Pgb94@xxxxxxxxxxxxxxxxxxxx
Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
---
 block/bdev.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 7a5f611c3d2e..f819f3086905 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -821,13 +821,12 @@ static void bdev_yield_write_access(struct file *bdev_file)
 		return;
 
 	bdev = file_bdev(bdev_file);
-	/* Yield exclusive or shared write access. */
-	if (bdev_file->f_mode & FMODE_WRITE) {
-		if (bdev_writes_blocked(bdev))
-			bdev_unblock_writes(bdev);
-		else
-			bdev->bd_writers--;
-	}
+
+	/* O_EXCL is only set for internal BLK_OPEN_RESTRICT_WRITES. */
+	if (bdev_file->f_flags & O_EXCL)
+		bdev_unblock_writes(bdev);
+	else if (bdev_file->f_mode & FMODE_WRITE)
+		bdev->bd_writers--;
 }
 
 /**
@@ -946,6 +945,13 @@ static unsigned blk_to_file_flags(blk_mode_t mode)
 	else
 		WARN_ON_ONCE(true);
 
+	/*
+	 * BLK_OPEN_RESTRICT_WRITES is never set from userspace and
+	 * O_EXCL is stripped from userspace.
+	 */
+	if (mode & BLK_OPEN_RESTRICT_WRITES)
+		flags |= O_EXCL;
+
 	if (mode & BLK_OPEN_NDELAY)
 		flags |= O_NDELAY;
 
-- 
2.43.0