On Mon, Mar 25, 2024 at 07:51:27PM +0800, Yu Kuai wrote:
> Hi,
>
> On 2024/03/24 0:11, Christian Brauner wrote:
> > Last kernel release we introduced CONFIG_BLK_DEV_WRITE_MOUNTED. By
> > default this option is set. When it is set, the long-standing behavior
> > of being able to write to mounted block devices is enabled.
> >
> > But in order to guard against unintended corruption by writing to the
> > block device buffer cache, CONFIG_BLK_DEV_WRITE_MOUNTED can be turned
> > off. In that case it isn't possible to write to mounted block devices
> > anymore.
> >
> > A filesystem may open its block devices with BLK_OPEN_RESTRICT_WRITES,
> > which disallows concurrent BLK_OPEN_WRITE access. When we still had the
> > bdev handle around we could recognize BLK_OPEN_RESTRICT_WRITES because
> > the mode was passed around. Since we managed to get rid of the bdev
> > handle we changed that logic to recognize BLK_OPEN_RESTRICT_WRITES based
> > on whether the file was opened writable and writes to that block device
> > are blocked. That logic doesn't work because we do allow
> > BLK_OPEN_RESTRICT_WRITES to be specified without BLK_OPEN_WRITE.
>
> I don't get this part; it looks like there is no such use case. All
> users pass BLK_OPEN_RESTRICT_WRITES together with BLK_OPEN_WRITE.
>
> Is the following the root cause here?
>
> 1) t1 opens with BLK_OPEN_WRITE;
> 2) t2 opens with BLK_OPEN_RESTRICT_WRITES and calls bdev_block_writes()
>    without waiting for t1 to close;
> 3) t1 closes, and after the commit bdev_unblock_writes() is called
>    unexpectedly.
>
> Subsequent openers will then succeed even though t2 hasn't closed.
> >
> > So fix the detection logic. Use O_EXCL as an indicator that
> > BLK_OPEN_RESTRICT_WRITES has been requested. We do the exact same thing
> > for pidfds where O_EXCL means that this is a pidfd that refers to a
> > thread.
> > For userspace open paths O_EXCL will never be retained, but for
> > internal opens where we open files that are never installed into a file
> > descriptor table this is fine.
>
> On the blkdev_open() path, the file comes from devtmpfs, the user can
> pass O_EXCL for that file, and that file is later used in
> blkdev_release() -> bdev_release() -> bdev_yield_write_access().

It can't, because the VFS strips O_EXCL after the file has been opened.
Only internal opens can retain this flag. See do_dentry_open().

Or do you mean something else?