On Tue 26-03-24 14:17:20, Christian Brauner wrote:
> On Tue, Mar 26, 2024 at 01:57:15PM +0100, Jan Kara wrote:
> > On Sat 23-03-24 17:11:19, Christian Brauner wrote:
> > > Last kernel release we introduced CONFIG_BLK_DEV_WRITE_MOUNTED. By
> > > default this option is set. When it is set the long-standing behavior
> > > of being able to write to mounted block devices is enabled.
> > >
> > > But in order to guard against unintended corruption by writing to the
> > > block device buffer cache CONFIG_BLK_DEV_WRITE_MOUNTED can be turned
> > > off. In that case it isn't possible to write to mounted block devices
> > > anymore.
> > >
> > > A filesystem may open its block devices with BLK_OPEN_RESTRICT_WRITES
> > > which disallows concurrent BLK_OPEN_WRITE access. When we still had the
> > > bdev handle around we could recognize BLK_OPEN_RESTRICT_WRITES because
> > > the mode was passed around. Since we managed to get rid of the bdev
> > > handle we changed that logic to recognize BLK_OPEN_RESTRICT_WRITES based
> > > on whether the file was opened writable and writes to that block device
> > > are blocked. That logic doesn't work because we do allow
> > > BLK_OPEN_RESTRICT_WRITES to be specified without BLK_OPEN_WRITE.
> > >
> > > So fix the detection logic. Use O_EXCL as an indicator that
> > > BLK_OPEN_RESTRICT_WRITES has been requested. We do the exact same thing
> > > for pidfds where O_EXCL means that this is a pidfd that refers to a
> > > thread. For userspace open paths O_EXCL will never be retained but for
> > > internal opens where we open files that are never installed into a file
> > > descriptor table this is fine.
> > >
> > > Note that BLK_OPEN_RESTRICT_WRITES is an internal only flag that cannot
> > > directly be raised by userspace. It is implicitly raised during
> > > mounting.
> > >
> > > Passes xfstests and blktests with CONFIG_BLK_DEV_WRITE_MOUNTED set and
> > > unset.
> > >
> > > Fixes: 321de651fa56 ("block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access")
> > > Reported-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> > > Link: https://lore.kernel.org/r/ZfyyEwu9Uq5Pgb94@xxxxxxxxxxxxxxxxxxxx
> > > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
> >
> > The fix looks correct but admittedly it looks a bit hacky. I'd prefer
> > storing the needed information in some other flag, preferably one that does
> > not already have a special meaning with block devices. But FMODE_ space is
> > exhausted and I don't see another easy solution. So I guess:
>
> Admittedly, it's not pretty but we're abusing O_EXCL already for block
> devices. We do have FMODE_* available. We could add
> FMODE_WRITE_RESTRICTED because we have two bits left.

Yeah, I'm mostly afraid that a few years down the road someone comes and
adds code thinking that file->f_flags & O_EXCL means this is an exclusive
bdev open. Which will be a kind of natural assumption but subtly wrong... So
to make the code more robust, I'd prefer burning an fmode flag for this
rather than abusing O_EXCL.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
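
To make the concern above concrete, here is a minimal userspace sketch of the three detection strategies discussed in this thread. It is not kernel code: struct sketch_file, the SKETCH_* constants and their values, and all helper names are hypothetical stand-ins chosen for illustration. Only the name FMODE_WRITE_RESTRICTED comes from the thread itself.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel bits; the values are illustrative only. */
#define SKETCH_O_EXCL                 0x080u /* models O_EXCL in f_flags */
#define SKETCH_FMODE_WRITE            0x002u /* models FMODE_WRITE in f_mode */
#define SKETCH_FMODE_WRITE_RESTRICTED 0x100u /* models the proposed dedicated bit */

/* Minimal model of the two struct file fields the thread talks about. */
struct sketch_file {
	unsigned int f_flags;
	unsigned int f_mode;
};

/*
 * Detection as described for the broken commit: "opened writable and
 * writes to the device are blocked". It misses an open that restricts
 * writes without itself requesting write access.
 */
static bool detect_old(const struct sketch_file *f, bool bdev_writes_blocked)
{
	return (f->f_mode & SKETCH_FMODE_WRITE) && bdev_writes_blocked;
}

/* Detection as in the posted fix: O_EXCL on an internal open is the marker. */
static bool detect_via_o_excl(const struct sketch_file *f)
{
	return (f->f_flags & SKETCH_O_EXCL) != 0;
}

/* Detection with a dedicated f_mode bit, the alternative preferred above. */
static bool detect_via_fmode(const struct sketch_file *f)
{
	return (f->f_mode & SKETCH_FMODE_WRITE_RESTRICTED) != 0;
}

/*
 * The mistake a future contributor might make: reading O_EXCL as
 * "exclusive bdev open". With the O_EXCL marker this gives a false
 * positive; with a dedicated f_mode bit it cannot.
 */
static bool naive_is_exclusive_open(const struct sketch_file *f)
{
	return (f->f_flags & SKETCH_O_EXCL) != 0;
}
```

In this model, a read-only open that restricts writes is missed by detect_old() and caught by detect_via_o_excl(), but naive_is_exclusive_open() also fires on it. Marking the same open with the dedicated bit keeps the two meanings apart.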