On Tue, Mar 26, 2024 at 01:57:15PM +0100, Jan Kara wrote: > On Sat 23-03-24 17:11:19, Christian Brauner wrote: > > Last kernel release we introduce CONFIG_BLK_DEV_WRITE_MOUNTED. By > > default this option is set. When it is set the long-standing behavior > > of being able to write to mounted block devices is enabled. > > > > But in order to guard against unintended corruption by writing to the > > block device buffer cache CONFIG_BLK_DEV_WRITE_MOUNTED can be turned > > off. In that case it isn't possible to write to mounted block devices > > anymore. > > > > A filesystem may open its block devices with BLK_OPEN_RESTRICT_WRITES > > which disallows concurrent BLK_OPEN_WRITE access. When we still had the > > bdev handle around we could recognize BLK_OPEN_RESTRICT_WRITES because > > the mode was passed around. Since we managed to get rid of the bdev > > handle we changed that logic to recognize BLK_OPEN_RESTRICT_WRITES based > > on whether the file was opened writable and writes to that block device > > are blocked. That logic doesn't work because we do allow > > BLK_OPEN_RESTRICT_WRITES to be specified without BLK_OPEN_WRITE. > > > > So fix the detection logic. Use O_EXCL as an indicator that > > BLK_OPEN_RESTRICT_WRITES has been requested. We do the exact same thing > > for pidfds where O_EXCL means that this is a pidfd that refers to a > > thread. For userspace open paths O_EXCL will never be retained but for > > internal opens where we open files that are never installed into a file > > descriptor table this is fine. > > > > Note that BLK_OPEN_RESTRICT_WRITES is an internal only flag that cannot > > directly be raised by userspace. It is implicitly raised during > > mounting. > > > > Passes xftests and blktests with CONFIG_BLK_DEV_WRITE_MOUNTED set and > > unset. > > > > Fixes: 321de651fa56 ("block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access") > > Reported-by: Matthew Wilcox <willy@xxxxxxxxxxxxx> > > Link: https://lore.kernel.org/r/ZfyyEwu9Uq5Pgb94@xxxxxxxxxxxxxxxxxxxx > > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx> > > The fix looks correct but admittedly it looks a bit hacky. I'd prefer > storing the needed information in some other flag, preferably one that does > not already have a special meaning with block devices. But FMODE_ space is > exhausted and don't see another easy solution. So I guess: Admittedly, it's not pretty but we're abusing O_EXCL already for block devices. We do have FMODE_* available. We could add FMODE_WRITE_RESTRICTED because we have two bits left. One other solution apart from FMODE_* I had come up with was even nastier which would've been using the struct fd approach of using the two available bits in the pointer. But that doesn't work because we have stuff like dm that passes in strings which are byte aligned. And it's arguably uglier.