Re: [PATCH 1/2] block: handle BLK_OPEN_RESTRICT_WRITES correctly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 29, 2024 at 04:56:07AM +0000, Matthew Wilcox wrote:
> On Sat, Mar 23, 2024 at 05:11:19PM +0100, Christian Brauner wrote:
> > Last kernel release we introduce CONFIG_BLK_DEV_WRITE_MOUNTED. By
> > default this option is set. When it is set the long-standing behavior
> > of being able to write to mounted block devices is enabled.
> > 
> > But in order to guard against unintended corruption by writing to the
> > block device buffer cache CONFIG_BLK_DEV_WRITE_MOUNTED can be turned
> > off. In that case it isn't possible to write to mounted block devices
> > anymore.
> > 
> > A filesystem may open its block devices with BLK_OPEN_RESTRICT_WRITES
> > which disallows concurrent BLK_OPEN_WRITE access. When we still had the
> > bdev handle around we could recognize BLK_OPEN_RESTRICT_WRITES because
> > the mode was passed around. Since we managed to get rid of the bdev
> > handle we changed that logic to recognize BLK_OPEN_RESTRICT_WRITES based
> > on whether the file was opened writable and writes to that block device
> > are blocked. That logic doesn't work because we do allow
> > BLK_OPEN_RESTRICT_WRITES to be specified without BLK_OPEN_WRITE.
> > 
> > So fix the detection logic. Use O_EXCL as an indicator that
> > BLK_OPEN_RESTRICT_WRITES has been requested. We do the exact same thing
> > for pidfds where O_EXCL means that this is a pidfd that refers to a
> > thread. For userspace open paths O_EXCL will never be retained but for
> > internal opens where we open files that are never installed into a file
> > descriptor table this is fine.
> > 
> > Note that BLK_OPEN_RESTRICT_WRITES is an internal only flag that cannot
> > directly be raised by userspace. It is implicitly raised during
> > mounting.
> > 
> > Passes xftests and blktests with CONFIG_BLK_DEV_WRITE_MOUNTED set and
> > unset.
> > 
> > Fixes: 321de651fa56 ("block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access")
> > Reported-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> > Link: https://lore.kernel.org/r/ZfyyEwu9Uq5Pgb94@xxxxxxxxxxxxxxxxxxxx
> > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
> 
> So v1 of this patch works fine.  I just got round to testing v2, and it
> does not.  Indeed, applying 2/2 causes root to fail to mount:
> 
> /dev/root: Can't open blockdev
> List of all bdev filesystems:
>  ext3
>  ext2
>  ext4
>  xfs
> 
> Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(254,0)
> 
> Applying only 1/2 boots but fails to fix the bug.

Thanks for testing this. This is odd because I tested with the setup you
provided.

I used the kernel config you sent to me in [2] with an xfs root device
with direct kernel boot and the following xfstests config in [3]. I'm
booting the vm with:

qemu-system-x86_64 -machine type=q35 -smp 1 -m 4G -accel kvm -cpu max -nographic -nodefaults \
        -chardev stdio,mux=on,id=console,signal=off -serial chardev:console -mon console \
        -kernel /home/ubuntu/data/mkosi-kernel2/mkosi.output.debian/image.vmlinuz \
        -drive file=/home/ubuntu/data/mkosi-kernel2/mkosi.output.debian/image.raw,format=raw,if=virtio \
        -append "console=ttyS0 root=/dev/vda2 module_blacklist=vmw_vmci systemd.tty.term.ttyS0=screen-256color systemd.tty.columns.ttyS0=96 systemd.tty.rows.ttyS0=46 debug loglevel=4 SYSTEMD_"

Note that the config you gave me in [2] didn't include
CONFIG_SCSI_VIRTIO=y which means I got the splat you did. I added this
missing config option and everything worked fine for me.

Can you please test what's in the vfs.fixes branch on
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git so we're
sure that we're testing the same thing?

The failures that I see are:

Failures: generic/042 generic/645 generic/682 generic/689 xfs/014
xfs/017 xfs/049 xfs/129 xfs/176 xfs/206 xfs/216 xfs/234 xfs/250 xfs/289
xfs/558 xfs/559
Failed 16 of 930 tests

* generic/645 fails because it requires an unrelated fix to fstests
  because we changed idmapped mounts to not not allow empty idmappings.
* generic/689 fails because the providec config doesn't compile tmpfs with POSIX ACL support
* xfs/558 and xfs/559 complain about missing logging
  about iomap validation and are unrelated
* All other failures are caused by loop devices which is expected unil
  a util-linux is released that contains Jan's fix in [1] so that
  mount(8) doesn't hold a writable fd to the loop device anymore and
  instead simply uses a read-only one.

[1]: https://github.com/util-linux/util-linux/commit/1cde32f323e0970f6c7f35940dcc0aea97b821e5
[2]: https://lore.kernel.org/r/Zf18I2UOGQxeN-Z1@xxxxxxxxxxxxxxxxxxxx
[3]:
#! /bin/bash

set -x

cd ~/src/git/xfstests-dev/
FIRST_DEV=/dev/vda3
SECOND_DEV=/dev/vda4
THIRD_DEV=/dev/vda5

echo "Testing xfs"
cat <<EOF >local.config
FSTYP=xfs
export TEST_DEV=${FIRST_DEV}
export SCRATCH_DEV=${SECOND_DEV}
export LOGWRITE_DEV=${THIRD_DEV}
export TEST_DIR=/mnt/test
export SCRATCH_MNT=/mnt/scratch
EOF

sudo mkfs.xfs -f ${FIRST_DEV}
sudo mkfs.xfs -f ${SECOND_DEV}
sudo mkfs.xfs -f ${THIRD_DEV}
sudo ./check -g quick




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux