On Sun, Oct 01 2017, Mikael Abrahamsson wrote:

> On Mon, 18 Sep 2017, NeilBrown wrote:
>
>> Anyway, thanks for the example of a real problem related to this.  It
>> does make it easier to think about.
>
> Btw, if someone does --zero-superblock or dd /dev/zero to a component
> device that is active, what happens when mdadm --stop /dev/mdX is run?
> Does it write out the complete superblock again?

--zero-superblock won't work on a device that is currently part of an
array.  dd /dev/zero will.

When the array is stopped the metadata will be written if the array is
not read-only and is not clean.
So for 'linear' and 'raid0' it is never written.  For others it probably
is but may not be.

I'm not sure that forcing a write makes sense.  A dd could corrupt lots
of stuff, and just saving the metadata is not a big win.

I've been playing with some code, and this patch makes it impossible to
write to a device which is in-use by md.
Well... not exactly.  If a partition is in-use by md, the whole device
can still be written to.  But the partition itself cannot.
Also if metadata is managed by user-space, writes are still allowed.
To fix that, we would need to capture each write request and validate
the sector range.  Not impossible, but ugly.

Also, by itself, this patch breaks the use of raid6check on an active
array.  We could fix that by enabling writes whenever a region is
suspended.

Still... maybe it is a starting point for thinking about the problem.

NeilBrown

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0ff1bbf6c90e..7c469cd9febc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2264,6 +2264,7 @@ static int lock_rdev(struct md_rdev *rdev, dev_t dev, int shared)
 		pr_warn("md: could not open %s.\n", __bdevname(dev, b));
 		return PTR_ERR(bdev);
 	}
+	bdev->bd_holder_only_writes = !shared;
 	rdev->bdev = bdev;
 	return err;
 }
@@ -2272,6 +2273,7 @@ static void unlock_rdev(struct md_rdev *rdev)
 {
 	struct block_device *bdev = rdev->bdev;
 	rdev->bdev = NULL;
+	bdev->bd_holder_only_writes = 0;
 	blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
 }
 
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 93d088ffc05c..673b71bac731 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1816,10 +1816,14 @@ void blkdev_put(struct block_device *bdev, fmode_t mode)
 		WARN_ON_ONCE(--bdev->bd_contains->bd_holders < 0);
 
 		/* bd_contains might point to self, check in a separate step */
-		if ((bdev_free = !bdev->bd_holders))
+		if ((bdev_free = !bdev->bd_holders)) {
+			bdev->bd_holder_only_writes = 0;
 			bdev->bd_holder = NULL;
-		if (!bdev->bd_contains->bd_holders)
+		}
+		if (!bdev->bd_contains->bd_holders) {
+			bdev->bd_contains->bd_holder_only_writes = 0;
 			bdev->bd_contains->bd_holder = NULL;
+		}
 
 		spin_unlock(&bdev_lock);
 
@@ -1884,8 +1888,13 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	loff_t size = i_size_read(bd_inode);
 	struct blk_plug plug;
 	ssize_t ret;
+	struct block_device *bdev = I_BDEV(bd_inode);
 
-	if (bdev_read_only(I_BDEV(bd_inode)))
+	if (bdev_read_only(bdev))
+		return -EPERM;
+	if (bdev->bd_holder != NULL &&
+	    bdev->bd_holder_only_writes &&
+	    bdev->bd_holder != file)
 		return -EPERM;
 
 	if (!iov_iter_count(from))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 339e73742e73..79e3a2822867 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -424,6 +424,7 @@ struct block_device {
 	void *			bd_holder;
 	int			bd_holders;
 	bool			bd_write_holder;
+	bool			bd_holder_only_writes;
 #ifdef CONFIG_SYSFS
 	struct list_head	bd_holder_disks;
 #endif
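
To make the two follow-up ideas from the message concrete (validating the sector range of each write, and re-enabling writes inside a suspended region), here is a rough userspace model in C. It is only a sketch and not kernel code: `struct dev_model`, `write_allowed()` and every field name are hypothetical, and the rule that writes outside the data area pass is my assumption about what "validate the sector range" would mean for user-space managed metadata.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct dev_model {
	const void *holder;       /* exclusive holder, NULL if none */
	bool holder_only_writes;  /* models bd_holder_only_writes from the patch */
	uint64_t data_start;      /* first sector of the array data area (assumed) */
	uint64_t data_len;        /* length of the data area in sectors (assumed) */
	uint64_t suspend_lo;      /* suspended region [lo, hi); empty if lo >= hi */
	uint64_t suspend_hi;
};

/* True if [a, a+alen) and [b, b+blen) share at least one sector.
 * Assumes the sums do not overflow 64 bits. */
static bool ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
	return alen && blen && a < b + blen && b < a + alen;
}

/* True if [start, start+len) lies entirely inside the non-empty range [lo, hi). */
static bool range_within(uint64_t start, uint64_t len, uint64_t lo, uint64_t hi)
{
	return lo < hi && start >= lo && start <= hi && len <= hi - start;
}

/* Decide whether 'writer' may write nr_sectors starting at 'sector'. */
static bool write_allowed(const struct dev_model *d, const void *writer,
			  uint64_t sector, uint64_t nr_sectors)
{
	/* No exclusive holder, or the holder did not ask for protection. */
	if (d->holder == NULL || !d->holder_only_writes)
		return true;
	/* The holder itself may always write. */
	if (writer == d->holder)
		return true;
	/* Assumption: writes that miss the data area (e.g. metadata
	 * updated from user-space) are let through. */
	if (!ranges_overlap(sector, nr_sectors, d->data_start, d->data_len))
		return true;
	/* Otherwise only allow writes fully inside a suspended region,
	 * which is roughly what raid6check would need. */
	return range_within(sector, nr_sectors, d->suspend_lo, d->suspend_hi);
}
```

With an empty suspended region, a foreign write into the data area is refused while a write to sectors before `data_start` passes; setting `suspend_lo`/`suspend_hi` around a stripe lets a repair tool write there and nowhere else. A real implementation would have to do this per bio rather than per write() call, which is presumably the "ugly" part.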