Re: [PATCH] block: Add config option to not allow writing to mounted devices

Jan Kara <jack@xxxxxxx> · Wed, 14 Jun 2023 12:12:56 +0200

On Wed 14-06-23 00:10:26, Christoph Hellwig wrote:
> On Tue, Jun 13, 2023 at 01:34:48PM +0200, Jan Kara wrote:
> > > It's not just syzbot here; at least once in my life I accidentally did
> > > `dd if=/path/to/foo.iso of=/dev/sda` when `/dev/sda` was my booted disk
> > > and not the target USB device.  I know I'm not alone =)
> > 
> > Yeah, so I'm not sure we are going to protect against this particular case.
> > I mean it is not *that* uncommon to alter partition table of /dev/sda while
> > /dev/sda1 is mounted. And for the kernel it is difficult to distinguish
> > this and your mishap.
> 
> I think it is actually very easy to distinguish, because the partition
> table is not mapped to any partition and certainly not an exclusively
> opened one.

Well, OK, I have not been precise :). Modifying a partition table (or LVM
description block) is impossible to distinguish from clobbering a
filesystem at open(2) time. Once we decide to implement arbitration of each
individual write(2), we can obviously stop writes to an area covered by some
exclusively opened partition. But then you are getting to the complexity
level of tracking used ranges of block devices which Darrick has suggested
and you didn't seem to like that (and neither do I). Furthermore the
protection is never going to be perfect as soon as loopback devices, device
mapper, and similar come into the mix (or it gets really really complex).
So I'd really prefer to stick with whatever arbitration we can perform on
open(2).
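
To make the difference between the two arbitration points concrete, here is
a toy userspace model (not kernel code). The names Claim, open_allowed and
write_allowed are hypothetical and exist only for this illustration; the
assumption is that each exclusively opened partition is recorded as a byte
range of the whole device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Byte range of the whole device held by an exclusive opener (hypothetical)."""
    start: int   # first byte of the claimed range
    length: int  # number of bytes claimed

    @property
    def end(self):
        # One past the last claimed byte.
        return self.start + self.length


def open_allowed(claims):
    """open(2)-time arbitration: one yes/no answer for the whole device.

    At this point nothing is known about where the opener will write, so
    the only available policy is to refuse whenever any claim exists.
    """
    return not claims


def write_allowed(claims, offset, length):
    """Per-write arbitration: refuse only writes overlapping a claimed range."""
    end = offset + length
    return all(end <= c.start or offset >= c.end for c in claims)


# Assumed layout: one mounted partition starting at 1 MiB, 100 MiB long.
claims = [Claim(start=1 << 20, length=100 << 20)]

print(open_allowed(claims))               # same answer for fdisk and for dd
print(write_allowed(claims, 0, 512))      # partition table update at sector 0
print(write_allowed(claims, 0, 4 << 20))  # dd of an image over the disk start
```

The per-write check can tell the two cases apart, but only because it keeps
a list of used ranges, which is exactly the bookkeeping mentioned above. The
model also ignores stacked devices (loop, device mapper), where the ranges
would have to be translated between layers.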

> > 1) If user can write some image and make kernel mount it.
> > 2) If user can modify device content while mounted (but not buffer cache
> > of the device).
> > 3) If user can modify buffer cache of the device while mounted.
> > 
> > 3) is the most problematic and effectively equivalent to full machine
> > control (executing arbitrary code in kernel mode) these days.
> 
> If a corrupted image can trigger arbitrary code execution that also
> means the file system code does not do proper input validation.

I agree. But case 3) is not about a corrupted image - it is about userspace's
ability to corrupt data stored in the buffer cache *after* it has been
loaded from the image and verified. This is not a problem for XFS, which has
its own private block device cache that is not coherent with the buffer cache
you access when opening the bdev, but basically every other filesystem
suffers from this problem.
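
A toy model of case 3), only to show the ordering problem (verify first,
get modified later). This is a userspace sketch, not how any filesystem is
implemented: SharedCacheFS, PrivateCacheFS and the CRC check are
hypothetical stand-ins for "metadata validated when it is read in".

```python
import zlib


def checksum(data):
    """Stand-in for whatever validation the filesystem does at load time."""
    return zlib.crc32(bytes(data))


class SharedCacheFS:
    """Keeps using the very buffer that is also reachable through the bdev."""

    def __init__(self, shared_cache, blk, expected_crc):
        buf = shared_cache[blk]
        if checksum(buf) != expected_crc:
            raise ValueError("corrupted image rejected at load time")
        self.buf = buf  # same object as the one in shared_cache

    def read_metadata(self):
        # Trusts the earlier verification; does not re-check.
        return bytes(self.buf)


class PrivateCacheFS:
    """Reads the block into its own buffer, separate from the shared cache."""

    def __init__(self, disk_block, expected_crc):
        buf = bytearray(disk_block)  # private copy
        if checksum(buf) != expected_crc:
            raise ValueError("corrupted image rejected at load time")
        self.buf = buf

    def read_metadata(self):
        return bytes(self.buf)


disk = b"good metadata"
crc = checksum(disk)
shared_cache = {0: bytearray(disk)}  # what a writer to the bdev would modify

shared = SharedCacheFS(shared_cache, 0, crc)
private = PrivateCacheFS(disk, crc)

# Userspace modifies the cached block after both filesystems verified it.
shared_cache[0][0:4] = b"evil"

print(shared.read_metadata())   # sees the modified, never re-verified data
print(private.read_metadata())  # still sees what it verified
```

Input validation at load time passes in both cases; only the first model
ends up operating on data it never validated.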

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR