On Mon, Mar 13, 2023 at 03:53:57PM +0100, Dmitry Vyukov wrote:
> > Long-term we are moving ext4 in a direction where we can disallow block
> > device modifications while the fs is mounted but we are not there yet. I've
> > discussed some shorter-term solution to avoid such known problems with syzbot
> > developers and what seems plausible would be a kconfig option to disallow
> > writing to a block device when it is exclusively open by someone else.
> > But so far I didn't get to trying whether this would reasonably work. Would
> > you be interested in having a look into this?
>
> Does this affect only the loop device or also USB storage devices?
> Say, if the USB device returns different contents during mount and on
> subsequent reads?

Modifying the block device while the file system is mounted is something
that we have to allow for now because tune2fs uses it to modify the
superblock. It has historically also been used (rarely) by people who
know what they are doing to do surgery on a mounted file system.

If we create a way for tune2fs to be able to update the superblock via
some kind of ioctl, we could disallow modifying the block device while
the file system is mounted. Of course, it would require waiting at
least 5-6 years since sometimes people will update the kernel without
updating userspace. We'd also need to check to make sure there aren't
boot loader installers (such as grub-install) that depend on being able
to modify the block device while the root file system is mounted, at
least in some rare cases.

The "how" to exclude mounted file systems is relatively easy. The
kernel already knows when the file system is mounted, and it is already
a supported feature that a userspace application that wants to be
careful can open a block device with O_EXCL, and if it is in use by the
kernel (mounted by a file system, being used by dm-thin, et al.), the
open(2) system call will fail. From the open(2) man page:

       In general, the behavior of O_EXCL is undefined if it is used
       without O_CREAT. There is one exception: on Linux 2.6 and
       later, O_EXCL can be used without O_CREAT if pathname refers
       to a block device. If the block device is in use by the
       system (e.g., mounted), open() fails with the error EBUSY.

Something that syzbot could do today is to simply use O_EXCL whenever
trying to open a block device. This would avoid a class of syzbot
false positives, since normally it requires root privileges and/or an
experienced sysadmin to try to modify a block device while it is
mounted and/or in use by LVM.

                                        - Ted

P.S. Trivia note: Approximately a month after I started work at VA
Linux Systems, a sysadmin intern who was given the root password to
sourceforge.net, while trying to fix a disk-to-disk backup, ran
mkfs.ext3 on /dev/hdXX, which was also being used as one-half of a
RAID 0 setup on which open source code critical to the community
(including, for example, OpenGL) was mounted and serving. The intern
got about 50% of the way through zeroing the inode table on /dev/hdXX
before the file system noticed and threw an error, at which point
wiser heads stopped what the intern was doing and tried to clean up
the mess. Of course, there were no backups, since that was what the
intern was trying to fix!

There are a couple of things that we could learn from this incident.
One was that giving the root password to an untrained intern not
familiar with the setup on the serving system was... an unfortunate
choice. Another was that adding the above-mentioned O_EXCL feature and
teaching mkfs to use it was an obvious post-mortem action item to
prevent this kind of problem in the future...
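
For anyone who wants to see the O_EXCL check described above from userspace, here is a rough sketch. It is in Python purely for brevity, and the helper name open_block_device_exclusive is made up for illustration; it is not something syzbot, e2fsprogs, or the kernel provides. It assumes a Linux system and only relies on the behavior documented in the open(2) excerpt quoted earlier.

```python
import errno
import os
import stat


def open_block_device_exclusive(path, flags=os.O_RDONLY):
    """Try to open a block device with O_EXCL.

    Returns (fd, None) on success, or (None, errno_name) on failure.
    Hypothetical helper for illustration only.
    """
    try:
        st = os.stat(path)
    except OSError as exc:
        return None, errno.errorcode.get(exc.errno, str(exc.errno))

    # open(2) only defines O_EXCL without O_CREAT for block devices,
    # so refuse anything else rather than rely on undefined behavior.
    # (There is a small window between this stat and the open below;
    # a careful tool would re-check with fstat on the returned fd.)
    if not stat.S_ISBLK(st.st_mode):
        return None, "ENOTBLK"

    try:
        # Per open(2), this fails with EBUSY if the device is in use
        # by the system (e.g., mounted).
        fd = os.open(path, flags | os.O_EXCL)
    except OSError as exc:
        return None, errno.errorcode.get(exc.errno, str(exc.errno))
    return fd, None
```

Going by the man page text, calling this on a device that currently has a mounted file system on it should come back as (None, "EBUSY"), and a caller such as a fuzzer or a mkfs-like tool could simply skip the device in that case. On success the caller owns the returned fd and should os.close() it when done.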