On Mon, Mar 11, 2024 at 10:01:33AM +0100, Günther Noack wrote: > On Mon, Mar 11, 2024 at 12:03:13PM +1100, Dave Chinner wrote: > > On Fri, Mar 08, 2024 at 12:03:01PM +0100, Günther Noack wrote: > > > On Fri, Mar 08, 2024 at 08:02:13AM +0100, Arnd Bergmann wrote: > > > > On Fri, Mar 8, 2024, at 00:09, Dave Chinner wrote: > > > > > I have no idea what a "safe" ioctl means here. Subsystems already > > > > > restrict ioctls that can do damage if misused to CAP_SYS_ADMIN, so > > > > > "safe" clearly means something different here. > > > > > > > > That was my problem with the first version as well, but I think > > > > drawing the line between "implemented in fs/ioctl.c" and > > > > "implemented in a random device driver fops->unlock_ioctl()" > > > > seems like a more helpful definition. > > > > > > Yes, sorry for the confusion - that is exactly what I meant to say with "safe".: > > > > > > Those are the IOCTL commands implemented in fs/ioctl.c which do not go through > > > f_ops->unlocked_ioctl (or the compat equivalent). > > > > Which means all the ioctls we wrequire for to manage filesystems are > > going to be considered "unsafe" and barred, yes? > > > > That means you'll break basic commands like 'xfs_info' that tell you > > the configuration of the filesystem. It will prevent things like > > online growing and shrinking, online defrag, fstrim, online > > scrubbing and repair, etc will not worki anymore. It will break > > backup utilities like xfsdump, and break -all- the device management > > of btrfs and bcachefs filesystems. > > > > Further, all the setup and management of -VFS functionality- like > > fsverity and fscrypt is actually done at the filesystem level (i.e > > through ->unlocked_ioctl, no do_vfs_ioctl()) so those are all going > > to get broken as well despite them being "vfs features". > > > > Hence from a filesystem perspective, this is a fundamentally > > unworkable definition of "safe". > > As discussed further up in this thread[1], we want to only apply the IOCTL > command filtering to block and character devices. I think this should resolve > your concerns about file system specific IOCTLs? This is implemented in patch > V10 going forward[2]. I think you misunderstand. I used filesystem ioctls as an obvious counter argument to this "VFS-only ioctls are safe" proposal to show that it fundamentally breaks core filesystem boot and management interfaces. Operations to prepare filesystems for mount may require block device ioctls to be run. i.e. block device ioctls are required core boot and management interfaces. Disallowing ioctls on block devices will break udev rules that set up block devices on kernel device instantiation events. It will break partitioning tools that need to read/modify/rescan the partition table. This will prevent discard, block zeroing and *secure erase* operations. It may prevent libblkid from reporting optimal device IO parameters to filesystem utilities like mkfs. You won't be able to mark block devices as read only. Management of zoned block devices will be impossible. Then stuff like DM and MD devices (e.g. LVM, RAID, etc) simply won't appear on the system because they can't be scanned, configured, assembled, etc. And so on. The fundamental fact is that system critical block device ioctls are implemented by generic infrastructure below the VFS layer. They have their own generic ioctl layer - blkdev_ioctl() is equivalent of do_vfs_ioctl() for the block layer. But if we cut off everything below ->unlocked_ioctl() at the VFS, then we simply can't run any of these generic block device ioctls. As I said: this proposal is fundamentally unworkable without extensive white- and black-listing of individual ioctls in the security policies. That's not really a viable situation, because we're going to change code and hence likely silently break those security policy lists regularly.... -Dave. -- Dave Chinner david@xxxxxxxxxxxxx