On 1/29/2019 1:44 PM, Eric W. Biederman wrote: > All, > > With the existing mount API it is possible to mount a filesystem > like: > > mount -t ext4 /dev/sda1 -o user_xattr /some/path > mount -t ext4 /dev/sda1 -o nouser_xattr /some/other/path > > And have both mount commands succeed and have two mounts of the same > filesystem. If the mounter is not attentive or the first mount is added > earlier it may not be immediately noticed that a mount option that is > needed for the correct operation or the security of the system is lost. > > We have seen this failure mode with both devpts and proc. So it is not > theoretical, and it has resulted in CVEs. > > In some cases the existing mount API (such as a conflict between ro and > rw) handles this by returning -EBUSY. So we may be able to correct this > in the existing mount API. But it is always very tricky to to get > adequate testing for a change like that to avoid regressions, so I am > proposing we change this in the new mount api. > > This has been brought up before and I have been told it is technically > infeasible to make this work. To counter that I have sat down and > implemented it. > > The basic idea is: > - get a handle to a filesystem > (passing enough options to uniquely identify the super block). > Also capture enough state in the file handle to let you know if > the file system has it's mount options changed between system calls. > (essentially this is just the fs code that calls sget) > > - If the super block has not been configured allow setting the file > systems options. > > - If the super block has already been configured require reading the > file systems mount options before setting/updating the file systems > mount options. > > To complement that I have functionality that: > - Allows reading a file systems current mount options. > - Allows reading the mount options that are needed to get a handle to > the filesystem. For most filesystems it is just the block device > name. For nfs is is essentially all mount options. For btrfs > it is the block device name, and the "devices=" mount option for > secondary block device names. Are you taking the LSM specific mount options into account? > > Please find below a tree where all of this is implemented and working. > Not all file systems have been converted but the most of the unconverted > ones are just a couple minutes work as I have converted all of the file > system mount helper routines. > > Also please find below an example mount program that takes the same set > of mount options as mount(8) today and mounts filesystems with the > proposed new mount api. > - Without having any filesystem mount knowledge it sucessfully figures > > out which system calls different mount options needs to be applied > to. > > - Without having any filesystem specific knowledge in most cases it > can detect if a mount option you specify is already specified to an > existing mount or not. For duplicates it can detect it ignores them. > For the other cases it fails the mount as it thinks the mount options > are different. > > - Which demonstrates it safe to put the detection and remediation of > multiple mount commands resolving to the same filesystem in user > space. > > I really don't care whose code gets used as long as it works. I do very > much care that we don't add a new mount api that has the confusion flaws > of the existing mount api. > > Along the way I have also detected a lot of room for improvement on the > mount path for filesystems. Those cleanup patches are in my tree below > and will be extracting them and sending them along shortly. > > Comments? > > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git no-losing-mount-options-proof-of-concept > > > > Eric >