On Tue, 9 Mar 2021 14:31:05 +0300 Alexander Mikhalitsyn <alexander.mikhalitsyn@xxxxxxxxxxxxx> wrote: > On Mon, 8 Mar 2021 00:12:22 +0000 > Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > > On Sun, Mar 07, 2021 at 11:51:20PM +0000, Al Viro wrote: > > > On Thu, Mar 04, 2021 at 01:11:33PM +0300, Alexander Mikhalitsyn wrote: > > > > > > > That problem connected with CRIU (Checkpoint-Restore in Userspace) project. > > > > In CRIU we have support of autofs mounts C/R. To acheive that we need to use > > > > ioctl's from /dev/autofs to get data about mounts, restore mount as catatonic > > > > (if needed), change pipe fd and so on. But the problem is that during CRIU > > > > dump we may meet situation when VFS subtree where autofs mount present was > > > > overmounted as whole. > > > > > > > > Simpliest example is /proc/sys/fs/binfmt_misc. This mount present on most > > > > GNU/Linux distributions by default. For instance on my Fedora 33: > > > > > > > > trigger automount of binfmt_misc > > > > $ ls /proc/sys/fs/binfmt_misc > > > > > > > > $ cat /proc/1/mountinfo | grep binfmt > > > > 35 24 0:36 / /proc/sys/fs/binfmt_misc rw,relatime shared:16 - autofs systemd-1 rw,...,direct,pipe_ino=223 > > > > 632 35 0:56 / /proc/sys/fs/binfmt_misc rw,...,relatime shared:315 - binfmt_misc binfmt_misc rw > > > > > > > > $ sudo unshare -m -p --fork --mount-proc sh > > > > # cat /proc/self/mountinfo | grep "/proc" > > > > 828 809 0:23 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw > > > > 829 828 0:36 / /proc/sys/fs/binfmt_misc rw,relatime - autofs systemd-1 rw,...,direct,pipe_ino=223 > > > > 943 829 0:56 / /proc/sys/fs/binfmt_misc rw,...,relatime - binfmt_misc binfmt_misc rw > > > > 949 828 0:57 / /proc rw...,relatime - proc proc rw > > > > > > > > As we can see now autofs mount /proc/sys/fs/binfmt_misc is inaccessible. > > > > If we do something like: > > > > > > > > struct autofs_dev_ioctl *param; > > > > param = malloc(...); > > > > devfd = open("/dev/autofs", O_RDONLY); > > > > init_autofs_dev_ioctl(param); > > > > param->size = size; > > > > strcpy(param->path, "/proc/sys/fs/binfmt_misc"); > > > > param->openmount.devid = 36; > > > > err = ioctl(devfd, AUTOFS_DEV_IOCTL_OPENMOUNT, param) > > > > > > > > now we get err = -ENOENT. > > > > > Where does that -ENOENT come from? AFAICS, pathwalk ought to succeed and > > return you the root of overmounting binfmt_misc. Why doesn't the loop in > > find_autofs_mount() locate anything it would accept? > > > > Consider our mounts tree: > > > > # cat /proc/self/mountinfo | grep "/proc" > > > > 828 809 0:23 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw > > > > 829 828 0:36 / /proc/sys/fs/binfmt_misc rw,relatime - autofs systemd-1 rw,...,direct,pipe_ino=223 > > > > 943 829 0:56 / /proc/sys/fs/binfmt_misc rw,...,relatime - binfmt_misc binfmt_misc rw > > > > 949 828 0:57 / /proc rw...,relatime - proc proc rw > > ENOENT comes from here (current kernel code): > /* Find the topmost mount satisfying test() */ > static int find_autofs_mount(const char *pathname, > struct path *res, > int test(const struct path *path, void *data), > void *data) > { > struct path path; > int err; > > err = kern_path(pathname, LOOKUP_MOUNTPOINT, &path); > if (err) > return err; > <-------- here we successfuly open root dentry (/proc/sys/fs/binfmt_misc) of /proc (mnt_id = 949) > > err = -ENOENT; > <---- set err and start search autofs mount > /* > here we use follow_up to move through upper dentries and find overmounted autofs. > But in our case we opened dentry from /proc (mnt_id = 949) and this concrete dentry is *NOT* > overmounted (whole /proc overmounted). > */ > while (path.dentry == path.mnt->mnt_root) { > if (path.dentry->d_sb->s_magic == AUTOFS_SUPER_MAGIC) { > if (test(&path, data)) { > /* > we never get there > */ > path_get(&path); > *res = path; > err = 0; > break; > } > } > if (!follow_up(&path)) > break; > } > /* > loop finished. err stays as it was err = -ENOENT > */ > path_put(&path); > return err; > } > > Source: https://github.com/torvalds/linux/blob/master/fs/autofs/dev-ioctl.c#L194 > > > I really dislike the patch - the whole "normalize path" thing is fundamentally > > bogus, not to mention the iterator over all mounts, etc., so I would like to > > understand what the hell is going on before even thinking of *not* NAKing > > it on sight. > > I'm not trying to break current API or something similar. I'm just prepared > RFC patch with my proposal. I'm ready to rework all of that to make it good. > But without maintainers/community comments/suggestions it's unreal :) > > Please, explain what do you mean by "normalize path thing"? > I'm not changing semantics of current ioctl() I've just trying to extend it to make > it work in case when parent mount of autofs mount is overmounted. > > > > > Wait, so you have /proc overmounted, without anything autofs-related on > > /proc/sys/fs/binfmt_misc and still want to have the pathname resolved, just > > because it would've resolved with that overmount of /proc removed? > > Something like that. > > 1. I don't expect that /proc/sys/fs/binfmt_misc path will be resolved > (because, for instance we can overmount /proc by empty tmpfs and in this case after > overmounting we can't even open /proc/sys/fs/binfmt_misc and that's okay). > > 2. We talking about autofs specific function which is used in several autofs-specific > ioctls. One of that ioctl(AUTOFS_DEV_IOCTL_OPENMOUNT) which is designed to open > overmounted autofs mounts. Because it's frequent case when autofs mount is overmounted > (when we talk about direct mounts). This ioctl allows to open file desciptor > of autofs root dentry and later, autofs daemon use it to manage mount (call another autofs > ioctls on that fd). > > I've just meet problem, that this API not works when parent mount of autofs mount is overmounted. > For example: > tmpfs /some-dir > autofs /some-dir/autofs1 <-autofs direct mount > nfs /some-dir/autofs1 <-automounted fs on top of autofs > > ioctl(AUTOFS_DEV_IOCTL_OPENMOUNT) will work in this case. Because loop > with follow_up() starts from /some-dir/autofs1 dentry of nfs, then follow_up() > and move to /some-dir/autofs1 dentry of autofs. > > But if we change picture to: > tmpfs1 /some-dir > autofs /some-dir/autofs1 <-autofs direct mount > nfs /some-dir/autofs1 <-automounted fs on top of autofs > tmpfs2 /some-dir > > This will breaks API. Because know we can't even open /some-dir/autofs1 > dentry. > > Ok. We can create this dentry at first by mkdir /some-dir/autofs1. > But it will not help because our loop: > while (path.dentry == path.mnt->mnt_root) { > if (path.dentry->d_sb->s_magic == AUTOFS_SUPER_MAGIC) { > if (test(&path, data)) { > ... > if (!follow_up(&path)) > break; > } > will start from dentry /some-dir/autofs1 from tmpfs2. And after follow_up > on that dentry we will move to / dentry => loop finishes => user get ENOENT. > > > > > I hope I'm misreading you; in case I'm not, the ABI is extremely > > tasteless and until you manage to document the exact semantics you want > > for param->path, consider it NAKed. > > > > BTW, if that thing would be made to work, what's to stop somebody from > > doing ...at() syscalls with the resulting fd as a starting point and pathnames > > starting with ".."? "/foo is overmounted, but we can get to anything under > > /foo/bar/ in the underlying tree since there's an autofs mount somewhere in > > /foo/bar/splat/puke/*"? > > Interesting point. Thank you! > I'm not sure. But is it serious problem for us? What stop somebody to open > and hold fd to any directory before overmounting? > > > > > IOW, the real question (aside of "WTF?") is what are you using the > > resulting descriptor for and what do you need to be able to do with it. > > Details, please. > > Sure. I've covered use cases of file descriptor returned by ioctl(AUTOFS_DEV_IOCTL_OPENMOUNT) > above. > > Thanks for your reply! > I'm sorry If my patch description is unclear. I'm newbie here :) > > Regards, > Alex Gentle ping. Thanks, Alex