On Mon, 8 Mar 2021 00:12:22 +0000 Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > On Sun, Mar 07, 2021 at 11:51:20PM +0000, Al Viro wrote: > > On Thu, Mar 04, 2021 at 01:11:33PM +0300, Alexander Mikhalitsyn wrote: > > > > > That problem connected with CRIU (Checkpoint-Restore in Userspace) project. > > > In CRIU we have support of autofs mounts C/R. To acheive that we need to use > > > ioctl's from /dev/autofs to get data about mounts, restore mount as catatonic > > > (if needed), change pipe fd and so on. But the problem is that during CRIU > > > dump we may meet situation when VFS subtree where autofs mount present was > > > overmounted as whole. > > > > > > Simpliest example is /proc/sys/fs/binfmt_misc. This mount present on most > > > GNU/Linux distributions by default. For instance on my Fedora 33: > > > > > > trigger automount of binfmt_misc > > > $ ls /proc/sys/fs/binfmt_misc > > > > > > $ cat /proc/1/mountinfo | grep binfmt > > > 35 24 0:36 / /proc/sys/fs/binfmt_misc rw,relatime shared:16 - autofs systemd-1 rw,...,direct,pipe_ino=223 > > > 632 35 0:56 / /proc/sys/fs/binfmt_misc rw,...,relatime shared:315 - binfmt_misc binfmt_misc rw > > > > > > $ sudo unshare -m -p --fork --mount-proc sh > > > # cat /proc/self/mountinfo | grep "/proc" > > > 828 809 0:23 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw > > > 829 828 0:36 / /proc/sys/fs/binfmt_misc rw,relatime - autofs systemd-1 rw,...,direct,pipe_ino=223 > > > 943 829 0:56 / /proc/sys/fs/binfmt_misc rw,...,relatime - binfmt_misc binfmt_misc rw > > > 949 828 0:57 / /proc rw...,relatime - proc proc rw > > > > > > As we can see now autofs mount /proc/sys/fs/binfmt_misc is inaccessible. > > > If we do something like: > > > > > > struct autofs_dev_ioctl *param; > > > param = malloc(...); > > > devfd = open("/dev/autofs", O_RDONLY); > > > init_autofs_dev_ioctl(param); > > > param->size = size; > > > strcpy(param->path, "/proc/sys/fs/binfmt_misc"); > > > param->openmount.devid = 36; > > > err = ioctl(devfd, AUTOFS_DEV_IOCTL_OPENMOUNT, param) > > > > > > now we get err = -ENOENT. > > > Where does that -ENOENT come from? AFAICS, pathwalk ought to succeed and > return you the root of overmounting binfmt_misc. Why doesn't the loop in > find_autofs_mount() locate anything it would accept? > Consider our mounts tree: > > > # cat /proc/self/mountinfo | grep "/proc" > > > 828 809 0:23 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw > > > 829 828 0:36 / /proc/sys/fs/binfmt_misc rw,relatime - autofs systemd-1 rw,...,direct,pipe_ino=223 > > > 943 829 0:56 / /proc/sys/fs/binfmt_misc rw,...,relatime - binfmt_misc binfmt_misc rw > > > 949 828 0:57 / /proc rw...,relatime - proc proc rw ENOENT comes from here (current kernel code): /* Find the topmost mount satisfying test() */ static int find_autofs_mount(const char *pathname, struct path *res, int test(const struct path *path, void *data), void *data) { struct path path; int err; err = kern_path(pathname, LOOKUP_MOUNTPOINT, &path); if (err) return err; <-------- here we successfuly open root dentry (/proc/sys/fs/binfmt_misc) of /proc (mnt_id = 949) err = -ENOENT; <---- set err and start search autofs mount /* here we use follow_up to move through upper dentries and find overmounted autofs. But in our case we opened dentry from /proc (mnt_id = 949) and this concrete dentry is *NOT* overmounted (whole /proc overmounted). */ while (path.dentry == path.mnt->mnt_root) { if (path.dentry->d_sb->s_magic == AUTOFS_SUPER_MAGIC) { if (test(&path, data)) { /* we never get there */ path_get(&path); *res = path; err = 0; break; } } if (!follow_up(&path)) break; } /* loop finished. err stays as it was err = -ENOENT */ path_put(&path); return err; } Source: https://github.com/torvalds/linux/blob/master/fs/autofs/dev-ioctl.c#L194 > I really dislike the patch - the whole "normalize path" thing is fundamentally > bogus, not to mention the iterator over all mounts, etc., so I would like to > understand what the hell is going on before even thinking of *not* NAKing > it on sight. I'm not trying to break current API or something similar. I'm just prepared RFC patch with my proposal. I'm ready to rework all of that to make it good. But without maintainers/community comments/suggestions it's unreal :) Please, explain what do you mean by "normalize path thing"? I'm not changing semantics of current ioctl() I've just trying to extend it to make it work in case when parent mount of autofs mount is overmounted. > > Wait, so you have /proc overmounted, without anything autofs-related on > /proc/sys/fs/binfmt_misc and still want to have the pathname resolved, just > because it would've resolved with that overmount of /proc removed? Something like that. 1. I don't expect that /proc/sys/fs/binfmt_misc path will be resolved (because, for instance we can overmount /proc by empty tmpfs and in this case after overmounting we can't even open /proc/sys/fs/binfmt_misc and that's okay). 2. We talking about autofs specific function which is used in several autofs-specific ioctls. One of that ioctl(AUTOFS_DEV_IOCTL_OPENMOUNT) which is designed to open overmounted autofs mounts. Because it's frequent case when autofs mount is overmounted (when we talk about direct mounts). This ioctl allows to open file desciptor of autofs root dentry and later, autofs daemon use it to manage mount (call another autofs ioctls on that fd). I've just meet problem, that this API not works when parent mount of autofs mount is overmounted. For example: tmpfs /some-dir autofs /some-dir/autofs1 <-autofs direct mount nfs /some-dir/autofs1 <-automounted fs on top of autofs ioctl(AUTOFS_DEV_IOCTL_OPENMOUNT) will work in this case. Because loop with follow_up() starts from /some-dir/autofs1 dentry of nfs, then follow_up() and move to /some-dir/autofs1 dentry of autofs. But if we change picture to: tmpfs1 /some-dir autofs /some-dir/autofs1 <-autofs direct mount nfs /some-dir/autofs1 <-automounted fs on top of autofs tmpfs2 /some-dir This will breaks API. Because know we can't even open /some-dir/autofs1 dentry. Ok. We can create this dentry at first by mkdir /some-dir/autofs1. But it will not help because our loop: while (path.dentry == path.mnt->mnt_root) { if (path.dentry->d_sb->s_magic == AUTOFS_SUPER_MAGIC) { if (test(&path, data)) { ... if (!follow_up(&path)) break; } will start from dentry /some-dir/autofs1 from tmpfs2. And after follow_up on that dentry we will move to / dentry => loop finishes => user get ENOENT. > > I hope I'm misreading you; in case I'm not, the ABI is extremely > tasteless and until you manage to document the exact semantics you want > for param->path, consider it NAKed. > > BTW, if that thing would be made to work, what's to stop somebody from > doing ...at() syscalls with the resulting fd as a starting point and pathnames > starting with ".."? "/foo is overmounted, but we can get to anything under > /foo/bar/ in the underlying tree since there's an autofs mount somewhere in > /foo/bar/splat/puke/*"? Interesting point. Thank you! I'm not sure. But is it serious problem for us? What stop somebody to open and hold fd to any directory before overmounting? > > IOW, the real question (aside of "WTF?") is what are you using the > resulting descriptor for and what do you need to be able to do with it. > Details, please. Sure. I've covered use cases of file descriptor returned by ioctl(AUTOFS_DEV_IOCTL_OPENMOUNT) above. Thanks for your reply! I'm sorry If my patch description is unclear. I'm newbie here :) Regards, Alex