On Tue, Mar 03, 2020 at 10:26:21AM +0100, Miklos Szeredi wrote:
> On Tue, Mar 3, 2020 at 10:13 AM David Howells <dhowells@xxxxxxxxxx> wrote:
> >
> > Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> >
> > > I'm doing a patch.  Let's see how it fares in the face of all these
> > > preconceptions.
> >
> > Don't forget the efficiency criterion.  One reason for going with fsinfo(2) is
> > that scanning /proc/mounts when there are a lot of mounts in the system is
> > slow (not to mention the global lock that is held during the read).
> >
> > Now, going with sysfs files on top of procfs links might avoid the global
> > lock, and you can avoid rereading the options string if you export a change
> > notification, but you're going to end up injecting a whole lot of pathwalk
> > latency into the system.
>
> Completely irrelevant.  Cached lookup is so much optimized, that you
> won't be able to see any of it.
>
> No, I don't think this is going to be a performance issue at all, but
> if anything we could introduce a syscall
>
>   ssize_t readfile(int dfd, const char *path, char *buf, size_t bufsize, int flags);
>
> that is basically the equivalent of open + read + close, or even a
> vectored variant that reads multiple files.  But that's off topic
> again, since I don't think there's going to be any performance issue
> even with plain I/O syscalls.
>
> >
> > On top of that, it isn't going to help with the case that I'm working towards
> > implementing where a container manager can monitor for mounts taking place
> > inside the container and supervise them.  What I'm proposing is that during
> > the action phase (eg. FSCONFIG_CMD_CREATE), fsconfig() would hand an fd
> > referring to the context under construction to the manager, which would then
> > be able to call fsinfo() to query it and fsconfig() to adjust it, reject it or
> > permit it.
> > Something like:
> >
> >         fd = receive_context_to_supervise();
> >         struct fsinfo_params params = {
> >                 .flags          = FSINFO_FLAGS_QUERY_FSCONTEXT,
> >                 .request        = FSINFO_ATTR_SB_OPTIONS,
> >         };
> >         fsinfo(fd, NULL, &params, sizeof(params), buffer, sizeof(buffer));
> >         supervise_parameters(buffer);
> >         fsconfig(fd, FSCONFIG_SET_FLAG, "hard", NULL, 0);
> >         fsconfig(fd, FSCONFIG_SET_STRING, "vers", "4.2", 0);
> >         fsconfig(fd, FSCONFIG_CMD_SUPERVISE_CREATE, NULL, NULL, 0);
> >         struct fsinfo_params params = {
> >                 .flags          = FSINFO_FLAGS_QUERY_FSCONTEXT,
> >                 .request        = FSINFO_ATTR_SB_NOTIFICATIONS,
> >         };
> >         struct fsinfo_sb_notifications sbnotify;
> >         fsinfo(fd, NULL, &params, sizeof(params), &sbnotify, sizeof(sbnotify));
> >         watch_super(fd, "", AT_EMPTY_PATH, watch_fd, 0x03);
> >         fsconfig(fd, FSCONFIG_CMD_SUPERVISE_PERMIT, NULL, NULL, 0);
> >         close(fd);
> >
> > However, the supervised mount may be happening in a completely different set
> > of namespaces, in which case the supervisor presumably wouldn't be able to see
> > the links in procfs and the relevant portions of sysfs.
>
> It would be a "jump" link to the otherwise invisible directory.

More magic links to beam you around sounds like a bad idea. We had a
bunch of CVEs around them in containers and they were one of the major
reasons behind us pushing for openat2(). That's why it has a
RESOLVE_NO_MAGICLINKS flag.

Christian
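
For reference, the readfile() idea quoted above ("basically the equivalent
of open + read + close") can be approximated in userspace with existing
syscalls. The following is only a sketch under stated assumptions: the name
readfile_emul() is hypothetical, and treating "flags" as extra open(2) flags
is a guess, since the quoted prototype does not define them. It does not
avoid the three syscalls; it only shows the semantics being discussed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical userspace stand-in for the proposed readfile():
 * open the file relative to dfd, read up to bufsize bytes, close it.
 * ASSUMPTION: "flags" is OR-ed into the openat() flags; the proposal
 * quoted in the thread does not say what it means.
 * Returns the number of bytes read, or -1 with errno set.
 */
ssize_t readfile_emul(int dfd, const char *path, char *buf,
                      size_t bufsize, int flags)
{
    int fd = openat(dfd, path, O_RDONLY | O_CLOEXEC | flags);
    if (fd < 0)
        return -1;

    size_t total = 0;
    while (total < bufsize) {
        ssize_t n = read(fd, buf + total, bufsize - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the read */
            int saved = errno;  /* keep read()'s errno across close() */
            close(fd);
            errno = saved;
            return -1;
        }
        if (n == 0)
            break;              /* end of file */
        total += (size_t)n;
    }

    close(fd);
    return (ssize_t)total;
}
```

A caller could use readfile_emul(AT_FDCWD, "/proc/self/mounts", buf,
sizeof(buf), 0) to fetch a small file in one call; the result is not
NUL-terminated, and a file larger than bufsize is silently truncated.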