[extending the Cc a bit] On Sun, Jan 05, 2020 at 10:02:08AM -0800, James Bottomley wrote: > On Sun, 2020-01-05 at 17:23 +0100, Christian Brauner wrote: > > On Sat, Jan 04, 2020 at 12:14:26PM -0800, James Bottomley wrote: > > > fsconfig is a very powerful configuration mechanism except that it > > > only works for filesystems with superblocks. This patch series > > > generalises the useful concept of a multiple step configurational > > > mechanism carried by a file descriptor. The object of this patch > > > series is to get bind mounts to be configurable in the same way > > > that superblock based ones are, but it should have utility beyond > > > the filesytem realm. Patch 4 also reimplements fsconfig in terms > > > of configfd, but that's not a strictly necessary patch, it is > > > merely a useful demonstration that configfd is a superset of the > > > properties of fsconfig. > > > > Thanks for the patch. I'm glad fsconfig() is picked back up; either > > by you or by David. We will need this for sure. > > But the configfd approach does not strike me as a great idea. > > Anonymous inode fds provide an abstraction mechanism for kernel > > objects which we built around fds such as timerfd, pidfd, mountfd and > > so on. When you stat an anonfd you get ANON_INODE_FS_MAGIC and you > > get the actual type by looking at fdinfo, or - more common - by > > parsing out /proc/<pid>/fd/<nr> and discovering "[fscontext]". So > > it's already a pretty massive abstraction layer we have. But configfd > > would be yet another fd abstraction based on anonfds. > > The idea has been that a new fd type based on anonfds comes with an > > api specific to that type of fd. That seems way nicer from an api > > design perspective than implementing new apis as part of yet another > > generic configfd layer. > > Really, it's just a fd that gathers config information and can reserve > specific errors (and we should really work out the i18n implications of It's rather a complex multiplexer intended to go beyond the realm of filesystems/mount api and that's something we have been burned by before. > the latter). Whether it's a new fd type or an anonfd with a specific > name doesn't seem to be that significant, so the name could be set by > the type. > > > Another problem is that these syscalls here would be massive > > multiplexing syscalls. If they are ever going to be used outside of > > filesystem use-cases (which is doubtful) they will quickly rival > > prctl(), seccomp(), and ptrace(). > > Actually, that's partly the point. We do have several systemcalls with Actually I think that's the problem. The keyctl api itself suffers from the problem that it already has a complex multiplexer. That could either point to bad api design (sorry, David :)) or it's just a very complex use-case like the mount api. The good thing is that it's restricted to a single domain: keys. And that's good. Plumbing both e.g. keys and (parts of) mounts on top of another generic api is what strikes me as a bad idea. > variable argument parsing that would benefit from an approach like > this. keyctl springs immediately to mind. > > > That's not a great thing. Especially, since we recently (a few > > months ago with Linus chiming in too) had long discussions with the > > conclusion that multiplexing syscalls are discouraged, from a > > security and api design perspective. Especially when they are not > > tied to a specific API (e.g. seccomp() and bpf() are at least tied to > > a specific API). libcs such as glibc and musl had reservations in > > that regard as well. > > > > This would also spread the mount api across even more fd types than > > it already does now which is cumbersome for userspace. > > > > A generic API like that also makes it hard to do interception in > > userspace which is important for brokers such as e.g. used in Firefox > > or what we do in various container use-cases. > > > > So I have strong reservations about configfd and would strongly favor > > the revival of the original fsconfig() patchset. > > Ah well, I did have plans for configfd to be self describing, so the > arguments accepted by each type would be typed and pre-registered and > thus parseable generically, so instead of being the usual anonymous > multiplex sink, it would at least be an introspectable multiplexed > sink. The problem there was I can't make fsconfig fit into that We already have fsconfig() to configure mounts so it seems odd to now spread the mount api onto configfd imho. Christian