Re: [PATCH v2 0/6] introduce configfd as generalisation of fsconfig

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Sun, 05 Jan 2020 10:02:08 -0800

On Sun, 2020-01-05 at 17:23 +0100, Christian Brauner wrote:
> On Sat, Jan 04, 2020 at 12:14:26PM -0800, James Bottomley wrote:
> > fsconfig is a very powerful configuration mechanism except that it
> > only works for filesystems with superblocks.  This patch series
> > generalises the useful concept of a multiple step configurational
> > mechanism carried by a file descriptor.  The object of this patch
> > series is to get bind mounts to be configurable in the same way
> > that superblock based ones are, but it should have utility beyond
> > the filesytem realm.  Patch 4 also reimplements fsconfig in terms
> > of configfd, but that's not a strictly necessary patch, it is
> > merely a useful demonstration that configfd is a superset of the
> > properties of fsconfig.
> 
> Thanks for the patch. I'm glad fsconfig() is picked back up; either
> by you or by David. We will need this for sure.
> But the configfd approach does not strike me as a great idea.
> Anonymous inode fds provide an abstraction mechanism for kernel
> objects which we built around fds such as timerfd, pidfd, mountfd and
> so on. When you stat an anonfd you get ANON_INODE_FS_MAGIC and you
> get the actual type by looking at fdinfo, or - more common - by
> parsing out /proc/<pid>/fd/<nr> and discovering "[fscontext]". So
> it's already a pretty massive abstraction layer we have. But configfd
> would be yet another fd abstraction based on anonfds.
> The idea has been that a new fd type based on anonfds comes with an
> api specific to that type of fd. That seems way nicer from an api
> design perspective than implementing new apis as part of yet another
> generic configfd layer.

Really, it's just a fd that gathers config information and can reserve
specific errors (and we should really work out the i18n implications of
the latter).  Whether it's a new fd type or an anonfd with a specific
name doesn't seem to be that significant, so the name could be set by
the type.

> Another problem is that these syscalls here would be massive
> multiplexing syscalls. If they are ever going to be used outside of
> filesystem use-cases (which is doubtful) they will quickly rival
> prctl(), seccomp(), and ptrace().

Actually, that's partly the point.  We do have several systemcalls with
variable argument parsing that would benefit from an approach like
this.  keyctl springs immediately to mind.

>  That's not a great thing. Especially, since we recently (a few
> months ago with Linus chiming in too) had long discussions with the
> conclusion that multiplexing syscalls are discouraged, from a
> security and api design perspective. Especially when they are not
> tied to a specific API (e.g. seccomp() and bpf() are at least tied to
> a specific API). libcs such as glibc and musl had reservations in
> that regard as well.
> 
> This would also spread the mount api across even more fd types than
> it already does now which is cumbersome for userspace.
> 
> A generic API like that also makes it hard to do interception in
> userspace which is important for brokers such as e.g. used in Firefox
> or what we do in various container use-cases.
> 
> So I have strong reservations about configfd and would strongly favor
> the revival of the original fsconfig() patchset.

Ah well, I did have plans for configfd to be self describing, so the
arguments accepted by each type would be typed and pre-registered and
thus parseable generically, so instead of being the usual anonymous
multiplex sink, it would at least be an introspectable multiplexed
sink.  The problem there was I can't make fsconfig fit into that
framework but, as I said, it was only done to demo that configfd was a
superset, I'm not wedded to that part.

James