On Fri, 2021-12-03 at 13:06 -0500, Stefan Berger wrote: > On 12/3/21 12:03, James Bottomley wrote: > > On Thu, 2021-12-02 at 21:31 -0500, Stefan Berger wrote: > > [...] > > > static int securityfs_init_fs_context(struct fs_context *fc) > > > { > > > + int rc; > > > + > > > + if (fc->user_ns->ima_ns->late_fs_init) { > > > + rc = fc->user_ns->ima_ns->late_fs_init(fc->user_ns); > > > + if (rc) > > > + return rc; > > > + } > > > fc->ops = &securityfs_context_ops; > > > return 0; > > > } > > I know I suggested this, but to get this to work in general, it's > > going to have to not be specific to IMA, so it's going to have to > > become something generic like a notifier chain. The other problem > > is it's only working still by accident: > > I had thought about this also but the rationale was: > > securityfs is compiled due to CONFIG_IMA_NS and the user namespace > exists there and that has a pointer now to ima_namespace, which can > have that callback. I assumed that other namespaced subsystems could > also be reached then via such a callback, but I don't know. Well securityfs is supposed to exist for LSMs. At some point each of those is going to need to be namespaced, which may eventually be quite a pile of callbacks, which is why I thought of a notifier. > I suppose any late filesystem init callchain would have to be > connected to the user_namespace somehow? I don't think so; I think just moving some securityfs entries into the user_namespace and managing the notifier chain from within securityfs will do for now. [although I'd have to spec this out in code before I knew for sure]. > > > +int ima_fs_ns_init(struct ima_namespace *ns) > > > +{ > > > + ns->mount = securityfs_ns_create_mount(ns->user_ns); > > This actually triggers on the call to securityfs_init_fs_context, > > but nothing happens because the callback is null. Every subsequent > > use of fscontext will trigger this. The point of a keyed supeblock > > is that fill_super is only called once per key, that's the place we > > should be doing this. It should also probably be a blocking > > notifier so anyconsumer of securityfs can be namespaced by > > registering for this notifier. > > What I don't like about the fill_super is that it gets called too > early: > > [ 67.058611] securityfs_ns_create_mount @ 102 target user_ns: > ffff95c010698c80; nr_extents: 0 > [ 67.059836] securityfs_fill_super @ 47 user_ns: > ffff95c010698c80; > nr_extents: 0 Right, it's being activated by securityfs_ns_create_mount which is called as soon as the user_ns is created. > We are switching to the target user namespace in > securityfs_ns_create_mount. The expected nr_extents at this point is > 0, since user_ns hasn't been configured, yet. But then > security_fill_super is also called with nr_extents 0. We cannot use > that, it's too early! Exactly, so I was thinking of not having a securityfs_ns_create_mount at all. All the securityfs_ns_create.. calls would be in the notifier call chain. This means there's nothing to fill the superblock until an actual mount on it is called. > > > + if (IS_ERR(ns->mount)) { > > > + ns->mount = NULL; > > > + return -1; > > > + } > > > + ns->mount_count = 1; > > This is a bit nasty, too: we're spilling the guts of mount count > > tracking into IMA instead of encapsulating it inside securityfs. > > Ok, I can make this disappear. > > > > > + > > > + /* Adjust the trigger for user namespace's early teardown of > > > dependent > > > + * namespaces. Due to the filesystem there's an additional > > > reference > > > + * to the user namespace. > > > + */ > > > + ns->user_ns->refcount_teardown += 1; > > > + > > > + ns->late_fs_init = ima_fs_ns_late_init; > > > + > > > + return 0; > > > +} > > I think what should be happening is that we shouldn't so the > > simple_pin_fs, which creates the inodes, ahead of time; we should > > do it inside fill_super using a notifier, meaning it gets called > > once per > > fill_super would only work for the init_user_ns from what I can see. > > > > key, creates the root dentry then triggers the notifier which > > instantiates all the namespaced entries. We can still use > > simple_pin_fs for this because there's no locking across > > fill_super. > > This would mean fill_super would be called the first time the > > securityfs is mounted inside the namespace. > > I guess I would need to know how fill_super would work or how it > could be called late/delayed as well. So it would be called early in the init_user_ns by non-namespaced consumers of securityfs, like it is now. Namespaced consumers wouldn't call any securityfs_ns_create callbacks to create dentries until they were notified from the fill_super notifier, which would now only be triggered on first mount of securityfs inside the namespace. > > If we do it this way, we can now make securityfs have its own mount > > and mount_count inside the user namespace, which it uses internally > > to the securityfs code, thus avoiding exposing them to ima or any > > other namespaced consumer. > > > > I also think we now don't need the securityfs_ns_ duplicated > > functions because the callback via the notifier chain now ensures > > we can usethe namespace they were created in to distinguish between > > non namespaced and namespaced entries. > > Is there then no need to pass a separate vfsmount * in anymore? I don't think so no. It could be entirely managed internally to securityfs. > Where would the vfsmount pointer reside? For now it's in > ima_namespace, but it sounds like it should be in a more centralized > place? Should it also be connected to the user_namespace so we can > pick it up using get_user_ns()? exactly. I think struct user_namespace should have two elements gated by a #ifdef CONFIG_SECURITYFS which are the vfsmount and the mount_count for passing into simple_pin_fs. James