On 12/3/21 12:03, James Bottomley wrote:
On Thu, 2021-12-02 at 21:31 -0500, Stefan Berger wrote:
[...]
static int securityfs_init_fs_context(struct fs_context *fc)
{
+ int rc;
+
+ if (fc->user_ns->ima_ns->late_fs_init) {
+ rc = fc->user_ns->ima_ns->late_fs_init(fc->user_ns);
+ if (rc)
+ return rc;
+ }
fc->ops = &securityfs_context_ops;
return 0;
}
I know I suggested this, but to get this to work in general, it's going
to have to not be specific to IMA, so it's going to have to become
something generic like a notifier chain. The other problem is that it's
still only working by accident:
I had thought about this also but the rationale was:
securityfs is compiled due to CONFIG_IMA_NS and the user namespace
exists there and that has a pointer now to ima_namespace, which can have
that callback. I assumed that other namespaced subsystems could also be
reached then via such a callback, but I don't know.
I suppose any late filesystem init callchain would have to be connected
to the user_namespace somehow?
+int ima_fs_ns_init(struct ima_namespace *ns)
+{
+ ns->mount = securityfs_ns_create_mount(ns->user_ns);
This actually triggers on the call to securityfs_init_fs_context, but
nothing happens because the callback is null. Every subsequent use of
fscontext will trigger this. The point of a keyed superblock is that
fill_super is only called once per key, that's the place we should be
doing this. It should also probably be a blocking notifier so any
consumer of securityfs can be namespaced by registering for this
notifier.
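To make the "once per key" behaviour concrete, here is a minimal userspace model in C. It is only a sketch: every name in it (model_mount, model_fill_super, model_register_fill_notifier, model_ima_populate) is hypothetical and none of it is kernel API. It assumes the keyed superblock lookup runs fill_super only the first time a given user namespace key is seen, and that fill_super is where the notifier chain fires.

```c
/* Userspace model only: every name below is hypothetical, not a kernel API. */
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_KEYS 8
#define MODEL_MAX_NOTIFIERS 8

/* Callback a namespaced consumer registers; it receives the superblock key. */
typedef int (*model_fill_fn)(const void *ns_key);

static model_fill_fn model_notifiers[MODEL_MAX_NOTIFIERS];
static size_t model_nr_notifiers;

static const void *model_filled[MODEL_MAX_KEYS];
static size_t model_nr_filled;

/* Counts invocations of the sample consumer below. */
int model_ima_populate_calls;

/* Sample consumer standing in for IMA creating its namespaced entries. */
int model_ima_populate(const void *ns_key)
{
	assert(ns_key != NULL);
	model_ima_populate_calls++;
	return 0;
}

/* A consumer registers once, typically at its own init time. */
int model_register_fill_notifier(model_fill_fn fn)
{
	if (model_nr_notifiers == MODEL_MAX_NOTIFIERS)
		return -1;
	model_notifiers[model_nr_notifiers++] = fn;
	return 0;
}

/* Stand-in for fill_super: walk the chain, stop at the first error. */
int model_fill_super(const void *ns_key)
{
	size_t i;

	for (i = 0; i < model_nr_notifiers; i++) {
		int rc = model_notifiers[i](ns_key);

		if (rc)
			return rc;
	}
	return 0;
}

/* Stand-in for a keyed mount: fill only the first time a key is seen. */
int model_mount(const void *ns_key)
{
	size_t i;
	int rc;

	for (i = 0; i < model_nr_filled; i++)
		if (model_filled[i] == ns_key)
			return 0;	/* superblock already exists for this key */
	if (model_nr_filled == MODEL_MAX_KEYS)
		return -1;
	rc = model_fill_super(ns_key);
	if (rc)
		return rc;
	model_filled[model_nr_filled++] = ns_key;
	return 0;
}
```

In this model, mounting twice with the same key runs the registered callback once, while a second key triggers it again. Whether the real chain should be blocking, and what it passes to subscribers, is left open here.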
What I don't like about the fill_super is that it gets called too early:
[ 67.058611] securityfs_ns_create_mount @ 102 target user_ns: ffff95c010698c80; nr_extents: 0
[ 67.059836] securityfs_fill_super @ 47 user_ns: ffff95c010698c80; nr_extents: 0
We are switching to the target user namespace in
securityfs_ns_create_mount. The expected nr_extents at this point is 0,
since user_ns hasn't been configured yet. But then securityfs_fill_super
is also called with nr_extents 0. We cannot use that, it's too early!
+ if (IS_ERR(ns->mount)) {
+ ns->mount = NULL;
+ return -1;
+ }
+ ns->mount_count = 1;
This is a bit nasty, too: we're spilling the guts of mount count
tracking into IMA instead of encapsulating it inside securityfs.
Ok, I can make this disappear.
+
+ /* Adjust the trigger for user namespace's early teardown of dependent
+ * namespaces. Due to the filesystem there's an additional reference
+ * to the user namespace.
+ */
+ ns->user_ns->refcount_teardown += 1;
+
+ ns->late_fs_init = ima_fs_ns_late_init;
+
+ return 0;
+}
I think what should be happening is that we shouldn't do the
simple_pin_fs, which creates the inodes, ahead of time; we should do it
inside fill_super using a notifier, meaning it gets called once per
key, creates the root dentry then triggers the notifier which
instantiates all the namespaced entries. We can still use
simple_pin_fs for this because there's no locking across fill_super.
fill_super would only work for the init_user_ns from what I can see.
This would mean fill_super would be called the first time the
securityfs is mounted inside the namespace.
I guess I would need to know how fill_super would work or how it could
be called late/delayed as well.
If we do it this way, we can now make securityfs have its own mount and
mount_count inside the user namespace, which it uses internally to the
securityfs code, thus avoiding exposing them to ima or any other
namespaced consumer.
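As a rough illustration of that encapsulation, the userspace sketch below keeps the mount and its count in a per-namespace struct that only the securityfs side touches. The struct layout and all function names are assumptions made up for this example; the pin/release pair only mimics the counting behaviour of simple_pin_fs and simple_release_fs, with a string standing in for the vfsmount.

```c
/* Userspace model only: the struct layout and function names are hypothetical. */
#include <assert.h>
#include <stddef.h>

struct model_user_ns {
	/* Owned by the securityfs model; consumers never touch these. */
	const char *securityfs_mount;
	int securityfs_mount_count;
};

/* Models the counting of simple_pin_fs(): mount on first pin, then count. */
int model_securityfs_pin(struct model_user_ns *ns)
{
	assert(ns != NULL);
	if (!ns->securityfs_mount)
		ns->securityfs_mount = "securityfs";	/* placeholder for a vfsmount */
	ns->securityfs_mount_count++;
	return 0;
}

/* Models simple_release_fs(): drop the mount when the last pin goes away. */
void model_securityfs_release(struct model_user_ns *ns)
{
	assert(ns != NULL && ns->securityfs_mount_count > 0);
	if (--ns->securityfs_mount_count == 0)
		ns->securityfs_mount = NULL;
}

/* What a consumer such as IMA would see: a create/remove pair, no counters. */
int model_consumer_create_entry(struct model_user_ns *ns)
{
	return model_securityfs_pin(ns);
}

void model_consumer_remove_entry(struct model_user_ns *ns)
{
	model_securityfs_release(ns);
}
```

The consumer-facing functions never read or write the count, which is the property being asked for; where the fields would actually live in the kernel is not settled by this sketch.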
I also think we now don't need the securityfs_ns_ duplicated functions
because the callback via the notifier chain now ensures we can use the
namespace they were created in to distinguish between non-namespaced
and namespaced entries.
Is there then no need to pass a separate vfsmount * in anymore? Where
would the vfsmount pointer reside? For now it's in ima_namespace, but it
sounds like it should be in a more centralized place? Should it also be
connected to the user_namespace so we can pick it up using get_user_ns()?
So non-namespaced consumers of securityfs would do what they do now
(calling the securityfs_create on initialization) and namespaced
consumers would register a callback on the notifier which would get
called once for every namespace the securityfs gets mounted in.
I also theorize if we do it with notifiers, we could have a notifier on
kill_sb to tear down all the entries. If we do this, I think we don't
have to pin any more.
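A userspace sketch of that symmetric fill/kill idea follows. It is a model under assumptions, not a proposal for the actual interface: the event enum, the chain, and the sample consumer are all hypothetical names, and the entry counter merely stands in for dentries a consumer would create and remove.

```c
/* Userspace model only: event names, chain and consumer are hypothetical. */
#include <assert.h>
#include <stddef.h>

enum model_sb_event { MODEL_SB_FILL, MODEL_SB_KILL };

struct model_ns {
	int entries;	/* namespaced entries currently instantiated */
};

typedef int (*model_sb_notifier_fn)(enum model_sb_event ev,
				    struct model_ns *ns);

#define MODEL_SB_MAX_NOTIFIERS 4
static model_sb_notifier_fn model_sb_chain[MODEL_SB_MAX_NOTIFIERS];
static size_t model_sb_chain_len;

int model_sb_notifier_register(model_sb_notifier_fn fn)
{
	if (model_sb_chain_len == MODEL_SB_MAX_NOTIFIERS)
		return -1;
	model_sb_chain[model_sb_chain_len++] = fn;
	return 0;
}

/* Called from the model's fill_super and kill_sb stand-ins. */
int model_sb_notify(enum model_sb_event ev, struct model_ns *ns)
{
	size_t i;

	for (i = 0; i < model_sb_chain_len; i++) {
		int rc = model_sb_chain[i](ev, ns);

		if (rc)
			return rc;
	}
	return 0;
}

/* Sample consumer: creates two entries on fill, removes them on kill. */
int model_consumer(enum model_sb_event ev, struct model_ns *ns)
{
	switch (ev) {
	case MODEL_SB_FILL:
		ns->entries += 2;
		return 0;
	case MODEL_SB_KILL:
		ns->entries -= 2;
		return 0;
	}
	return -1;
}
```

Because every entry created on the fill event is removed on the kill event, nothing in this model outlives the superblock, which is why no extra pin is taken.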
James
diff --git a/security/inode.c b/security/inode.c
index ed5f1c533776..49c9839642ed 100644
--- a/security/inode.c
+++ b/security/inode.c
@@ -44,6 +44,8 @@ static int securityfs_fill_super(struct super_block *sb, struct fs_context *fc)
static const struct tree_descr files[] = {{""}};
int error;
+ printk(KERN_INFO "%s @ %u user_ns: %px; nr_extents: %d\n", __func__, __LINE__, fc->user_ns, fc->user_ns->uid_map.nr_extents);
+
error = simple_fill_super(sb, SECURITYFS_MAGIC, files);
if (error)
return error;
@@ -97,6 +99,8 @@ struct vfsmount *securityfs_ns_create_mount(struct user_namespace *user_ns)
put_user_ns(fc->user_ns);
fc->user_ns = get_user_ns(user_ns);
+ printk(KERN_INFO "%s @ %u target user_ns: %px; nr_extents: %d\n", __func__, __LINE__, fc->user_ns, fc->user_ns->uid_map.nr_extents);
+
mnt = fc_mount(fc);
put_fs_context(fc);
return mnt;