Hi!

I have been looking at the SecurityFS code (security/inode.c) because I
am trying to figure out how to create a version of SecurityFS that
supports IMA namespaces, and I am hoping you have some guidance on this.
Having a namespace-aware (multi-instance) version or derivative of
SecurityFS seems to be key to what we want to do for namespacing IMA.

I have also tried to start a discussion here that may give you some
more details about what we are trying to do:
https://github.com/opencontainers/runc/issues/3288

I have looked through some of the core Linux filesystem code, and it
seems that what we need is a filesystem that uses get_tree_keyed(),
where the key identifies the namespace, possibly the IMA or user
namespace pointer itself:
https://elixir.bootlin.com/linux/latest/source/fs/super.c#L1190
This would presumably allow us to create one instance of the new
filesystem per IMA/user namespace (the IMA namespace would hang off the
user namespace). The next idea is to pass the vfsmount ** and
mount_count (from here:
https://elixir.bootlin.com/linux/latest/source/security/inode.c#L25)
through a namespaced securityfs API along these lines:

    extern struct dentry *securityfs_ns_create_dir(const char *name,
                    struct dentry *parent,
                    struct vfsmount **, int *mount_count);

The vfsmount * and mount_count would reside in 'struct ima_namespace'.
This would hopefully let us reuse the rather simple-looking SecurityFS
code, which is where I would like to start before venturing into
something more complicated. However, I have my doubts after the first
debugging exercises with a prototype. The first issue I know about for
sure is that we currently initialize SecurityFS when we initialize the
IMA namespace, while the old user namespace is still active. This sets
the user_ns in the superblock to the currently active userns, and a
later mount then fails because of this check:

    share_extant_sb:
            if (user_ns != old->s_user_ns) {
                    spin_unlock(&sb_lock);
                    destroy_unused_super(s);
                    return ERR_PTR(-EBUSY);
            }

https://elixir.bootlin.com/linux/latest/source/fs/super.c#L556
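
To make the failure mode concrete, here is a userspace toy model of the
keyed superblock lookup. It is not kernel code: every name ending in
_model is hypothetical, the table and return convention (a negative errno
instead of ERR_PTR()) are simplifications, and only the owner check is
taken from the snippet above. It shows that a superblock created for a
key while the old user namespace is active cannot be shared by a later
lookup with the same key from a different user namespace.

```c
/* Userspace toy model of a keyed superblock lookup. NOT kernel code. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct user_namespace. */
struct user_ns_model {
	int id;
};

/* Hypothetical stand-in for struct super_block. */
struct sb_model {
	const void *key;                       /* e.g. the IMA namespace pointer */
	const struct user_ns_model *s_user_ns; /* owner recorded at creation */
	int in_use;
};

#define MAX_SB_MODEL 8
static struct sb_model sb_table_model[MAX_SB_MODEL];

/*
 * Look up a superblock by key, creating it if none exists.
 * Returns 0 and stores the superblock in *out, or a negative errno.
 */
int sget_keyed_model(const void *key, const struct user_ns_model *user_ns,
		     struct sb_model **out)
{
	struct sb_model *free_slot = NULL;
	size_t i;

	assert(out != NULL);

	for (i = 0; i < MAX_SB_MODEL; i++) {
		struct sb_model *s = &sb_table_model[i];

		if (!s->in_use) {
			if (!free_slot)
				free_slot = s;
			continue;
		}
		if (s->key != key)
			continue;
		/* Same key, different owner: refuse to share (the check above). */
		if (s->s_user_ns != user_ns)
			return -EBUSY;
		*out = s;
		return 0;
	}

	if (!free_slot)
		return -ENOMEM;

	/* No match: create a new superblock owned by the caller's user_ns. */
	free_slot->key = key;
	free_slot->s_user_ns = user_ns;
	free_slot->in_use = 1;
	*out = free_slot;
	return 0;
}
```

In this model, calling sget_keyed_model() first with the parent user
namespace and then with the child user namespace for the same key yields
-EBUSY on the second call, which mirrors the mount failure described
above.
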

So now the question is whether it is possible to initialize this
filesystem at clone() time while the old user namespace is still active
(unlikely, given the way it currently works), or whether the
initialization (populating the filesystem with directories and files)
needs to be deferred until the user does a mount with the intended user
namespace active. In the latter case, I guess there would have to be
some sort of callback from the filesystem code into the IMA namespace
code that populates the filesystem, invoked when the filesystem is
mounted. Would the initialization have to be done that late? I am
wondering whether the API outlined above could be called from that
callback, or whether that is not possible at all with the vfsmount and
mount_count parameters (which presumably help associate the directories
etc. with the root).
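
The deferred variant can be sketched as a userspace toy model (again not
kernel code; all names ending in _model are hypothetical). The mount path
calls a populate callback only when the superblock has not been set up
yet, so the population runs at first mount rather than at clone() time.
As far as I can tell, the fill_super callback passed to get_tree_keyed()
plays a comparable role in the kernel, since it is invoked only for a
newly created superblock, but whether the securityfs helpers can be
called from there is exactly the open question.

```c
/* Userspace toy model of populate-on-first-mount. NOT kernel code. */
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-namespace superblock. */
struct ns_sb_model {
	int populated;  /* loosely analogous to "sb->s_root != NULL" */
	int nr_entries; /* number of dirs/files created so far */
};

/* Hypothetical stand-in for struct ima_namespace. */
struct ima_ns_model {
	struct ns_sb_model sb;
	int populate_calls;
};

typedef int (*fill_fn_model)(struct ns_sb_model *sb, void *ctx);

/*
 * Model of the mount path: populate the superblock on the first mount
 * only; later mounts reuse the already populated instance.
 */
int mount_model(struct ns_sb_model *sb, fill_fn_model fill, void *ctx)
{
	int err;

	assert(sb != NULL && fill != NULL);

	if (sb->populated)
		return 0;

	err = fill(sb, ctx);
	if (err)
		return err;

	sb->populated = 1;
	return 0;
}

/* Model of the IMA namespace code that creates its dirs and files. */
int ima_ns_populate_model(struct ns_sb_model *sb, void *ctx)
{
	struct ima_ns_model *ns = ctx;

	ns->populate_calls++;
	sb->nr_entries = 2; /* e.g. one directory and one file */
	return 0;
}
```

Mounting twice through mount_model() with ima_ns_populate_model as the
callback runs the population exactly once; nothing is created before the
first mount.
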

I hope the above makes sense. Maybe you have some directions for how
one could go about this before I go down other, possibly erroneous,
paths.

Regards,
Stefan