SecurityFS for namespaced IMA

Stefan Berger <stefanb@xxxxxxxxxxxxx> · Thu, 18 Nov 2021 17:11:22 -0500

Hi!

  I have been looking at the SecurityFS code (security/inode.c) since I 
am currently trying to figure out how to create a version of SecurityFS 
for IMA namespace support, and maybe you have some guidance on this. 
Having a namespace-supporting (multi-instance) version/derivative of 
SecurityFS seems to be key for what we want to do for namespacing IMA. 
Also, I was trying to get a discussion started here that may give you 
some more details about what we are trying to do: 
https://github.com/opencontainers/runc/issues/3288

  I have looked through some of the core code of the Linux filesystem 
subsystem a bit and it seems like what we would need for our intentions 
is some sort of filesystem that is using the function get_tree_keyed() 
where the key is an identifier for the namespace, maybe the IMA/user 
namespace pointer itself. [ 
https://elixir.bootlin.com/linux/latest/source/fs/super.c#L1190 ] This 
would presumably allow us to create an instance of this new filesystem 
per IMA/user namespace (IMA namespace would hang off the user 
namespace). The next idea is to pass a vfsmount ** and a mount count 
pointer (per-namespace counterparts of the static variables here: 
https://elixir.bootlin.com/linux/latest/source/security/inode.c#L25) via 
a namespaced securityfs API along the lines of this:

extern struct dentry *securityfs_ns_create_dir(const char *name,
        struct dentry *parent, struct vfsmount **, int *mount_count);

The vfsmount * and mount_count would reside in 'struct ima_namespace'. 
This would hopefully let us reuse the rather simple-looking SecurityFS 
code, which is at least my starting point before venturing into 
something more complicated, but I have my doubts after the first 
debugging exercises with a prototype. The first issue I know about for 
sure is that we currently initialize SecurityFS when we initialize the 
IMA namespace, while the old user namespace is still active. This sets 
the user_ns in the superblock to the then-active userns, and we then 
fail to use the superblock when trying a mount because of this check:

share_extant_sb:
    if (user_ns != old->s_user_ns) {
        spin_unlock(&sb_lock);
        destroy_unused_super(s);
        return ERR_PTR(-EBUSY);
    }

https://elixir.bootlin.com/linux/latest/source/fs/super.c#L556
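
To make the failure mode concrete, here is a small userspace model of 
the keyed lookup plus the user namespace check quoted above. It is only 
a sketch, not kernel code: struct model_sb, model_get_keyed() and the 
fixed-size table are made-up stand-ins, and the key is assumed to be the 
IMA namespace pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace stand-ins; none of these are the kernel's real types. */
struct user_ns { int id; };

struct model_sb {
    const void *key;              /* models the per-instance key */
    const struct user_ns *owner;  /* models sb->s_user_ns */
    int in_use;
};

#define MAX_SB 8
static struct model_sb sb_table[MAX_SB];

/*
 * Reuse the instance whose key matches, but only if the caller's user
 * namespace is the one recorded when the instance was created.
 * Returns 0 and sets *out on success, -EBUSY on a namespace mismatch,
 * -ENOMEM when the table is full.
 */
static int model_get_keyed(const void *key, const struct user_ns *caller_ns,
                           struct model_sb **out)
{
    struct model_sb *free_slot = NULL;

    assert(key != NULL && caller_ns != NULL && out != NULL);

    for (size_t i = 0; i < MAX_SB; i++) {
        struct model_sb *sb = &sb_table[i];

        if (!sb->in_use) {
            if (!free_slot)
                free_slot = sb;   /* remember the first free slot */
            continue;
        }
        if (sb->key != key)
            continue;
        /* Same comparison as in the quoted share_extant_sb path. */
        if (sb->owner != caller_ns)
            return -EBUSY;
        *out = sb;
        return 0;
    }

    if (!free_slot)
        return -ENOMEM;

    /* No instance for this key yet: create one owned by the caller. */
    free_slot->key = key;
    free_slot->owner = caller_ns;
    free_slot->in_use = 1;
    *out = free_slot;
    return 0;
}
```

In this model, creating the instance while the parent user namespace is 
active and then looking it up with the same key from the child user 
namespace returns -EBUSY, which mirrors what the prototype runs into.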

So now the question is whether it is possible to initialize this 
filesystem at clone() time while the old user namespace is active 
(unlikely, given the way it currently works), or whether the 
initialization (populating the filesystem with dirs and files) needs to 
be deferred until the user does a mount with the intended user namespace 
active. I guess in the latter case there would have to be some sort of 
callback from the filesystem code into the IMA namespace code that 
populates the filesystem, invoked when the filesystem is mounted (?). So 
would the initialization have to be done that late? I am wondering 
whether the API outlined above could be called from that callback, or 
whether that isn't possible at all with the vfsmount and mount_count 
parameters (that presumably help associate the directories etc. with 
the root).
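
The deferred variant could look roughly like the following userspace 
model. Again this is only a sketch under assumptions: struct 
model_ima_ns, populate_fn, model_populate_ima() and model_mount() are 
hypothetical names, and the callback stands in for whatever hook the 
filesystem code would offer at mount time (the fill_super callback that 
get_tree_keyed() takes might be one candidate, but I have not verified 
that it fits).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-namespace state. */
struct model_ima_ns {
    int sb_created;   /* has the filesystem instance been set up yet? */
    int mount_count;  /* models the per-namespace mount_count */
    int entries;      /* number of dirs/files created by population */
};

/* Hypothetical population callback, invoked on the first mount only. */
typedef int (*populate_fn)(struct model_ima_ns *ns);

static int model_populate_ima(struct model_ima_ns *ns)
{
    /* A real implementation would create the dirs and files here. */
    ns->entries = 3;
    return 0;
}

/*
 * Models a mount: the first mount creates the instance and calls back
 * into the namespace code to populate it; later mounts only take
 * another reference.
 */
static int model_mount(struct model_ima_ns *ns, populate_fn populate)
{
    assert(ns != NULL && populate != NULL);

    if (!ns->sb_created) {
        int err = populate(ns);

        if (err)
            return err;
        ns->sb_created = 1;
    }
    ns->mount_count++;
    return 0;
}
```

The point of the model is only the ordering: nothing is populated at 
namespace creation time, and the population runs once, in the context 
of whoever performs the first mount.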

Well, I am hoping the above makes sense and maybe you have some 
directions for how one could go about this before I go down other, 
possibly erroneous, paths.

Regards,

   Stefan