On Mon, Dec 06, 2021 at 08:38:29AM -0500, James Bottomley wrote:
> On Mon, 2021-12-06 at 13:08 +0100, Christian Brauner wrote:
> > On Fri, Dec 03, 2021 at 11:37:14AM -0800, Casey Schaufler wrote:
> > > On 12/3/2021 10:50 AM, James Bottomley wrote:
> > > > On Fri, 2021-12-03 at 13:06 -0500, Stefan Berger wrote:
> > > > > On 12/3/21 12:03, James Bottomley wrote:
> > > > > > On Thu, 2021-12-02 at 21:31 -0500, Stefan Berger wrote:
> > > > > > [...]
> > > > > > >  static int securityfs_init_fs_context(struct fs_context *fc)
> > > > > > >  {
> > > > > > > +	int rc;
> > > > > > > +
> > > > > > > +	if (fc->user_ns->ima_ns->late_fs_init) {
> > > > > > > +		rc = fc->user_ns->ima_ns->late_fs_init(fc->user_ns);
> > > > > > > +		if (rc)
> > > > > > > +			return rc;
> > > > > > > +	}
> > > > > > >  	fc->ops = &securityfs_context_ops;
> > > > > > >  	return 0;
> > > > > > >  }
> > > > > > I know I suggested this, but to get this to work in general,
> > > > > > it's going to have to not be specific to IMA, so it's going
> > > > > > to have to become something generic like a notifier
> > > > > > chain. The other problem is it's only working still by
> > > > > > accident:
> > > > >
> > > > > I had thought about this also but the rationale was:
> > > > >
> > > > > securityfs is compiled due to CONFIG_IMA_NS and the user
> > > > > namespace exists there and that has a pointer now to
> > > > > ima_namespace, which can have that callback. I assumed that
> > > > > other namespaced subsystems could also be reached then via
> > > > > such a callback, but I don't know.
> > > >
> > > > Well securityfs is supposed to exist for LSMs. At some point
> > > > each of those is going to need to be namespaced, which may
> > > > eventually be quite a pile of callbacks, which is why I thought
> > > > of a notifier.
> > >
> > > While AppArmor, lockdown and the integrity family use securityfs,
> > > SELinux and Smack do not. They have their own independent
> > > filesystems. Implementations of namespacing for each of SELinux and
> > > Smack have been proposed, but nothing has been adopted. It would be
> > > really handy to namespace the infrastructure rather than each
> > > individual LSM, but I fear that's a bigger project than anyone will
> > > be taking on any time soon. It's likely to encounter many of the
> > > same issues that I've been dealing with for module stacking.
> >
> > The main thing that bothers me is that it uses simple_pin_fs() and
> > simple_unpin_fs() which I would try hard to get rid of if possible.
> > The existence of this global pinning logic makes namespacing it
> > properly more difficult than it needs to be and it creates imho wonky
> > semantics where the last unmount doesn't really destroy the
> > superblock.
>
> So in the notifier sketch I posted, I got rid of the pinning but only
> for the non root user namespace use case ... which basically means only
> for converted consumers of securityfs. The last unmount of securityfs
> inside the namespace now does destroy the superblock ... I checked.

Yeah, I saw. I'm struggling to follow the series but I pulled Stefan's
branch and put your patch on top of it so I can peruse it.

>
> The same isn't true for the last unmount of the root namespace, but
> that has to be so to keep the current semantics.
>
> > Instead subsequent mounts resurface the same superblock. There
> > might be an inherent design reason why this needs to be this way but
> > I would advise against these semantics for anything that wants to be
> > namespaced. Probably the first securityfs mount in init_user_ns can
> > follow these semantics but ones tied to a non-initial user namespace
> > should not as the userns can go away. In that case the pinning logic
> > seems strange as conceptually the userns pins the securityfs mount as
> > evidenced by the fact that we key by it in get_tree_keyed().
>
> Yes, that's basically what I did: pin if ns == &init_user_ns but don't
> pin if not. However, I'm still not sure I got the triggers right. We
> have to trigger the notifier call (which adds the namespaced file
> entries) from context free, because that's the first place the
> superblock mount is fully set up ... I can't do it in fill_super
> because the mount isn't fully initialized (and the locking prevents
> it). I did manage to get the notifier for teardown triggered from
> kill_super, though.

Once Stefan answers my questions about fill_super I _might_ have an idea
how to improve this.
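
To make the semantics under discussion concrete, here is a minimal userspace sketch of the "pin if ns == &init_user_ns but don't pin if not" rule. It is only a toy model, not kernel code and not the patch itself: `toy_userns`, `toy_sb`, `toy_mount()` and `toy_umount()` are hypothetical names invented for illustration, and the `pinned` flag merely stands in for what simple_pin_fs() achieves in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct toy_userns {
	int id;
};

struct toy_sb {
	const struct toy_userns *ns;	/* key, in the spirit of get_tree_keyed() */
	int mounts;			/* active mounts of this superblock */
	bool pinned;			/* extra pin, taken for the initial ns only */
	bool alive;			/* false once the superblock is destroyed */
};

/* Stand-in for the initial user namespace. */
static struct toy_userns toy_init_user_ns = { 0 };

/* Mount: create the superblock on first use, otherwise reuse it. */
static void toy_mount(struct toy_sb *sb, const struct toy_userns *ns)
{
	if (!sb->alive) {
		sb->ns = ns;
		sb->mounts = 0;
		sb->alive = true;
		/* Pin only for the initial namespace. */
		sb->pinned = (ns == &toy_init_user_ns);
	}
	sb->mounts++;
}

/* Unmount: the last unmount destroys the superblock unless it is pinned. */
static void toy_umount(struct toy_sb *sb)
{
	assert(sb->mounts > 0);
	if (--sb->mounts == 0 && !sb->pinned)
		sb->alive = false;
}
```

In this model a superblock keyed by the initial namespace survives its last unmount and is reused by a later mount, while one keyed by any other namespace goes away with its last unmount, which is the behaviour described for the non-initial case.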