On Fri, Dec 15, 2023 at 02:26:53PM +0100, Michael Weiß wrote:
> On 15.12.23 13:31, Christian Brauner wrote:
> > On Wed, Dec 13, 2023 at 03:38:13PM +0100, Michael Weiß wrote:
> >> devguard is a simple LSM to allow CAP_MKNOD in non-initial user
> >> namespace in cooperation with an attached cgroup device program. We
> >> just need to implement the security_inode_mknod() hook for this.
> >> In the hook, we check if the current task is guarded by a device
> >> cgroup using the lately introduced cgroup_bpf_current_enabled()
> >> helper. If so, we strip out SB_I_NODEV from the super block.
> >>
> >> Access decisions to those device nodes are then guarded by existing
> >> device cgroups mechanism.
> >>
> >> Signed-off-by: Michael Weiß <michael.weiss@xxxxxxxxxxxxxxxxxxx>
> >> ---
> >
> > I think you misunderstood me... My point was that I believe you don't
> > need an additional LSM at all and no additional LSM hook. But I might be
> > wrong. Only a POC would show.
>
> Yeah sorry, I got your point now.

I think I might have had a misconception about how this works.
A bpf LSM program can't easily alter a kernel object such as struct
super_block I've been told.

>
> >
> > Just write a bpf lsm program that strips SB_I_NODEV in the existing
> > security_sb_set_mnt_opts() call which is guaranteed to be called when a
> > new superblock is created.
>
> This does not work since SB_I_NODEV is a required_iflag in
> mount_too_revealing(). This I have already tested when writing the
> simple LSM here. So maybe we need to drop SB_I_NODEV from required_iflags
> there, too. Would that be safe?

Right. I think we might be able to add a new SB_I_MANAGED_DEVICES flag.

__UNTESTED, UNCOMPILED_

diff --git a/fs/namespace.c b/fs/namespace.c
index fbf0e596fcd3..e87cc0320091 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4887,7 +4887,6 @@ static bool mnt_already_visible(struct mnt_namespace *ns,
 
 static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags)
 {
-	const unsigned long required_iflags = SB_I_NOEXEC | SB_I_NODEV;
 	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
 	unsigned long s_iflags;
 
@@ -4899,9 +4898,13 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
 	if (!(s_iflags & SB_I_USERNS_VISIBLE))
 		return false;
 
-	if ((s_iflags & required_iflags) != required_iflags) {
-		WARN_ONCE(1, "Expected s_iflags to contain 0x%lx\n",
-			  required_iflags);
+	if (!(s_iflags & SB_I_NOEXEC)) {
+		WARN_ONCE(1, "Expected s_iflags to contain SB_I_NOEXEC\n");
+		return true;
+	}
+
+	if (!(s_iflags & (SB_I_NODEV | SB_I_MANAGED_DEVICES))) {
+		WARN_ONCE(1, "Expected s_iflags to contain device access mask\n");
 		return true;
 	}
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 98b7a7a8c42e..6ca0fe922478 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1164,6 +1164,7 @@ extern int send_sigurg(struct fown_struct *fown);
 #define SB_I_USERNS_VISIBLE		0x00000010 /* fstype already mounted */
 #define SB_I_IMA_UNVERIFIABLE_SIGNATURE	0x00000020
 #define SB_I_UNTRUSTED_MOUNTER		0x00000040
+#define SB_I_MANAGED_DEVICES		0x00000080
 
 #define SB_I_SKIP_SYNC	0x00000100	/* Skip superblock at global sync */
 #define SB_I_PERSB_BDI	0x00000200	/* has a per-sb bdi */

>
> >
> > Store your device access rules in a bpf map or in the sb->s_security
> > blob (This is where I'm fuzzy and could use a bpf LSM expert's input.).
> >
> > Then make that bpf lsm program kick in every time
> > security_inode_mknod() or security_file_open() is called and do device
> > access management in there. Actually, you might need to add one hook
> > when the actual device that's about to be opened is known.
> > This should be where today the device access hooks are called.
> >
> > And then you should already be done with this. The only thing that you
> > need is the capable check patch.
> >
> > You don't need that cgroup_bpf_current_enabled() per se. Device
> > management could now be done per superblock, and not per task. IOW, you
> > allowlist a bunch of devices that can be created and opened. Any task
> > that passes basic permission checks and that passes the bpf lsm program
> > may create device nodes.
> >
> > That's a way more natural device management model than making this a per
> > cgroup thing. Though that could be implemented as well with this.
> >
> > I would try to write a bpf lsm program that does device access
> > management with your capable() sysctl patch applied and see how far I
> > get.
> >
> > I don't have the time otherwise I'd do it.
> I'll give it a try but no promises how fast this will go.

No worries. We're entering the holiday season anyway.
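
For anyone following along, the flag logic of the untested diff can be
modelled in plain user-space C. This is only a sketch, not kernel code:
iflags_rejected() and check_examples() are hypothetical names, the
SB_I_NOEXEC and SB_I_NODEV values are assumed to match
include/linux/fs.h, and SB_I_MANAGED_DEVICES is the proposed flag, which
does not exist in mainline. The model covers only the s_iflags checks;
the real mount_too_revealing() goes on to consult mnt_already_visible()
when these checks pass.

```c
#include <assert.h>
#include <stdbool.h>

/* SB_I_USERNS_VISIBLE and SB_I_MANAGED_DEVICES are taken from the diff;
 * SB_I_NOEXEC and SB_I_NODEV are assumed values from include/linux/fs.h. */
#define SB_I_NOEXEC		0x00000002UL
#define SB_I_NODEV		0x00000004UL
#define SB_I_USERNS_VISIBLE	0x00000010UL
#define SB_I_MANAGED_DEVICES	0x00000080UL	/* proposed, not in mainline */

/*
 * Hypothetical model of the s_iflags checks in the patched
 * mount_too_revealing(). Returns true if the flags alone would make the
 * function report "too revealing", false if the flag checks pass (or do
 * not apply because SB_I_USERNS_VISIBLE is unset).
 */
static bool iflags_rejected(unsigned long s_iflags)
{
	if (!(s_iflags & SB_I_USERNS_VISIBLE))
		return false;

	/* SB_I_NOEXEC stays mandatory. */
	if (!(s_iflags & SB_I_NOEXEC))
		return true;

	/* Either "no devices at all" or "devices are managed" is accepted. */
	if (!(s_iflags & (SB_I_NODEV | SB_I_MANAGED_DEVICES)))
		return true;

	return false;
}

/* A few representative flag combinations. */
static void check_examples(void)
{
	const unsigned long base = SB_I_USERNS_VISIBLE | SB_I_NOEXEC;

	assert(!iflags_rejected(base | SB_I_NODEV));
	assert(!iflags_rejected(base | SB_I_MANAGED_DEVICES));
	assert(iflags_rejected(base));
	assert(iflags_rejected(SB_I_USERNS_VISIBLE | SB_I_NODEV));
}
```

The point of the model is that SB_I_NODEV is no longer strictly
required: a superblock that a bpf lsm program has marked as
SB_I_MANAGED_DEVICES passes the same check, while SB_I_NOEXEC remains
mandatory.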