On 12/1/21 12:56, James Bottomley wrote:
On Tue, 2021-11-30 at 11:06 -0500, Stefan Berger wrote:
[...]
+
+/*
+ * Fix the ownership (uid/gid) of the dentries that couldn't be set at the
+ * time of their creation because the user namespace wasn't configured yet.
+ */
+static void ima_fs_ns_fixup_uid_gid(struct ima_namespace *ns)
+{
+ struct inode *inode;
+ size_t i;
+
+ if (ns->file_ownership_fixes_done ||
+ ns->user_ns->uid_map.nr_extents == 0)
+ return;
+
+ ns->file_ownership_fixes_done = true;
+ for (i = 0; i < IMAFS_DENTRY_LAST; i++) {
+ if (!ns->dentry[i])
+ continue;
+ inode = ns->dentry[i]->d_inode;
+ inode->i_uid = make_kuid(ns->user_ns, 0);
+ inode->i_gid = make_kgid(ns->user_ns, 0);
+ }
+}
+
+/* Fix the permissions when a file is opened */
+int ima_fs_ns_permission(struct user_namespace *mnt_userns, struct inode *inode,
+ int mask)
+{
+ ima_fs_ns_fixup_uid_gid(get_current_ns());
+ return generic_permission(mnt_userns, inode, mask);
+}
+
+const struct inode_operations ima_fs_ns_inode_operations = {
+ .lookup = simple_lookup,
+ .permission = ima_fs_ns_permission,
+};
+
In theory this uid/gid shifting should have already been done for you
and all of the above code should be unnecessary. What is supposed to
happen is that the mount of securityfs_ns in the new user namespace
should pick up a superblock s_user_ns for that new user namespace. Now
inode_alloc() uses i_uid_write(inode, 0) which maps back through the
s_user_ns to obtain the owner of the user namespace.

What can happen is that if you do the inode allocation before (or even
without) writing to the uid_map file, it maps back through an empty map
and ends up with -1 for i_uid ... is this what you're seeing?

I tried this with runc and an active user namespace that maps uid 1000 on
the host to uid 0 in the container. There I ran into the problem that,
without the above workaround, all of the files and directories are
mapped to 'nobody', just as all the files in sysfs are in this case.
This code resolved the issue.
sh-5.1# ls -l /sys/
total 0
drwxr-xr-x. 2 nobody nobody 0 Dec 1 18:06 block
drwxr-xr-x. 28 nobody nobody 0 Dec 1 18:06 bus
drwxr-xr-x. 54 nobody nobody 0 Dec 1 18:06 class
drwxr-xr-x. 4 nobody nobody 0 Dec 1 18:06 dev
drwxr-xr-x. 15 nobody nobody 0 Dec 1 18:06 devices
drwxrwxrwt. 2 root root 40 Dec 1 18:06 firmware
drwxr-xr-x. 9 nobody nobody 0 Dec 1 18:06 fs
drwxr-xr-x. 16 nobody nobody 0 Dec 1 18:06 kernel
drwxr-xr-x. 161 nobody nobody 0 Dec 1 18:06 module
drwxr-xr-x. 3 nobody nobody 0 Dec 1 18:06 power
sh-5.1# ls -l /sys/kernel/security/
total 0
lr--r--r--. 1 nobody nobody 0 Dec 1 18:06 ima -> integrity/ima
drwxr-xr-x. 3 nobody nobody 0 Dec 1 18:06 integrity
sh-5.1# ls -l /sys/kernel/security/ima/
total 0
-r--r-----. 1 root root 0 Dec 1 18:06 ascii_runtime_measurements
-r--r-----. 1 root root 0 Dec 1 18:06 binary_runtime_measurements
-rw-------. 1 root root 0 Dec 1 18:06 policy
-r--r-----. 1 root root 0 Dec 1 18:06 runtime_measurements_count
-r--r-----. 1 root root 0 Dec 1 18:06 violations
The 'nobody' ownership is obviously sufficient to cd into the directories,
but for file accesses I wanted to see root as the owner, without changing
the permissions.
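
For reference, the id translation discussed above can be modelled in user
space. This is only a sketch, not kernel code: the names id_extent, id_map,
model_make_kid and model_from_kid_munged are made up for illustration and
only loosely follow what make_kuid() and from_kuid_munged() do, and 65534
is assumed as the overflow uid (the usual default, displayed as 'nobody').

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_ID  ((uint32_t)-1)  /* stands in for the kernel's INVALID_UID */
#define OVERFLOW_ID 65534u          /* assumed overflow uid, shown as "nobody" */

/* One uid_map line: "first lower_first count" (hypothetical model type). */
struct id_extent {
	uint32_t first;       /* first id as seen inside the namespace */
	uint32_t lower_first; /* first id on the kernel (parent) side */
	uint32_t count;       /* number of consecutive ids covered */
};

struct id_map {
	size_t nr_extents;    /* 0 means uid_map has not been written yet */
	struct id_extent extent[5];
};

/* Namespace id -> kernel id, loosely modelled on make_kuid(). */
static uint32_t model_make_kid(const struct id_map *map, uint32_t id)
{
	assert(map != NULL);
	for (size_t i = 0; i < map->nr_extents; i++) {
		const struct id_extent *e = &map->extent[i];

		if (id >= e->first && id - e->first < e->count)
			return e->lower_first + (id - e->first);
	}
	return INVALID_ID; /* no extent matched, e.g. the map is still empty */
}

/* Kernel id -> id shown in the namespace, loosely modelled on from_kuid_munged(). */
static uint32_t model_from_kid_munged(const struct id_map *map, uint32_t kid)
{
	assert(map != NULL);
	for (size_t i = 0; i < map->nr_extents; i++) {
		const struct id_extent *e = &map->extent[i];

		if (kid >= e->lower_first && kid - e->lower_first < e->count)
			return e->first + (kid - e->lower_first);
	}
	return OVERFLOW_ID; /* unmapped owners are reported as the overflow id */
}
```

In this model, an empty map turns namespace uid 0 into the invalid id (-1),
which is the case James describes. With the single extent "0 1000 1" from
the runc setup, namespace uid 0 becomes kernel uid 1000, while a file owned
by kernel uid 0 has no mapping and is reported as 65534, i.e. 'nobody'.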
Stefan
James