On Wed, 2021-12-01 at 13:11 -0500, Stefan Berger wrote:
> On 12/1/21 12:56, James Bottomley wrote:
[...]
> I tried this with runc and a user namespace active mapping uid 1000
> on the host to uid 0 in the container. There I run into the problem
> that all of the files and directories without the above work-around
> are mapped to 'nobody', just like all the files in sysfs in this case
> are also mapped to nobody. This code resolved the issue.

So I applied your patches with the permission shift commented out and
instrumented inode_alloc() to see where it might be failing, and I
actually find it all works as expected for me:

ejb@testdeb:~> unshare -r --user --mount --ima
root@testdeb:~# mount -t securityfs_ns none /sys/kernel/security
root@testdeb:~# ls -l /sys/kernel/security/ima/
total 0
-r--r----- 1 root root 0 Dec  1 19:11 ascii_runtime_measurements
-r--r----- 1 root root 0 Dec  1 19:11 binary_runtime_measurements
-rw------- 1 root root 0 Dec  1 19:11 policy
-r--r----- 1 root root 0 Dec  1 19:11 runtime_measurements_count
-r--r----- 1 root root 0 Dec  1 19:11 violations

I think your problem is something to do with how runc is installing the
uid/gid mappings. If it's installing them after the security_ns inodes
are created then they get the -1 value (because no mappings exist in
s_user_ns). I can even demonstrate this by forcing unshare to enter the
IMA namespace before writing the mapping values, and I'll see "nobody
nogroup" above like you do. I also see the instrumentation telling me
that i_uid_write() is mapping back to 1000 in the former case and -1 in
the latter.

James
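
To illustrate the ordering effect described above, here is a toy
user-space model in Python. It is only a sketch, not kernel code: the
helper names (map_to_host_uid, record_inode_owner) and the INVALID_UID
constant are made up for the illustration, and the map entries follow
the "inside outside count" layout of /proc/PID/uid_map. It assumes the
owner is resolved once, at inode creation time, against whatever
mappings exist at that moment.

```python
# Toy model of a user namespace uid map. Hypothetical helpers, not kernel code.
INVALID_UID = -1  # stands in for the kernel's (uid_t)-1


def map_to_host_uid(uid_map, ns_uid):
    """Translate a uid inside the namespace to a host uid.

    uid_map is a list of (inside, outside, count) extents. Returns
    INVALID_UID when no extent covers ns_uid.
    """
    for inside, outside, count in uid_map:
        if inside <= ns_uid < inside + count:
            return outside + (ns_uid - inside)
    return INVALID_UID


def record_inode_owner(uid_map):
    """Owner stored for a new inode: namespace uid 0, resolved right now."""
    return map_to_host_uid(uid_map, 0)


# Case 1: the mapping (ns uid 0 -> host uid 1000) exists before the inodes.
uid_map = [(0, 1000, 1)]
owner_early = record_inode_owner(uid_map)

# Case 2: the inodes are created first, the mapping is written afterwards.
uid_map = []
owner_late = record_inode_owner(uid_map)
uid_map.append((0, 1000, 1))  # too late, the owner was already recorded

print(owner_early, owner_late)  # prints: 1000 -1
```

In this model the late case stays at -1 even once the mapping has been
written, which would match the "nobody nogroup" listing if the real
inodes are indeed populated before runc writes its mappings.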