On 12/1/21 17:01, James Bottomley wrote:
On Wed, 2021-12-01 at 16:34 -0500, Stefan Berger wrote:
On 12/1/21 16:11, James Bottomley wrote:
On Wed, 2021-12-01 at 15:25 -0500, Stefan Berger wrote:
On 12/1/21 14:21, James Bottomley wrote:
On Wed, 2021-12-01 at 13:11 -0500, Stefan Berger wrote:
On 12/1/21 12:56, James Bottomley wrote:
[...]
I tried this with runc and a user namespace active mapping uid 1000 on
the host to uid 0 in the container. There I run into the problem that
all of the files and directories without the above work-around are
mapped to 'nobody', just like all the files in sysfs in this case are
also mapped to nobody. This code resolved the issue.
So I applied your patches with the permission shift commented out and
instrumented inode_alloc() to see where it might be failing and I
actually find it all works as expected for me:
ejb@testdeb:~> unshare -r --user --mount --ima
root@testdeb:~# mount -t securityfs_ns none /sys/kernel/security
root@testdeb:~# ls -l /sys/kernel/security/ima/
total 0
-r--r----- 1 root root 0 Dec 1 19:11 ascii_runtime_measurements
-r--r----- 1 root root 0 Dec 1 19:11 binary_runtime_measurements
-rw------- 1 root root 0 Dec 1 19:11 policy
-r--r----- 1 root root 0 Dec 1 19:11 runtime_measurements_count
-r--r----- 1 root root 0 Dec 1 19:11 violations
I think your problem is something to do with how runc is installing
the uid/gid mappings. If it's installing them after the security_ns
inodes are created then they get the -1 value (because no mappings
exist in s_user_ns). I can even demonstrate this by forcing unshare to
enter the IMA namespace before writing the mapping values and I'll see
"nobody nogroup" above like you do.
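To make the ordering issue concrete, here is a small user-space model
in Python. It is only a sketch, not kernel code: the function names
echo the kernel helpers make_kuid() and from_kuid_munged(), but the
bodies, the Extent tuple and parse_map() are assumptions made for
illustration. It shows that an id translated while the map is still
empty becomes -1 and is later displayed as the overflow id (65534,
usually "nobody"), even once a "0 1000 1" mapping has been written.

```python
from typing import List, Tuple

# One extent per line of /proc/<pid>/uid_map:
# (first id inside the namespace, first id on the host, length)
Extent = Tuple[int, int, int]

INVALID_ID = -1       # stand-in for an unmapped (invalid) kernel id
OVERFLOW_ID = 65534   # default overflow uid/gid, usually "nobody"


def parse_map(text: str) -> List[Extent]:
    """Parse uid_map/gid_map style text into a list of extents."""
    return [tuple(int(field) for field in line.split())
            for line in text.splitlines() if line.strip()]


def make_kuid(extents: List[Extent], ns_id: int) -> int:
    """Namespace id -> host id; INVALID_ID if no extent covers it."""
    for ns_first, host_first, count in extents:
        if ns_first <= ns_id < ns_first + count:
            return host_first + (ns_id - ns_first)
    return INVALID_ID


def from_kuid_munged(extents: List[Extent], host_id: int) -> int:
    """Host id -> namespace id; OVERFLOW_ID if it cannot be mapped."""
    if host_id != INVALID_ID:
        for ns_first, host_first, count in extents:
            if host_first <= host_id < host_first + count:
                return ns_first + (host_id - host_first)
    return OVERFLOW_ID


# Inode owner computed before the mapping is written (empty map).
early_owner = make_kuid([], 0)

# Inode owner computed after "0 1000 1" has been written.
mapping = parse_map("0 1000 1")
late_owner = make_kuid(mapping, 0)

# What ls would report inside the namespace in each case.
print("created before mapping:", from_kuid_munged(mapping, early_owner))
print("created after mapping: ", from_kuid_munged(mapping, late_owner))
```

The stored owner is fixed at creation time in this model, so writing
the mapping afterwards does not repair it; that matches the
"nobody nobody" listing versus the "root root" one.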
I am surprised you get this mapping even after commenting the
permission adjustments... it doesn't work for me when I comment them
out:
[stefanb@ima-ns-dev rootfs]$ unshare -r --user --mount
[root@ima-ns-dev rootfs]# mount -t securityfs_ns none /sys/kernel/security/
[root@ima-ns-dev rootfs]# cd /sys/kernel/security/ima/
[root@ima-ns-dev ima]# ls -l
total 0
-r--r-----. 1 nobody nobody 0 Dec 1 15:20 ascii_runtime_measurements
-r--r-----. 1 nobody nobody 0 Dec 1 15:20 binary_runtime_measurements
-rw-------. 1 nobody nobody 0 Dec 1 15:20 policy
-r--r-----. 1 nobody nobody 0 Dec 1 15:20 runtime_measurements_count
-r--r-----. 1 nobody nobody 0 Dec 1 15:20 violations
[root@ima-ns-dev ima]# cat /proc/self/uid_map
0 1000 1
[root@ima-ns-dev ima]# cat /proc/self/gid_map
0 1000 1
The initialization of securityfs and setup of files and directories
happens at the same time as the IMA namespace is created. At this time
there are no user mappings available, so that's why I need to make the
adjustments 'late'.
There is one other possible difference: To get the correct s_user_ns
I am currently wondering why I cannot re-create your setup while
disabling the remapping...
OK, I think I figured it out. When I applied your patches, it was on
top of my existing ones, so I had to massage them a bit.
Your problem is that the securityfs inode creation is triggered inside
create_user_ns, which means it happens *before* unshare writes to the
/proc/self/uid_map file, so the securityfs inodes are always created on
an empty mapping and i_uid_write always sets the inode uid to -1.
Right, the initialization of the filesystem is quite early.
I don't see this because my setup for everything is triggered off the
first use of the IMA namespace. You'd need to have some type of lazy
setup of the inodes as well to give unshare time to install the uid/gid
mappings.
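As a rough illustration of the lazy idea, here is a user-space Python
sketch. Everything in it is hypothetical (UserNs, EagerSecurityFs,
LazySecurityFs and their methods are made-up names, not kernel
interfaces); it only contrasts creating the inodes when the namespace
is created with creating them on first use, after the uid mapping has
been installed.

```python
IMA_FILES = (
    "ascii_runtime_measurements",
    "binary_runtime_measurements",
    "policy",
    "runtime_measurements_count",
    "violations",
)


class UserNs:
    """Toy user namespace: a dict from namespace uid to host uid."""

    def __init__(self):
        self.uid_map = {}

    def make_kuid(self, ns_uid):
        # Unmapped ids become -1, as with an empty uid_map.
        return self.uid_map.get(ns_uid, -1)


class EagerSecurityFs:
    """Creates all inodes immediately, at namespace creation time."""

    def __init__(self, user_ns):
        owner = user_ns.make_kuid(0)
        self._inodes = {name: owner for name in IMA_FILES}

    def inodes(self):
        return self._inodes


class LazySecurityFs:
    """Defers inode creation until the first lookup (the 'first use')."""

    def __init__(self, user_ns):
        self.user_ns = user_ns
        self._inodes = None

    def inodes(self):
        if self._inodes is None:
            # By now the mapping has had a chance to be written.
            owner = self.user_ns.make_kuid(0)
            self._inodes = {name: owner for name in IMA_FILES}
        return self._inodes


ns = UserNs()
eager = EagerSecurityFs(ns)   # populated on an empty mapping
lazy = LazySecurityFs(ns)     # nothing created yet

ns.uid_map[0] = 1000          # unshare writes "0 1000 1" afterwards

print("eager policy owner:", eager.inodes()["policy"])
print("lazy policy owner: ", lazy.inodes()["policy"])
```

What counts as the "first use" in the real code (a mount, an open, a
policy write) is left open here; the sketch only shows why deferring
the creation changes the owner that gets stored.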
What could trigger that? A callback while mounting - but I am not sure
where to hook into then. What is your mechanism to trigger on the
'first use of the IMA namespace'? What is 'use' here?
Stefan
James