On 06/18/2017 09:13 PM, Stefan Berger wrote:
> On 06/18/2017 06:14 PM, Serge E. Hallyn wrote:
>> Quoting Stefan Berger (stefanb@xxxxxxxxxxxxxxxxxx):
>>> On 06/14/2017 11:05 PM, Serge E. Hallyn wrote:
>>>> On Wed, Jun 14, 2017 at 08:27:40AM -0400, Stefan Berger wrote:
>>>>> On 06/13/2017 07:55 PM, Serge E. Hallyn wrote:
>>>>>> Quoting Stefan Berger (stefanb@xxxxxxxxxxxxxxxxxx):
>>>>>>> If all extended attributes were to support this model, maybe
>>>>>>> the 'uid' could be associated with the 'name' of the xattr
>>>>>>> rather than its 'value' (not sure whether that's possible).
>>>>>> Right, I missed that in your original email when I saw it this
>>>>>> morning. It's not what my patch does, but it's an interesting
>>>>>> idea. Do you have a patch to that effect? We might even be able
>>>>>> to generalize that to
>>>>> No, I don't have a patch. It may not be possible to implement it.
>>>>> The xattr_handlers take the name of the xattr as input to get().
>>>> That may be ok though. Assume the host created a container with
>>>> 100000 as the uid for root, which created a container with 130000
>>>> as uid for root. If root in the nested container tries to read the
>>>> xattr, the kernel can check for security.foo[130000] first, then
>>>> security.foo[100000], then security.foo. Or, it can do a listxattr
>>>> and look for those. Am I overlooking one?
>>>>> So one could try to encode the mapped uid in the name.
>>>> I thought that's exactly what you were suggesting in your original
>>>> email? "security.capability[uid=2000]"
>>>>> However, that could lead to problems with stale xattrs in a shared
>>>>> filesystem over time unless one could limit the number of xattrs
>>>>> with the same prefix, e.g., security.capability*. So I doubt that
>>>>> it would work.
>>>> Hm. Yeah. But really how many setups are there like that? I.e. if
>>>> you launch a regular docker or lxd container, the image doesn't do
>>>> a bind mount of a shared image, it layers something above it or
>>>> does a copy. What setups do you know of where multiple containers
>>>> in different user namespaces mount the same filesystem shared and
>>>> writeable?
>>> I think I have something now that accommodates userns access to
>>> security.capability:
>>> https://github.com/stefanberger/linux/commits/xattr_for_userns
>> Thanks!
>>> Encoding of uid is in the attribute name now as follows:
>>> security.foo@uid=<uid>
>>>
>>> 1) The 'plain' security.capability is only r/w accessible from the
>>>    host (init_user_ns).
>>> 2) When userns reads/writes 'security.capability' it will read/write
>>>    security.capability@uid=<uid> instead, with uid being the uid of
>>>    root, e.g. 1000.
>>> 3) When listing xattrs for userns the host's security.capability is
>>>    filtered out to avoid read failures of 'security.capability' if
>>>    security.capability@uid=<uid> is read but not there. (see 1) and
>>>    2))
>>> 4) security.capability* may all be read from anywhere
>>> 5) security.capability@uid=<uid> may be read or written directly
>>>    from a userns if <uid> matches the uid of root (current_uid())
>> This looks very close to what we want. One exception - we do want to
>> support root in a user namespace being able to write
>> security.capability@uid=<x> where <x> is a valid uid mapped in its
>> namespace. In that case the name should be rewritten to be
>> security.capability@uid=<y> where y is the unmapped kuid.val.
> I'll try to write a patch on top of the existing one.
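For anyone skimming the quoted thread: rules 1), 2) and 5) above boil down to a name rewrite plus a write check. Below is a rough userspace sketch in Python, not the kernel code from the branch. The function name on_disk_name and the ns_root_uid parameter are made up for illustration, and letting the host pass @uid= names through unchanged is my assumption; the rules do not spell that case out.

```python
PLAIN = "security.capability"


def on_disk_name(name, ns_root_uid=None, writing=False):
    """Map the xattr name a caller uses to the name stored on disk.

    ns_root_uid is the host uid that root of the caller's user namespace
    maps to, or None when the caller is on the host (init_user_ns).
    Both parameters are hypothetical, for illustration only.
    """
    if ns_root_uid is None:
        # Rule 1: the host works on the plain name directly.
        # (Assumption: the host also sees @uid= names unchanged.)
        return name
    if name == PLAIN:
        # Rule 2: a userns access to the plain name is redirected.
        return f"{PLAIN}@uid={ns_root_uid}"
    if writing and name.startswith(PLAIN + "@uid="):
        # Rule 5: direct writes only to the namespace's own entry.
        if name != f"{PLAIN}@uid={ns_root_uid}":
            raise PermissionError(f"{name}: not this namespace's entry")
    # Rule 4: reads of security.capability* are allowed from anywhere.
    return name
```

With root mapped to host uid 1000, on_disk_name("security.capability", 1000) yields security.capability@uid=1000, while the same call from the host leaves the name alone.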
Did that now in a 2nd patch (which also fixes a few problems in the 1st). In a user ns whose root is mapped to uid 1000, root can write security.capability@uid=123, which ends up being written as security.capability@uid=1123. Reading also works with @uid=123. When listing xattrs, only those that actually have valid mappings are shown.
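A small userspace sketch in Python of the uid rewrite described above. This is a sketch under assumptions, not the patch itself: UID_MAP, ns_to_host, host_to_ns, rewrite_for_write and visible_names are hypothetical names, the extent length 65536 is invented (the mail only says the namespace is mapped to 1000), and showing listed names with the namespace-side uid is my guess at the listing behaviour.

```python
import re

# One extent per tuple, in the same order as /proc/<pid>/uid_map:
# (first uid inside the namespace, first uid on the host, length).
# The length is an assumption made for this example.
UID_MAP = [(0, 1000, 65536)]

_SUFFIX = re.compile(r"^(?P<base>[^@]+)@uid=(?P<uid>\d+)$")


def ns_to_host(uid, uid_map):
    """Translate a namespace uid to a host uid, or None if unmapped."""
    for ns_first, host_first, count in uid_map:
        if ns_first <= uid < ns_first + count:
            return host_first + (uid - ns_first)
    return None


def host_to_ns(uid, uid_map):
    """Translate a host uid to a namespace uid, or None if unmapped."""
    for ns_first, host_first, count in uid_map:
        if host_first <= uid < host_first + count:
            return ns_first + (uid - host_first)
    return None


def rewrite_for_write(name, uid_map):
    """Name written inside the namespace -> name stored on disk."""
    m = _SUFFIX.match(name)
    if m is None:
        return name  # no @uid= suffix, nothing to translate here
    host_uid = ns_to_host(int(m.group("uid")), uid_map)
    if host_uid is None:
        raise PermissionError(f"{name}: uid not mapped in this namespace")
    return f"{m.group('base')}@uid={host_uid}"


def visible_names(stored, uid_map):
    """Filter a listing so only @uid= names with a valid mapping remain."""
    shown = []
    for name in stored:
        m = _SUFFIX.match(name)
        if m is None:
            shown.append(name)  # unsuffixed names are not modeled here
            continue
        ns_uid = host_to_ns(int(m.group("uid")), uid_map)
        if ns_uid is not None:
            shown.append(f"{m.group('base')}@uid={ns_uid}")
    return shown
```

Here rewrite_for_write("security.capability@uid=123", UID_MAP) returns security.capability@uid=1123, matching the example in the mail, and an on-disk security.capability@uid=500 is dropped from the listing because host uid 500 has no mapping in this namespace.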
https://github.com/stefanberger/linux/commits/xattr_for_userns

   Stefan