Quoting Serge E. Hallyn (serge@xxxxxxxxxx):
...
> There's a problem though. The above suffices to prevent an unprivileged user
> in a user_ns from unsharing a user_ns to write a file capability and exploit
> that capability in the ns where he is unprivileged. With one exception, which
> is the case where the unprivileged user is mapped to the same kuid which
> created the namespace. So if uid 1000 on the host creates a namespace
> where uid 1000 maps to 1000 in the namespace, then 1000 in the namespace
> can create a new user_ns, write the xattr, and exploit it from the
> parent namespace. This is not an uncommon case. I'm not sure what to do about
> it.

Ok, I think I've convinced myself that requiring a kuid 0 in the
container and storing that in the security.nscapability is the best
solution. The DAC objection is imo not really valid - we don't have to
give uid 0 in the container any special privilege, we just require that
the ns have a uid 0 mapping. I have not been able to think of any other
reliable way to verify that the writer of the capability is authorized
to grant privilege to the file when executed by current.

I'm going to proceed with another POC based on the following design:

1. No new syscalls at the moment. You can choose to set/query
   security.nscapability, but can also just set security.capability
   from a user_ns and have the kernel transparently set a
   security.nscapability entry for you.

2. For now just a single security.nscapability entry, but in a format
   where turning it into an array will be a trivial change.

3. When running a file foo which has a security.nscapability for kuid
   100000, any namespace where kuid 100000 is root - or which has an
   ancestor ns where that is the case - will run the file with the
   listed capabilities.

4. When doing a getxattr of security.capability from a user_ns, if
   there is a security.capability entry, that will be returned; else
   if there is a valid security.nscapability for your ns, that will be
   returned.

5. When doing a setxattr of security.capability from a user_ns, if
   there is a security.nscapability entry, you get -EBUSY; else a
   security.nscapability with your root kuid will be written provided
   that (a) you are privileged over your namespace, (b) you are
   privileged over your root uid, (c) the file owner maps into your
   namespace.

6. When doing a getxattr of security.nscapability, the entry will be
   shown with the kuid mapped into your namespace, or -1 if the uid
   does not map into your ns.

7. When doing a setxattr of security.nscapability, if an entry exists,
   you get -EBUSY; if you are not privileged over your ns, your root
   uid, and the file owner, then you get -EPERM. The xattr includes a
   uid field, which must be either 0 or a value valid in your ns. The
   value will be converted to a kuid and stored on disk. (Seth, I'm
   not sure offhand how that should mesh with your patches; we can
   talk about it after I send the next patch, which I'm quite certain
   will handle it wrongly.)

8. If a security.capability exists, it will override any
   security.nscapability at execve() (so, the inverse of my previous
   two patches).

-serge
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers
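
For illustration, the exec-time rule in items 3 and 8 and the error
ordering in item 7 can be modelled in user space roughly as follows.
This is only a sketch, not kernel code: struct model_ns, struct
model_file, and every function name here are hypothetical, the
capability masks are arbitrary numbers, and the validation of the uid
field from item 7 is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a user namespace; NOT the kernel's struct user_namespace. */
struct model_ns {
    const struct model_ns *parent; /* NULL for the initial namespace */
    bool has_root;                 /* does uid 0 map to anything in this ns? */
    uint32_t root_kuid;            /* kuid that uid 0 maps to, if has_root */
};

/* Hypothetical model of the two xattrs on a single file. */
struct model_file {
    bool has_cap;            /* security.capability present */
    uint64_t cap;            /* its capability mask (arbitrary value) */
    bool has_nscap;          /* security.nscapability present */
    uint32_t nscap_rootkuid; /* kuid stored in the entry */
    uint64_t nscap;          /* its capability mask (arbitrary value) */
};

/* Item 3: the entry applies if its kuid is root in ns or in any ancestor of ns. */
bool nscap_applies(const struct model_ns *ns, uint32_t rootkuid)
{
    for (; ns != NULL; ns = ns->parent)
        if (ns->has_root && ns->root_kuid == rootkuid)
            return true;
    return false;
}

/* Items 3 and 8: capability mask the file contributes at execve(). */
uint64_t caps_at_exec(const struct model_file *f, const struct model_ns *ns)
{
    if (f->has_cap)                 /* item 8: security.capability wins */
        return f->cap;
    if (f->has_nscap && nscap_applies(ns, f->nscap_rootkuid))
        return f->nscap;
    return 0;
}

/* Item 7: outcome of a setxattr of security.nscapability.
 * The existing-entry check comes first, then the three privilege checks. */
int nscap_setxattr_check(const struct model_file *f, bool priv_over_ns,
                         bool priv_over_root_uid, bool priv_over_owner)
{
    if (f->has_nscap)
        return -EBUSY;
    if (!priv_over_ns || !priv_over_root_uid || !priv_over_owner)
        return -EPERM;
    return 0;
}

/* Small walk-through using the kuid 100000 example from item 3. */
void demo(void)
{
    struct model_ns init_ns   = { NULL, true, 0 };
    struct model_ns container = { &init_ns, true, 100000 };
    struct model_ns nested    = { &container, true, 200000 };
    struct model_ns other     = { &init_ns, true, 300000 };
    struct model_file foo     = { false, 0, true, 100000, 0x20 };

    assert(caps_at_exec(&foo, &container) == 0x20); /* kuid 100000 is root here */
    assert(caps_at_exec(&foo, &nested) == 0x20);    /* ancestor has it as root */
    assert(caps_at_exec(&foo, &other) == 0);        /* unrelated namespace */
    assert(caps_at_exec(&foo, &init_ns) == 0);      /* root kuid is 0, not 100000 */
    assert(nscap_setxattr_check(&foo, true, true, true) == -EBUSY);
}
```

Calling demo() from a test harness exercises the four namespace cases.
The walk up the parent chain is what lets a nested namespace inherit
an entry written for its ancestor's root kuid, while a sibling
namespace gets nothing.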