Quoting Dirk Steinmetz (public@xxxxxxxxxxxxxxxxxx): > On Tue, 27 Oct 2015 20:28:02 +0000, Serge Hallyn wrote: > > Quoting Dirk Steinmetz (public@xxxxxxxxxxxxxxxxxx): > > > On Tue, 27 Oct 2015 09:33:44 -0500, Seth Forshee wrote: > > > > I did want to point what seems to be an inconsistency in how > > > > capabilities in user namespaces are handled with respect to inodes. When > > > > I started looking at this my initial thought was to replace > > > > capable(CAP_FOWNER) with capable_wrt_inode_uidgid(inode, CAP_FOWNER). On > > > > the face of it this should be equivalent to what's done here, but it > > > > turns out that capable_wrt_inode_uidgid requires that the inode's uid > > > > and gid are both mapped into the namespace whereas > > > > inode_owner_or_capable only requires the uid be mapped. I'm not sure how > > > > significant that is, but it seems a bit odd. > > > > > > I agree that this seems odd. I've chosen inode_owner_or_capable over > > > capable_wrt_inode_uidgid(inode, CAP_FOWNER) as it seemed consistent: > > > a privileged user (with CAP_SETUID) can impersonate the owner UID and thus > > > bypass the check completely; this also matches the documented behavior of > > > CAP_FOWNER: "Bypass permission checks on operations that normally require > > > the filesystem UID of the process to match the UID of the file". > > > > > > However, thinking about it I get the feeling that checking the gid seems > > > reasonable as well. This is, however, independently of user namespaces. > > > Consider the following scenario in any namespace, including the init one: > > > - A file has the setgid and user/group executable bits set, and is owned > > > by user:group. > > > - The user 'user' is not in the group 'group', and does not have any > > > capabilities. > > > - The user 'user' hardlinks the file. The permission check will succeed, > > > as the user is the owner of the file. > > > - The file is replaced with a newer version (for example fixing a security > > > issue) > > > - Now user can still use the hardlink-pinned version to execute the file > > > as 'user:group' (and for example exploit the security issue). > > > I would have expected the user to not be able to hardlink, as he lacks > > > CAP_FSETID, and thus is not allowed to chmod, change or move the file > > > without loosing the setgid bit. So it is impossible for him to make a non- > > > hardlink copy with the setgid bit set -- why should he be able to make a > > > hardlinked one? > > > > Yeah, this sounds sensible. It allows a user without access to 'disk', > > for instance, to become that group. > > > > > It seems to me as if may_linkat would additionally require a check > > > verifying that either > > > - not both setgid and group executable bit set > > > - fsgid == owner gid > > > - capable_wrt_inode_uidgid(CAP_FSETID) -- or CAP_FOWNER, depending on > > > whether to adapt chmod's behavior or keeping everything hardlink- > > > related in CAP_FOWNER; I don't feel qualified enough to pick ;) > > > > In particular just changing it is not ok since people who are using file > > capabilities to grant what they currently need would be stuck with a > > mysterious new failure. > > Is there any use case (besides exploiting hardlinks with malicious intent) > that would be broken when changing this? There are some (imho) rather > unlikely conditions to be met in order to observe changed behavior: The simplest example would be if I wanted to run a very quick program to just add the symbolic link. Let's say the link /usr/sbin/uuidd were owned by root:disk and setuid and setgid. The proposed change would force me to bind in both the root user and disk group, whereas without it I can just bind in only the root user. We've already dealt with such regressions and iirc agreed that they were worthwhile. > - a user owns an executable setgid-file belonging to a group he is not in > - the user does not have CAP_FSETID (or CAP_FOWNER, depending on which one > is chosen to be required) > - the user is for some legitimate reason supposed to hardlink the file > If these conditions are not met in practice, the change would not break > anything. In that case, it would be imho better to not provide > backward-compatibility to reduce complexity in these checks. Else, I'd > propose adding a new possible value '2' for > /proc/sys/fs/protected_hardlinks, while keeping '1' for the current > behavior. > > I can provide an initial draft for either of the options, but would like > recommendations to which of the two ways to take (or is there a third > one?), as well as comments on the new condition itself: may_linkat would > block hardlinks when all of the following conditions are met: > - sysctrl_protected_hardlinks is enabled or 2 (depending on way) > - inode uid != fsuid and no CAP_FOWNER (for userns: with mapping on uid), > while the hardlink source is not a regular file, is a setuid-executable > or is not accessible for reading and writing > - inode gid not fsgid or in supplemental gids and no CAP_FSETID (for > userns: with mapping on gid -- not sure whether the uid is relevant?), > while the hardlink source is a setgid-executable (with group executable > bit set) > > If anyone else wants to fix the issue, thats fine with me as well. > > Dirk -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html