On Tue, 21 Feb 2012 13:58:00 -0800 Kees Cook <keescook@xxxxxxxxxxxx> wrote: > On systems that have user-writable directories on the same partition > as system files, a long-standing class of security issues is the > hardlink-based time-of-check-time-of-use race, most commonly seen in > world-writable directories like /tmp. The common method of exploitation > of this flaw is to cross privilege boundaries when following a given > hardlink (i.e. a root process follows a hardlink created by another > user). Additionally, an issue exists where users can "pin" a potentially > vulnerable setuid/setgid file so that an administrator will not actually > upgrade a system fully. > > The solution is to permit hardlinks to only be created when the user is > already the existing file's owner, or if they already have read/write > access to the existing file. > > Many Linux users are surprised when they learn they can link to files > they have no access to, so this change appears to follow the doctrine > of "least surprise". Additionally, this change does not violate POSIX, > which states "the implementation may require that the calling process > has permission to access the existing file"[1]. > > This change is known to break some implementations of the "at" daemon, > though the version used by Fedora and Ubuntu has been fixed[2] for > a while. Otherwise, the change has been undisruptive while in use in > Ubuntu for the last 1.5 years. > > This patch is based on the patch in Openwall and grsecurity. I have > added a sysctl to enable the protected behavior, documentation, and an > audit notification. > > [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html > [2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279 > Looks OKish to me. > --- a/Documentation/sysctl/fs.txt > +++ b/Documentation/sysctl/fs.txt > @@ -32,7 +32,8 @@ Currently, these files are in /proc/sys/fs: > - nr_open > - overflowuid > - overflowgid > -- protected_sticky_symlinks > +- protected_hardlinks > +- protected_symlinks It's silly to add protected_sticky_symlinks and to then rename it in the very next patch. I have reworked my copy of the earlier fs-symlink-restrictions-on-sticky-directories.patch so that it adds "protected_symlinks". I then reworked this patch (fs-hardlink-creation-restrictions.patch) to add protected_hardlinks. Nice and simple. > > ... > > +static inline void > +audit_log_link_denied(const char *operation, struct path *link) > +{ > + struct audit_buffer *ab; > + > + ab = audit_log_start(current->audit_context, GFP_KERNEL, AUDIT_AVC); > + audit_log_format(ab, "op=%s action=denied", operation); > + audit_log_format(ab, " pid=%d comm=", current->pid); > + audit_log_untrustedstring(ab, current->comm); > + audit_log_d_path(ab, " path=", link); > + audit_log_format(ab, " dev="); > + audit_log_untrustedstring(ab, link->dentry->d_inode->i_sb->s_id); > + audit_log_format(ab, " ino=%lu", link->dentry->d_inode->i_ino); > + audit_log_end(ab); > +} This is far too large to inline. It has two callsites and gcc is probably smart anough to uninline it for us. I queued a patch to remove the `inline'. > /** > * may_follow_link - Check symlink following for unsafe situations > - * @dentry: The inode/dentry of the symlink > - * @nameidata: The path data of the symlink > + * @link: The path of the symlink > * > - * In the case of the protected_sticky_symlinks sysctl being enabled, > + * In the case of the sysctl_protected_symlinks sysctl being enabled, > * CAP_DAC_OVERRIDE needs to be specifically ignored if the symlink is > * in a sticky world-writable directory. This is to protect privileged > * processes from failing races against path names that may change out > @@ -643,19 +660,20 @@ int sysctl_protected_sticky_symlinks __read_mostly = > * > * Returns 0 if following the symlink is allowed, -ve on error. > */ > -static inline int > -may_follow_link(struct dentry *dentry, struct nameidata *nameidata) > +static inline int may_follow_link(struct path *link) > { > int error = 0; > const struct inode *parent; > const struct inode *inode; > const struct cred *cred; > + struct dentry *dentry; > > - if (!sysctl_protected_sticky_symlinks) > + if (!sysctl_protected_symlinks) > return 0; This is also too large to inline. It doesn't matter a lot - it's not a hot path. I removed this `inline' as well. gcc will probably inline it anyway. If we were really concerned about speed here, we might decide to inline the test of sysctl_protected_symlinks and to uninline the remainder of the function. > /* Allowed if owner and follower match. */ > cred = current_cred(); > + dentry = link->dentry; > inode = dentry->d_inode; > if (cred->fsuid == inode->i_uid) > return 0; > > ... > > +static inline int may_linkat(struct path *link) > +{ > + int error = 0; > + const struct cred *cred; > + struct inode *inode; > + int mode; > + > + if (!sysctl_protected_hardlinks) > + return 0; > + > + cred = current_cred(); > + inode = link->dentry->d_inode; > + mode = inode->i_mode; > + > + if (cred->fsuid != inode->i_uid && > + (!S_ISREG(mode) || (mode & S_ISUID) || > + ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) || > + (inode_permission(inode, MAY_READ | MAY_WRITE))) && > + !capable(CAP_FOWNER)) > + error = -EPERM; omigod. Geeze man, whose side are you on? I suggest something like this: --- a/fs/namei.c~fs-hardlink-creation-restrictions-fix-fix +++ a/fs/namei.c @@ -692,6 +692,23 @@ static inline int may_follow_link(struct return error; } +static bool foobar(struct inode *inode, umode_t mode) +{ + /* nice comment */ + if (!S_ISREG(mode)) + return true; + /* nice comment */ + if (mode & S_ISUID) + return true; + /* nice comment */ + if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) + return true; + /* nice comment */ + if (inode_permission(inode, MAY_READ | MAY_WRITE)) + return true; + return false; +} + /** * may_linkat - Check permissions for creating a hardlink * @link: the source to hardlink from @@ -722,11 +739,8 @@ static int may_linkat(struct path *link) inode = link->dentry->d_inode; mode = inode->i_mode; - if (cred->fsuid != inode->i_uid && - (!S_ISREG(mode) || (mode & S_ISUID) || - ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) || - (inode_permission(inode, MAY_READ | MAY_WRITE))) && - !capable(CAP_FOWNER)) + if (cred->fsuid != inode->i_uid && foobar(inode, mode) && + !capable(CAP_FOWNER)) error = -EPERM; if (error) Please take a look at that, pick a replacement for foobar, fill in the nice comments which explain our reasoning, retest it and send it back at me? Also please take a look at local variable "mode" and decide if there was a good reason for it being `int' instead of mode_t? -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html