David Howells <dhowells@xxxxxxxxxx> wrote: > Separate the task security context from task_struct. At this point, the > security data is temporarily embedded in the task_struct with two pointers > pointing to it. This patch degrades performance of the kernel in general. This is because accessing the security data (UID, GID, groups list, LSM security pointer, etc) requires an extra dereference (eg: task->uid => task->act_as->uid). I've done some benchmarking to attempt to determine what the degradation is. Firstly, the setup: my test machine has a Core 2 Duo CPU and a gig of RAM. The system is running as limited a set of daemons as I can get away with - this means no syslogd, no klogd - as these can make the timings unstable. Basically, init, mgetty, sshd and auditd are it. On the test machines's disk is a test partition with 266 directories and 19255 files distributed across those directories. For each test, the system is rebooted, then the test partition is mounted: mount /dev/sda6 /var/fscache/ -o user_xattr,noatime,ro and the test program attached is run a couple of times to make sure that the disk backing that partition need not be touched again (the whole metadata fits in RAM): time /tmp/openit /var/fscache Then the program is run nine times in succession and the 'real' values emitted by the time program are averaged. The program stats and opens each regular file and directory in the tree, reading and recursing into each directory. It does not read any data from the regular files as this is likely to swamp the variance due to this security patch. So, I did five runs with different sets of patches and configurations: (1) SELinux enabled, none of the security patches: (gdb) p (1.119+1.110+1.104+1.106+1.095+1.112+1.094+1.091+1.112)/9 $3 = 1.104777777777777 (2) SELinux enabled, all the security patches: (gdb) p (1.204+1.190+1.182+1.190+1.200+1.205+1.199+1.197+1.209)/9 $2 = 1.1973333333333329 The percentage degradation is: (gdb) p 1.1973333333333329/1.104777777777777 $4 = 1.0837775319320126 or about 8.4%. I then created a performance improvement patch (attached) to reduce the number of times current->act_as was calculated and applied that: (3) SELinux enabled, all the security patches plus performance patch: (gdb) p (1.212+1.161+1.202+1.205+1.203+1.151+1.195+1.205+1.186)/9 $2 = 1.1911111111111106 The percentage degradation is (gdb) p 1.1911111111111106/1.104777777777777 $3 = 1.0781454289449866 about 7.8%. I then turned SELinux off to see how much of a difference that made: (4) LSM/SELinux disabled, none of the security patches: (gdb) p (1.121+1.136+1.137+1.121+1.122+1.131+1.121+1.131+1.118)/9 $7 = 1.1264444444444437 (5) LSM/SELinux disabled, all the security patches: (gdb) p (1.120+1.129+1.116+1.115+1.114+1.129+1.127+1.124+1.128)/9 $6 = 1.1224444444444437 This resulted in an improvement, which is slightly odd: (gdb) p 1.1224444444444437/1.1264444444444437 $8 = 0.99644900374827372 or about -0.4%. For some reason the results returned by the time program are quite variable, but I'm not sure why. I suspect that with SELinux off, the difference in times is within the variance of the results. The performance improvement patch can probably be itself improved by passing the calculated value of current->act_as down into LSM hooks instead of current if the former is available. Similarly, passing this into capable() or similar could probably improve things too, but there are a couple of problems there as the auditing code in SELinux's task_has_capability() wants to access the task_struct for pid and comm:-/ David
/* Simple recurive open test * * Copyright (C) 2008 Red Hat, Inc. All Rights Reserved. * Written by David Howells (dhowells@xxxxxxxxxx) * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public Licence * as published by the Free Software Foundation; either version * 2 of the Licence, or (at your option) any later version. */ #define _GNU_SOURCE #include <stdio.h> #include <string.h> #include <dirent.h> #include <unistd.h> #include <fcntl.h> #include <sys/stat.h> void process(int dirfd, const char *filename); void process_dir(int dirfd, const char *filename) { struct dirent *de; DIR *dir; int fd; dir = fdopendir(dirfd); if (dir) { while (de = readdir(dir)) { if (de->d_name[0] == '.' && (de->d_name[1] == '\0' || de->d_name[1] == '.' && de->d_name[2] == '\0')) continue; process(dirfd, de->d_name); } closedir(dir); } } void process(int dirfd, const char *filename) { struct stat st; int fd; if (fstatat(dirfd, filename, &st, AT_SYMLINK_NOFOLLOW) >= 0) { if (S_ISREG(st.st_mode)) { fd = openat(dirfd, filename, O_RDONLY); if (fd >= 0) close(fd); } else if (S_ISDIR(st.st_mode)) { fd = openat(dirfd, filename, O_RDONLY | O_DIRECTORY); if (fd >= 0) process_dir(fd, filename); } } } int main(int argc, char **argv) { for (argv++; *argv; argv++) process(AT_FDCWD, *argv); return 0; }
Security: Improve performance by reusing current->act_as calculations From: David Howells <dhowells@xxxxxxxxxx> Improve performance of the subjective security patches a little by reusing current->act_as calculations in a few places. Signed-off-by: David Howells <dhowells@xxxxxxxxxx> --- fs/attr.c | 13 +++++++------ fs/namei.c | 11 +++++++---- fs/posix_acl.c | 9 +++++---- include/linux/fs.h | 4 +++- include/linux/sched.h | 8 +++++++- kernel/sys.c | 3 +-- 6 files changed, 30 insertions(+), 18 deletions(-) diff --git a/fs/attr.c b/fs/attr.c index 117cca7..ba90b9f 100644 --- a/fs/attr.c +++ b/fs/attr.c @@ -20,6 +20,7 @@ /* POSIX UID/GID verification for setting inode attributes. */ int inode_change_ok(struct inode *inode, struct iattr *attr) { + struct task_security *act_as = current->act_as; int retval = -EPERM; unsigned int ia_valid = attr->ia_valid; @@ -29,30 +30,30 @@ int inode_change_ok(struct inode *inode, struct iattr *attr) /* Make sure a caller can chown. */ if ((ia_valid & ATTR_UID) && - (current_fsuid() != inode->i_uid || + (act_as->uid != inode->i_uid || attr->ia_uid != inode->i_uid) && !capable(CAP_CHOWN)) goto error; /* Make sure caller can chgrp. */ if ((ia_valid & ATTR_GID) && - (current_fsuid() != inode->i_uid || - (!in_group_p(attr->ia_gid) && attr->ia_gid != inode->i_gid)) && + (act_as->uid != inode->i_uid || + (!__in_group_p(act_as, attr->ia_gid) && attr->ia_gid != inode->i_gid)) && !capable(CAP_CHOWN)) goto error; /* Make sure a caller can chmod. */ if (ia_valid & ATTR_MODE) { - if (!is_owner_or_cap(inode)) + if (!__is_owner_or_cap(act_as, inode)) goto error; /* Also check the setgid bit! */ - if (!in_group_p((ia_valid & ATTR_GID) ? attr->ia_gid : + if (!__in_group_p(act_as, (ia_valid & ATTR_GID) ? attr->ia_gid : inode->i_gid) && !capable(CAP_FSETID)) attr->ia_mode &= ~S_ISGID; } /* Check for setting the inode time. */ if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET)) { - if (!is_owner_or_cap(inode)) + if (!__is_owner_or_cap(act_as, inode)) goto error; } fine: diff --git a/fs/namei.c b/fs/namei.c index 495c759..81277fa 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -182,9 +182,10 @@ EXPORT_SYMBOL(putname); int generic_permission(struct inode *inode, int mask, int (*check_acl)(struct inode *inode, int mask)) { + struct task_security *act_as = current->act_as; umode_t mode = inode->i_mode; - if (current_fsuid() == inode->i_uid) + if (act_as->uid == inode->i_uid) mode >>= 6; else { if (IS_POSIXACL(inode) && (mode & S_IRWXG) && check_acl) { @@ -195,7 +196,7 @@ int generic_permission(struct inode *inode, int mask, return error; } - if (in_group_p(inode->i_gid)) + if (__in_group_p(act_as, inode->i_gid)) mode >>= 3; } @@ -457,14 +458,16 @@ static struct dentry * cached_lookup(struct dentry * parent, struct qstr * name, static int exec_permission_lite(struct inode *inode, struct nameidata *nd) { + struct task_security *act_as; umode_t mode = inode->i_mode; if (inode->i_op && inode->i_op->permission) return -EAGAIN; - if (current_fsuid() == inode->i_uid) + act_as = current->act_as; + if (act_as->uid == inode->i_uid) mode >>= 6; - else if (in_group_p(inode->i_gid)) + else if (__in_group_p(act_as, inode->i_gid)) mode >>= 3; if (mode & MAY_EXEC) diff --git a/fs/posix_acl.c b/fs/posix_acl.c index 39df95a..623370d 100644 --- a/fs/posix_acl.c +++ b/fs/posix_acl.c @@ -211,28 +211,29 @@ int posix_acl_permission(struct inode *inode, const struct posix_acl *acl, int want) { const struct posix_acl_entry *pa, *pe, *mask_obj; + struct task_security *act_as = current->act_as; int found = 0; FOREACH_ACL_ENTRY(pa, acl, pe) { switch(pa->e_tag) { case ACL_USER_OBJ: /* (May have been checked already) */ - if (inode->i_uid == current_fsuid()) + if (inode->i_uid == act_as->uid) goto check_perm; break; case ACL_USER: - if (pa->e_id == current_fsuid()) + if (pa->e_id == act_as->uid) goto mask; break; case ACL_GROUP_OBJ: - if (in_group_p(inode->i_gid)) { + if (__in_group_p(act_as, inode->i_gid)) { found = 1; if ((pa->e_perm & want) == want) goto mask; } break; case ACL_GROUP: - if (in_group_p(pa->e_id)) { + if (__in_group_p(act_as, pa->e_id)) { found = 1; if ((pa->e_perm & want) == want) goto mask; diff --git a/include/linux/fs.h b/include/linux/fs.h index d218ef5..cdf3e28 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1064,8 +1064,10 @@ enum { #define put_fs_excl() atomic_dec(¤t->fs_excl) #define has_fs_excl() atomic_read(¤t->fs_excl) +#define __is_owner_or_cap(act_as, inode) \ + (((act_as)->uid == (inode)->i_uid) || capable(CAP_FOWNER)) #define is_owner_or_cap(inode) \ - ((current_fsuid() == (inode)->i_uid) || capable(CAP_FOWNER)) + (__is_owner_or_cap(current->act_as, (inode))) /* not quite ready to be deprecated, but... */ extern void lock_super(struct super_block *); diff --git a/include/linux/sched.h b/include/linux/sched.h index 983b532..f0d645d 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1739,7 +1739,13 @@ extern void wake_up_new_task(struct task_struct *tsk, extern void sched_fork(struct task_struct *p, int clone_flags); extern void sched_dead(struct task_struct *p); -extern int in_group_p(gid_t); +extern int __in_group_p(struct task_security *, gid_t); + +static inline int in_group_p(gid_t grp) +{ + return __in_group_p(current->act_as, grp); +} + extern int in_egroup_p(gid_t); extern void proc_caches_init(void); diff --git a/kernel/sys.c b/kernel/sys.c index ec0c251..7bc39f9 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1337,9 +1337,8 @@ asmlinkage long sys_setgroups(int gidsetsize, gid_t __user *grouplist) /* * Check whether we're fsgid/egid or in the supplemental group.. */ -int in_group_p(gid_t grp) +int __in_group_p(struct task_security *act_as, gid_t grp) { - struct task_security *act_as = current->act_as; int retval = 1; if (grp != act_as->fsgid) retval = groups_search(act_as->group_info, grp);