Quoting Miklos Szeredi (miklos@xxxxxxxxxx): > > > > > Maybe sysctls just need to check capabilities, instead of uids. I > > > > > think that would make a lot of sense anyway. > > > > > > > > Would it be as simple as tagging the inodes with capability sets? One > > > > set for writing, or one each for reading and writing? > > > > > > Yes, or something even simpler, like mapping the owner permission bits > > > to CAP_SYS_ADMIN. There seem to be very few different permissions > > > under /proc/sys: > > > > > > --w------- > > > -r--r--r-- > > > -rw------- > > > -rw-r--r-- > > > > > > As long as the group and other bits are always the same, and we accept > > > that the owner bits really mean CAP_SYS_ADMIN and not something else, > > > > But I would assume some things under /proc/sys/net/ipv4 or > > /proc/sys/net/ath0 require CAP_NET_ADMIN rather than CAP_SYS_ADMIN? > > I guess so. I'm not very familiar with the different capabilities :) > > How about this patch then: a hybrid solution between just relying on > permission bits, and specifying separate capability sets for read and > write in addition to the permission bits. > > Untested, the 'cap' field obviously still needs to be filled in where > appropriate. > > Miklos > ---- > > Index: linux/include/linux/sysctl.h > =================================================================== > --- linux.orig/include/linux/sysctl.h 2008-02-04 12:29:01.000000000 +0100 > +++ linux/include/linux/sysctl.h 2008-02-07 15:19:06.000000000 +0100 > @@ -1041,6 +1041,7 @@ struct ctl_table > void *data; > int maxlen; > mode_t mode; > + int cap; /* Capability needed to read/write */ > struct ctl_table *child; > struct ctl_table *parent; /* Automatically set */ > proc_handler *proc_handler; /* Callback for text formatting */ > Index: linux/kernel/sysctl.c > =================================================================== > --- linux.orig/kernel/sysctl.c 2008-02-05 22:17:05.000000000 +0100 > +++ linux/kernel/sysctl.c 2008-02-07 15:30:45.000000000 +0100 > @@ -1527,14 +1527,26 @@ out: > * some sysctl variables are readonly even to root. > */ > > -static int test_perm(int mode, int op) > +static int test_perm(struct ctl_table *table, int op) > { > - if (!current->euid) > - mode >>= 6; > - else if (in_egroup_p(0)) > - mode >>= 3; > + int cap = table->cap; > + mode_t mode = table->mode; > + > + if (!cap) > + cap = CAP_SYS_ADMIN; > + > + if ((op & MAY_READ) && !(mode & S_IRUGO)) > + return -EACCES; > + > + if ((op & MAY_WRITE) && !(mode & S_IWUGO)) > + return -EACCES; > + > + if (capable(cap)) > + return 0; > + > if ((mode & op & 0007) == op) > return 0; > + > return -EACCES; I like how simple it appears to be :) At first I missed the fact that owning uid is always 0 so I thought the uid processing wasn't quite enough. But since it's always 0, the only question is whether there are any /proc/sys files whose users currently depend on being setgid 0 and setgid non-0 with no capabilities. On my laptop, 'find /proc/sys -type f -perm -020' gives me no results, so that is promising. So this certainly seems like a good first step. In fact, combined with /proc/sys/ being partially remounted per container like /proc/sys/net is doing, we may not even need to do anything with CAP_NS_OVERRIDE. thanks, -serge > } > > @@ -1544,7 +1556,7 @@ int sysctl_perm(struct ctl_table *table, > error = security_sysctl(table, op); > if (error) > return error; > - return test_perm(table->mode, op); > + return test_perm(table, op); > } > > #ifdef CONFIG_SYSCTL_SYSCALL - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html