On 26/07/2014 23:04, Eric W. Biederman wrote:
>> The most significant aspect of Capsicum is associating *rights* with
>> (some) file descriptors, so that the kernel only allows operations on an
>> FD if the rights permit it. This allows userspace applications to
>> sandbox themselves by tightly constraining what's allowed with both
>> input and outputs; for example, tcpdump might restrict itself so it can
>> only read from the network FD, and only write to stdout.
>>
>> The kernel thus needs to police the rights checks for these file
>> descriptors (referred to as 'Capsicum capabilities', completely
>> different than POSIX.1e capabilities), and the best place to do this is
>> at the points where a file descriptor from userspace is converted to a
>> struct file * within the kernel.
>>
>> [Policing the rights checks anywhere else, for example at the system
>> call boundary, isn't a good idea because it opens up the possibility
>> of time-of-check/time-of-use (TOCTOU) attacks [2] where FDs are
>> changed (as openat/close/dup2 are allowed in capability mode) between
>> the 'check' at syscall entry and the 'use' at fget() invocation.]
>>
>> However, this does lead to quite an invasive change to the kernel --
>> every invocation of fget() or similar functions (fdget(),
>> sockfd_lookup(), user_path_at(),...) needs to be annotated with the
>> rights associated with the specific operations that will be performed on
>> the struct file. There are ~100 such invocations that need
>> annotation.
>
> And it is silly. Roughly you just need a locking version of
> fcntl(F_SETFL).
>
> That is make the restriction in the struct file not in the fd to file
> lookup.

No, they have to be in the file descriptor. The same file can be dup'ed
and passed with different capabilities to different processes, and since
dup'ed descriptors share a single struct file, any restriction stored
there would apply to every copy. Say you pass an eventfd to a process
with SCM_RIGHTS, and you want to only allow the process to write to it.
>> 4) New System Calls
>> -------------------
>>
>> To allow userspace applications to access the Capsicum capability
>> functionality, I'm proposing two new system calls: cap_rights_limit(2)
>> and cap_rights_get(2). I guess these could potentially be implemented
>> elsewhere (e.g. as fcntl(2) operations?) but the changes seem
>> significant enough that new syscalls are warranted.
>>
>> [FreeBSD 10.x actually includes six new syscalls for manipulating the
>> rights associated with a Capsicum capability -- the capability rights
>> can police that only specific fcntl(2) or ioctl(2) commands are
>> allowed, and FreeBSD sets these with distinct syscalls.]
>
> ioctls? In a sandbox? Ick.

KVM? X11? Both of them use loads of ioctls. I'm less sure of the benefit
of picking which fcntls to allow.

Paolo