On Fri, Mar 11, 2016 at 9:23 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Mar 11, 2016 at 5:59 AM, One Thousand Gnomes
> <gnomes@xxxxxxxxxxxxxxxxxxx> wrote:
>>
>> > > We can do the security check at the filesystem level, because we have
>> > > sb->s_bdev->bd_inode, and if you have read and write permissions to
>> > > that inode, you might as well have permission to create a unsafe hole.
>>
>> Not if you don't have access to a block device node to open it, or there
>> are SELinux rules that control the access. There are cases it isn't
>> entirely the same thing as far as I can see. Consider within a container
>> for example.
>
> I agree that it's not the same thing, but I don't think it really ends
> up mattering.
>
> Either the container is properly separated and set up - in which case
> the uid mapping is what protects you - or it isn't - in which case the
> container could just mknod whatever hell node it wants anyway.
>
> So we do pretty much have the permission model.

This makes me nervous. Suppose I unshare my user namespace, set up
very restrictive mounts, drop caps, seccomp the hell out of myself
(but allow literally only read, write, and ioctl and keep only a
single fd to a file on an ordinary filesystem, which should be safe),
and run untrusted code. Now that code can do this unsafe ioctl simply
because its uid or gid happens to have read access to a device node
that isn't even present in the sandbox. Ick.

What if we had an ioctl to do these data-leaking operations that took,
as an extra parameter, an fd to the block device node. They allow
access if the fd points to the right inode and has FMODE_READ (and LSM
checks say it's okay).

Sure, it's awkward, but it's much safer.

--Andy
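
For illustration only, here is a rough userspace sketch of the kind of check
the fd-passing proposal implies: the caller must present an open, readable
descriptor for the block device that backs the file. The function name
caller_holds_backing_dev() is hypothetical and not an existing kernel or libc
interface. The sketch also assumes a simple filesystem where the file's st_dev
equals the device node's st_rdev, which does not hold everywhere (btrfs, for
example, reports an anonymous st_dev). A real implementation would live in the
kernel, compare against sb->s_bdev, test FMODE_READ on the struct file, and
call the LSM hooks, none of which is modeled here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for O_PATH on glibc */
#endif
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not a real API): returns true only if dev_fd is an
 * open descriptor, with read access, for the block device backing the
 * filesystem that file_fd lives on.
 *
 * Assumption: st_dev of the file matches st_rdev of the device node.
 */
static bool caller_holds_backing_dev(int file_fd, int dev_fd)
{
    struct stat file_st, dev_st;
    int flags, accmode;

    if (fstat(file_fd, &file_st) != 0 || fstat(dev_fd, &dev_st) != 0)
        return false;

    /* The extra fd must actually be a block device node. */
    if (!S_ISBLK(dev_st.st_mode))
        return false;

    /* It must be the device this filesystem is mounted from. */
    if (dev_st.st_rdev != file_st.st_dev)
        return false;

    flags = fcntl(dev_fd, F_GETFL);
    if (flags < 0)
        return false;

#ifdef O_PATH
    /* An O_PATH fd carries no read permission, so reject it. */
    if (flags & O_PATH)
        return false;
#endif

    /* Userspace stand-in for the FMODE_READ test. */
    accmode = flags & O_ACCMODE;
    return accmode == O_RDONLY || accmode == O_RDWR;
}
```

The point of the design is that possession of the fd, not the uid or gid of
the caller, is what grants the right: a sandbox that was never handed a
descriptor for the device cannot pass this check, whatever the permissions on
a device node outside the sandbox happen to be.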