On Sun, Sep 30, 2018 at 03:16:52PM +0100, Alan Cox wrote:
> > > CAP_SYS_ADMIN is also a bit weird because low level access usually
> > > implies you can bypass access controls so you should also check
> > > CAP_SYS_DAC ?
> >
> > Do you mean CAP_DAC_READ_SEARCH as per the newer handle syscalls?
> > But that only allows bypassing directory search operations, so maybe
> > you mean CAP_DAC_OVERRIDE?
>
> It depends what the ioctl allows you to do. If it allows me to bypass
> DAC and manipulate the file system to move objects around then it's a
> serious issue.

These interfaces have always been allowed to do that. You can't do
transparent online background defragmentation without bypassing DAC
and moving objects around. You can't scrub metadata and data without
bypassing DAC. You can't do dedupe without bypassing /some level/ of
DAC to get access to the filesystem used space map and the raw block
device to hash the data. But the really important access control for
dedupe - avoiding deduping data across files at different security
levels - isn't controlled at all.

> The underlying problem is if CAP_SYS_ADMIN is able to move objects around
> then I can move modules around.

Yup, anything with direct access to block devices can do that. Many
filesystem and storage utilities are given direct access to the
block device, because that's what they need to work.

e.g. in DM land, the control ioctls (ctl_ioctl()) are protected by:

	/* only root can play with this */
	if (!capable(CAP_SYS_ADMIN))
		return -EACCES;

Think about it - if DM control ioctls only require CAP_SYS_ADMIN,
then if you have that cap you can use DM to remap any block in a block
device to any other block. You don't need the filesystem to move
stuff around, it can be moved around without the filesystem knowing
anything about it.
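
To make that trust boundary concrete, here is a minimal userspace
sketch (an illustration, not part of the kernel code quoted above) that
reports whether a process holds CAP_SYS_ADMIN in its effective set,
i.e. whether it would pass a capable(CAP_SYS_ADMIN) check like the one
in ctl_ioctl(). The helper names parse_cap_eff(), has_cap() and
read_own_cap_eff() are hypothetical. The capability numbers are the
ones defined in <linux/capability.h>, and the mask is read from the
CapEff field of /proc/self/status. It ignores LSM and namespace
restrictions, so treat it as a rough approximation only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define CAP_NR_DAC_OVERRIDE	1
#define CAP_NR_SYS_RAWIO	17
#define CAP_NR_SYS_ADMIN	21

/*
 * Hypothetical helper: pull the hex CapEff bitmask out of the text of
 * a /proc/<pid>/status file. Returns 0 on success, -1 if the field is
 * missing or has no hex digits.
 */
int parse_cap_eff(const char *status_text, uint64_t *mask)
{
	const char *p = strstr(status_text, "CapEff:");
	char *end;

	if (!p)
		return -1;
	p += strlen("CapEff:");
	*mask = strtoull(p, &end, 16);	/* skips the leading tab */
	return end == p ? -1 : 0;
}

/* Hypothetical helper: test one capability bit in an effective mask. */
int has_cap(uint64_t mask, int cap)
{
	assert(cap >= 0 && cap < 64);
	return (int)((mask >> cap) & 1);
}

/* Hypothetical helper: read the calling process's effective mask. */
int read_own_cap_eff(uint64_t *mask)
{
	char buf[8192];
	size_t n;
	FILE *f = fopen("/proc/self/status", "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_cap_eff(buf, mask);
}
```

A caller would run read_own_cap_eff(&mask) and then test
has_cap(mask, CAP_NR_SYS_ADMIN). If that bit is set, every storage
ioctl gated only on CAP_SYS_ADMIN is reachable, no matter what the
DAC permissions on individual files say.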

> We already have a problem with
> CAP_DAC_OVERRIDE giving you CAP_SYS_RAWIO (ie totally owning the machine)
> unless the modules are signed, if xfs allows ADMIN as well then
> CAP_SYS_ADMIN is much easier to obtain and you'd get total system
> ownership from it.

Always been the case, and it's not isolated to XFS.

$ git grep CAP_SYS_ADMIN fs/ |wc -l
139
$ git grep CAP_SYS_ADMIN block/ |wc -l
16
$ git grep CAP_SYS_ADMIN drivers/block/ drivers/scsi |wc -l
88

The "CAP_SYS_ADMIN for ioctls" trust model in the storage stack
extends both above and below the filesystem. If you don't trust
CAP_SYS_ADMIN, then you are basically saying that you cannot trust
your storage management and maintenance utilities at any level.

> Not good.
>
> > Regardless, this horse bolted long before those syscalls were
> > introduced. The time to address this issue was when XFS was merged
> > into linux all those years ago, back when the apps that run in
> > highly secure restricted environments that use these interfaces were
> > being ported to linux. We can't change this now without breaking
> > userspace....
>
> That's what people said about setuid shell scripts.

Completely different. setuid shell scripts got abused as a hack for
the lazy to avoid setting up permissions properly and hence were
easily exploited.

The storage stack is completely dependent on a simplistic layered
trust model in which root (CAP_SYS_ADMIN) is god. The storage trust
model falls completely apart if we don't have a trusted root user to
administer all layers of the storage stack.

This isn't the first time I've raised this issue - I raised it back
when the user namespace stuff was railroaded into the kernel, and was
essentially ignored by the userns people. As a result, we end up
with all the storage management ioctls restricted to the initns
where we have trusted CAP_SYS_ADMIN users.

I've also raised it more recently in the unprivileged mount
discussions (so untrusted root in containers can mount filesystems)
- no solution to the underlying trust model deficiencies was found
in those discussions, either. Instead, filesystems that can be
mounted by untrusted users (i.e. FUSE) have a special flag in their
fstype definition to say this is allowed.

Systems restricted by LSMs to the point where CAP_SYS_ADMIN is not
trusted have exactly the same issues. i.e. there's nobody trusted by
the kernel to administer the storage stack, and nobody has defined a
workable security model that can prevent untrusted users from
violating the existing storage trust model....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx