On 9/2/2021 10:42 AM, Vivek Goyal wrote: > On Thu, Sep 02, 2021 at 01:05:01PM -0400, Vivek Goyal wrote: >> On Thu, Sep 02, 2021 at 08:43:50AM -0700, Casey Schaufler wrote: >>> On 9/2/2021 8:22 AM, Vivek Goyal wrote: >>>> Hi, >>>> >>>> This is V3 of the patch. Previous versions were posted here. >>>> >>>> v2: >>>> https://lore.kernel.org/linux-fsdevel/20210708175738.360757-1-vgoyal@xxxxxxxxxx/ >>>> v1: >>>> https://lore.kernel.org/linux-fsdevel/20210625191229.1752531-1-vgoyal@xxxxxxxxx >>>> +m/ >>>> >>>> Changes since v2 >>>> ---------------- >>>> - Do not call inode_permission() for special files as file mode bits >>>> on these files represent permissions to read/write from/to device >>>> and not necessarily permission to read/write xattrs. In this case >>>> now user.* extended xattrs can be read/written on special files >>>> as long as caller is owner of file or has CAP_FOWNER. >>>> >>>> - Fixed "man xattr". Will post a patch in same thread little later. (J. >>>> Bruce Fields) >>>> >>>> - Fixed xfstest 062. Changed it to run only on older kernels where >>>> user extended xattrs are not allowed on symlinks/special files. Added >>>> a new replacement test 648 which does exactly what 062. Just that >>>> it is supposed to run on newer kernels where user extended xattrs >>>> are allowed on symlinks and special files. Will post patch in >>>> same thread (Ted Ts'o). >>>> >>>> Testing >>>> ------- >>>> - Ran xfstest "./check -g auto" with and without patches and did not >>>> notice any new failures. >>>> >>>> - Tested setting "user.*" xattr with ext4/xfs/btrfs/overlay/nfs >>>> filesystems and it works. >>>> >>>> Description >>>> =========== >>>> >>>> Right now we don't allow setting user.* xattrs on symlinks and special >>>> files at all. Initially I thought that real reason behind this >>>> restriction is quota limitations but from last conversation it seemed >>>> that real reason is that permission bits on symlink and special files >>>> are special and different from regular files and directories, hence >>>> this restriction is in place. (I tested with xfs user quota enabled and >>>> quota restrictions kicked in on symlink). >>>> >>>> This version of patch allows reading/writing user.* xattr on symlink and >>>> special files if caller is owner or priviliged (has CAP_FOWNER) w.r.t inode. >>> This part of your project makes perfect sense. There's no good >>> security reason that you shouldn't set user.* xattrs on symlinks >>> and/or special files. >>> >>> However, your virtiofs use case is unreasonable. >> Ok. So we can merge this patch irrespective of the fact whether virtiofs >> should make use of this mechanism or not, right? I don't see a security objection. I did see that Andreas Gruenbacher <agruenba@xxxxxxxxxx> has objections to the behavior. >>>> Who wants to set user.* xattr on symlink/special files >>>> ----------------------------------------------------- >>>> I have primarily two users at this point of time. >>>> >>>> - virtiofs daemon. >>>> >>>> - fuse-overlay. Giuseppe, seems to set user.* xattr attrs on unpriviliged >>>> fuse-overlay as well and he ran into similar issue. So fuse-overlay >>>> should benefit from this change as well. >>>> >>>> Why virtiofsd wants to set user.* xattr on symlink/special files >>>> ---------------------------------------------------------------- >>>> In virtiofs, actual file server is virtiosd daemon running on host. >>>> There we have a mode where xattrs can be remapped to something else. >>>> For example security.selinux can be remapped to >>>> user.virtiofsd.securit.selinux on the host. >>> As I have stated before, this introduces a breach in security. >>> It allows an unprivileged process on the host to manipulate the >>> security state of the guest. This is horribly wrong. It is not >>> sufficient to claim that the breach requires misconfiguration >>> to exploit. Don't do this. >> So couple of things. >> >> - Right now whole virtiofs model is relying on the fact that host >> unpriviliged users don't have access to shared directory. Otherwise >> guest process can simply drop a setuid root binary in shared directory >> and unpriviliged process can execute it and take over host system. >> >> So if virtiofs makes use of this mechanism, we are well with-in >> the existing constraints. If users don't follow the constraints, >> bad things can happen. >> >> - I think Smalley provided a solution for your concern in other thread >> we discussed this issue. >> >> https://lore.kernel.org/selinux/CAEjxPJ4411vL3+Ab-J0yrRTmXoEf8pVR3x3CSRgPjfzwiUcDtw@xxxxxxxxxxxxxx/T/#mddea4cec7a68c3ee5e8826d650020361030209d6 >> >> >> "So for example if the host policy says that only virtiofsd can set >> attributes on those files, then the guest MAC labels along with all >> the other attributes are protected against tampering by any other >> process on the host." You can't count on SELinux policy to address the issue on a system running Smack. Or any other user of system.* xattrs, be they in the kernel or user space. You can't even count on SELinux policy to be correct. virtiofs has to present a "safe" situation regardless of how security.* xattrs are used and regardless of which, if any, LSMs are configured. You can't do that with user.* attributes. >> >> Apart from hiding the shared directory from unpriviliged processes, >> we could design selinux policy in such a way that only virtiofsd >> is allowed "setattr". That should make sure even in case of >> misconfiguration, unprivileged process is not able to change >> guest security xattrs stored in "user.virtiofs.security.selinux". >> >> I think that sounds like a very reasonable proposition. >> >>>> This remapping is useful when SELinux is enabled in guest and virtiofs >>>> as being used as rootfs. Guest and host SELinux policy might not match >>>> and host policy might deny security.selinux xattr setting by guest >>>> onto host. Or host might have SELinux disabled and in that case to >>>> be able to set security.selinux xattr, virtiofsd will need to have >>>> CAP_SYS_ADMIN (which we are trying to avoid). >>> Adding this mapping to virtiofs provides the breach for any >>> LSM using xattrs. >> I think both the points above answer this as well. >> >>>> Being able to remap >>>> guest security.selinux (or other xattrs) on host to something else >>>> is also better from security point of view. >>>> >>>> But when we try this, we noticed that SELinux relabeling in guest >>>> is failing on some symlinks. When I debugged a little more, I >>>> came to know that "user.*" xattrs are not allowed on symlinks >>>> or special files. >>>> >>>> So if we allow owner (or CAP_FOWNER) to set user.* xattr, it will >>>> allow virtiofs to arbitrarily remap guests's xattrs to something >>>> else on host and that solves this SELinux issue nicely and provides >>>> two SELinux policies (host and guest) to co-exist nicely without >>>> interfering with each other. >>> virtiofs could use security.* or system.* xattrs instead of user.* >>> xattrs. Don't use user.* xattrs. >> So requirement is that every layer (host, guest and nested guest), will >> have a separate security.selinux label and virtiofsd should be able >> to retrieve/set any of the labels depending on access. >> >> How do we achieve that with single security.selinux label per inode on host. > I guess we could think of using trusted.* but that requires CAP_SYS_ADMIN > in init_user_ns. And we wanted to retain capability to run virtiofsd > inside user namespace too. Also we wanted to give minimum required > capabilities to virtiofsd to reduce the risk what if virtiofsd gets > compromised. So amount of damage it can do to system is minimum. > > So guest security.selinux xattr could potentially be mapped to. > > trusted.virtiofs.security.selinux > > nested guest selinux xattrs could be mapped to. > > trusted.virtiofs.trusted.virtiofs.security.selinux > > And given reading/setting trusted.* requires CAP_SYS_ADMIN, that means > unpriviliged processes can't change these security attributes of a > file. > > And trade-off is that virtiofsd process needs to be given CAP_SYS_ADMIN. > > Frankly speaking, we are more concerned about the security of host > system (as opposed to something changing in file for guest). So while > using "trusted.*" is still an option, I would think that not running > virtiofsd with CAP_SYS_ADMIN probably has its advantages too. On host > if we can just hide the shared dir from unpriviliged processes then > we get best of both the worlds. Unpriviliged processes can't change > anything on the shared file at the same time, possible damage by > virtiofsd is less if it gets compromised. > > Thanks > Vivek >