On Thu, Sep 02, 2021 at 01:05:01PM -0400, Vivek Goyal wrote: > On Thu, Sep 02, 2021 at 08:43:50AM -0700, Casey Schaufler wrote: > > On 9/2/2021 8:22 AM, Vivek Goyal wrote: > > > Hi, > > > > > > This is V3 of the patch. Previous versions were posted here. > > > > > > v2: > > > https://lore.kernel.org/linux-fsdevel/20210708175738.360757-1-vgoyal@xxxxxxxxxx/ > > > v1: > > > https://lore.kernel.org/linux-fsdevel/20210625191229.1752531-1-vgoyal@xxxxxxxxx > > > +m/ > > > > > > Changes since v2 > > > ---------------- > > > - Do not call inode_permission() for special files as file mode bits > > > on these files represent permissions to read/write from/to device > > > and not necessarily permission to read/write xattrs. In this case > > > now user.* extended xattrs can be read/written on special files > > > as long as caller is owner of file or has CAP_FOWNER. > > > > > > - Fixed "man xattr". Will post a patch in same thread little later. (J. > > > Bruce Fields) > > > > > > - Fixed xfstest 062. Changed it to run only on older kernels where > > > user extended xattrs are not allowed on symlinks/special files. Added > > > a new replacement test 648 which does exactly what 062. Just that > > > it is supposed to run on newer kernels where user extended xattrs > > > are allowed on symlinks and special files. Will post patch in > > > same thread (Ted Ts'o). > > > > > > Testing > > > ------- > > > - Ran xfstest "./check -g auto" with and without patches and did not > > > notice any new failures. > > > > > > - Tested setting "user.*" xattr with ext4/xfs/btrfs/overlay/nfs > > > filesystems and it works. > > > > > > Description > > > =========== > > > > > > Right now we don't allow setting user.* xattrs on symlinks and special > > > files at all. Initially I thought that real reason behind this > > > restriction is quota limitations but from last conversation it seemed > > > that real reason is that permission bits on symlink and special files > > > are special and different from regular files and directories, hence > > > this restriction is in place. (I tested with xfs user quota enabled and > > > quota restrictions kicked in on symlink). > > > > > > This version of patch allows reading/writing user.* xattr on symlink and > > > special files if caller is owner or priviliged (has CAP_FOWNER) w.r.t inode. > > > > This part of your project makes perfect sense. There's no good > > security reason that you shouldn't set user.* xattrs on symlinks > > and/or special files. > > > > However, your virtiofs use case is unreasonable. > > Ok. So we can merge this patch irrespective of the fact whether virtiofs > should make use of this mechanism or not, right? > > > > > > Who wants to set user.* xattr on symlink/special files > > > ----------------------------------------------------- > > > I have primarily two users at this point of time. > > > > > > - virtiofs daemon. > > > > > > - fuse-overlay. Giuseppe, seems to set user.* xattr attrs on unpriviliged > > > fuse-overlay as well and he ran into similar issue. So fuse-overlay > > > should benefit from this change as well. > > > > > > Why virtiofsd wants to set user.* xattr on symlink/special files > > > ---------------------------------------------------------------- > > > In virtiofs, actual file server is virtiosd daemon running on host. > > > There we have a mode where xattrs can be remapped to something else. > > > For example security.selinux can be remapped to > > > user.virtiofsd.securit.selinux on the host. > > > > As I have stated before, this introduces a breach in security. > > It allows an unprivileged process on the host to manipulate the > > security state of the guest. This is horribly wrong. It is not > > sufficient to claim that the breach requires misconfiguration > > to exploit. Don't do this. > > So couple of things. > > - Right now whole virtiofs model is relying on the fact that host > unpriviliged users don't have access to shared directory. Otherwise > guest process can simply drop a setuid root binary in shared directory > and unpriviliged process can execute it and take over host system. > > So if virtiofs makes use of this mechanism, we are well with-in > the existing constraints. If users don't follow the constraints, > bad things can happen. > > - I think Smalley provided a solution for your concern in other thread > we discussed this issue. > > https://lore.kernel.org/selinux/CAEjxPJ4411vL3+Ab-J0yrRTmXoEf8pVR3x3CSRgPjfzwiUcDtw@xxxxxxxxxxxxxx/T/#mddea4cec7a68c3ee5e8826d650020361030209d6 > > > "So for example if the host policy says that only virtiofsd can set > attributes on those files, then the guest MAC labels along with all > the other attributes are protected against tampering by any other > process on the host." > > Apart from hiding the shared directory from unpriviliged processes, > we could design selinux policy in such a way that only virtiofsd > is allowed "setattr". That should make sure even in case of > misconfiguration, unprivileged process is not able to change > guest security xattrs stored in "user.virtiofs.security.selinux". > > I think that sounds like a very reasonable proposition. > > > > > > This remapping is useful when SELinux is enabled in guest and virtiofs > > > as being used as rootfs. Guest and host SELinux policy might not match > > > and host policy might deny security.selinux xattr setting by guest > > > onto host. Or host might have SELinux disabled and in that case to > > > be able to set security.selinux xattr, virtiofsd will need to have > > > CAP_SYS_ADMIN (which we are trying to avoid). > > > > Adding this mapping to virtiofs provides the breach for any > > LSM using xattrs. > > I think both the points above answer this as well. > > > > > > Being able to remap > > > guest security.selinux (or other xattrs) on host to something else > > > is also better from security point of view. > > > > > > But when we try this, we noticed that SELinux relabeling in guest > > > is failing on some symlinks. When I debugged a little more, I > > > came to know that "user.*" xattrs are not allowed on symlinks > > > or special files. > > > > > > So if we allow owner (or CAP_FOWNER) to set user.* xattr, it will > > > allow virtiofs to arbitrarily remap guests's xattrs to something > > > else on host and that solves this SELinux issue nicely and provides > > > two SELinux policies (host and guest) to co-exist nicely without > > > interfering with each other. > > > > virtiofs could use security.* or system.* xattrs instead of user.* > > xattrs. Don't use user.* xattrs. > > So requirement is that every layer (host, guest and nested guest), will > have a separate security.selinux label and virtiofsd should be able > to retrieve/set any of the labels depending on access. > > How do we achieve that with single security.selinux label per inode on host. I guess we could think of using trusted.* but that requires CAP_SYS_ADMIN in init_user_ns. And we wanted to retain capability to run virtiofsd inside user namespace too. Also we wanted to give minimum required capabilities to virtiofsd to reduce the risk what if virtiofsd gets compromised. So amount of damage it can do to system is minimum. So guest security.selinux xattr could potentially be mapped to. trusted.virtiofs.security.selinux nested guest selinux xattrs could be mapped to. trusted.virtiofs.trusted.virtiofs.security.selinux And given reading/setting trusted.* requires CAP_SYS_ADMIN, that means unpriviliged processes can't change these security attributes of a file. And trade-off is that virtiofsd process needs to be given CAP_SYS_ADMIN. Frankly speaking, we are more concerned about the security of host system (as opposed to something changing in file for guest). So while using "trusted.*" is still an option, I would think that not running virtiofsd with CAP_SYS_ADMIN probably has its advantages too. On host if we can just hide the shared dir from unpriviliged processes then we get best of both the worlds. Unpriviliged processes can't change anything on the shared file at the same time, possible damage by virtiofsd is less if it gets compromised. Thanks Vivek