On 9/3/2021 8:26 AM, Vivek Goyal wrote: > On Thu, Sep 02, 2021 at 03:34:17PM -0700, Casey Schaufler wrote: >> On 9/2/2021 1:06 PM, Vivek Goyal wrote: >>> On Thu, Sep 02, 2021 at 11:55:11AM -0700, Casey Schaufler wrote: >>>> On 9/2/2021 10:42 AM, Vivek Goyal wrote: >>>>> On Thu, Sep 02, 2021 at 01:05:01PM -0400, Vivek Goyal wrote: >>>>>> On Thu, Sep 02, 2021 at 08:43:50AM -0700, Casey Schaufler wrote: >>>>>>> On 9/2/2021 8:22 AM, Vivek Goyal wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> This is V3 of the patch. Previous versions were posted here. >>>>>>>> >>>>>>>> v2: >>>>>>>> https://lore.kernel.org/linux-fsdevel/20210708175738.360757-1-vgoyal@xxxxxxxxxx/ >>>>>>>> v1: >>>>>>>> https://lore.kernel.org/linux-fsdevel/20210625191229.1752531-1-vgoyal@xxxxxxxxx >>>>>>>> +m/ >>>>>>>> >>>>>>>> Changes since v2 >>>>>>>> ---------------- >>>>>>>> - Do not call inode_permission() for special files as file mode bits >>>>>>>> on these files represent permissions to read/write from/to device >>>>>>>> and not necessarily permission to read/write xattrs. In this case >>>>>>>> now user.* extended xattrs can be read/written on special files >>>>>>>> as long as caller is owner of file or has CAP_FOWNER. >>>>>>>> >>>>>>>> - Fixed "man xattr". Will post a patch in same thread little later. (J. >>>>>>>> Bruce Fields) >>>>>>>> >>>>>>>> - Fixed xfstest 062. Changed it to run only on older kernels where >>>>>>>> user extended xattrs are not allowed on symlinks/special files. Added >>>>>>>> a new replacement test 648 which does exactly what 062. Just that >>>>>>>> it is supposed to run on newer kernels where user extended xattrs >>>>>>>> are allowed on symlinks and special files. Will post patch in >>>>>>>> same thread (Ted Ts'o). >>>>>>>> >>>>>>>> Testing >>>>>>>> ------- >>>>>>>> - Ran xfstest "./check -g auto" with and without patches and did not >>>>>>>> notice any new failures. >>>>>>>> >>>>>>>> - Tested setting "user.*" xattr with ext4/xfs/btrfs/overlay/nfs >>>>>>>> filesystems and it works. >>>>>>>> >>>>>>>> Description >>>>>>>> =========== >>>>>>>> >>>>>>>> Right now we don't allow setting user.* xattrs on symlinks and special >>>>>>>> files at all. Initially I thought that real reason behind this >>>>>>>> restriction is quota limitations but from last conversation it seemed >>>>>>>> that real reason is that permission bits on symlink and special files >>>>>>>> are special and different from regular files and directories, hence >>>>>>>> this restriction is in place. (I tested with xfs user quota enabled and >>>>>>>> quota restrictions kicked in on symlink). >>>>>>>> >>>>>>>> This version of patch allows reading/writing user.* xattr on symlink and >>>>>>>> special files if caller is owner or priviliged (has CAP_FOWNER) w.r.t inode. >>>>>>> This part of your project makes perfect sense. There's no good >>>>>>> security reason that you shouldn't set user.* xattrs on symlinks >>>>>>> and/or special files. >>>>>>> >>>>>>> However, your virtiofs use case is unreasonable. >>>>>> Ok. So we can merge this patch irrespective of the fact whether virtiofs >>>>>> should make use of this mechanism or not, right? >>>> I don't see a security objection. I did see that Andreas Gruenbacher >>>> <agruenba@xxxxxxxxxx> has objections to the behavior. >>>> >>>> >>>>>>>> Who wants to set user.* xattr on symlink/special files >>>>>>>> ----------------------------------------------------- >>>>>>>> I have primarily two users at this point of time. >>>>>>>> >>>>>>>> - virtiofs daemon. >>>>>>>> >>>>>>>> - fuse-overlay. Giuseppe, seems to set user.* xattr attrs on unpriviliged >>>>>>>> fuse-overlay as well and he ran into similar issue. So fuse-overlay >>>>>>>> should benefit from this change as well. >>>>>>>> >>>>>>>> Why virtiofsd wants to set user.* xattr on symlink/special files >>>>>>>> ---------------------------------------------------------------- >>>>>>>> In virtiofs, actual file server is virtiosd daemon running on host. >>>>>>>> There we have a mode where xattrs can be remapped to something else. >>>>>>>> For example security.selinux can be remapped to >>>>>>>> user.virtiofsd.securit.selinux on the host. >>>>>>> As I have stated before, this introduces a breach in security. >>>>>>> It allows an unprivileged process on the host to manipulate the >>>>>>> security state of the guest. This is horribly wrong. It is not >>>>>>> sufficient to claim that the breach requires misconfiguration >>>>>>> to exploit. Don't do this. >>>>>> So couple of things. >>>>>> >>>>>> - Right now whole virtiofs model is relying on the fact that host >>>>>> unpriviliged users don't have access to shared directory. Otherwise >>>>>> guest process can simply drop a setuid root binary in shared directory >>>>>> and unpriviliged process can execute it and take over host system. >>>>>> >>>>>> So if virtiofs makes use of this mechanism, we are well with-in >>>>>> the existing constraints. If users don't follow the constraints, >>>>>> bad things can happen. >>>>>> >>>>>> - I think Smalley provided a solution for your concern in other thread >>>>>> we discussed this issue. >>>>>> >>>>>> https://lore.kernel.org/selinux/CAEjxPJ4411vL3+Ab-J0yrRTmXoEf8pVR3x3CSRgPjfzwiUcDtw@xxxxxxxxxxxxxx/T/#mddea4cec7a68c3ee5e8826d650020361030209d6 >>>>>> >>>>>> >>>>>> "So for example if the host policy says that only virtiofsd can set >>>>>> attributes on those files, then the guest MAC labels along with all >>>>>> the other attributes are protected against tampering by any other >>>>>> process on the host." >>>> You can't count on SELinux policy to address the issue on a >>>> system running Smack. >>>> Or any other user of system.* xattrs, >>>> be they in the kernel or user space. You can't even count on >>>> SELinux policy to be correct. virtiofs has to present a "safe" >>>> situation regardless of how security.* xattrs are used and >>>> regardless of which, if any, LSMs are configured. You can't >>>> do that with user.* attributes. >>> Lets take a step back. Your primary concern with using user.* xattrs >>> by virtiofsd is that it can be modified by unprivileged users on host. >>> And our solution to that problem is hide shared directory from >>> unprivileged users. >> You really don't see how fragile that is, do you? > Yes, I am not sure why we are so opposed to the idea of using > user.* xattrs for storing the guest security.selinux xattrs by virtiofsd. > And I am trying to understand that. And this discussion should help. > > With virtiofsd, we want to keep all shared directory trees in a > parent directory which has read/write/search permissions only for root > so that no unpriviliged process can get to files in shared directory > and do any of the operations. > > For example, /var/lib/containers/storage setup by podman has > following permissions. > > ll -d /var/lib/containers/storage/ > drwx------. 10 root root 4096 Jun 18 2020 /var/lib/containers/storage/ > > Now I should be able to create /var/lib/containers/storage/shared1 > directory and ask virtiofsd to export "share1" to guest. Unprivileged > process on host can not open any of the files in shared1 dir, hence > should not be able to modify any data/metadata associated with the > file. > > If this assumption is correct, then I should be able to use "user.*" > xattrs without having to worry about unprivileged processes modifying > security labels of guest/nested-guest. The only attributes you can really count on to protect an object are the attributes on the object itself. Path based protections are not reliable. >> How a single >> errant call to rename(), chmod() or chown() on the host can expose >> the entire guest to exploitation. That's not even getting into >> the bag of mount() tricks. > I am relying on unpriviliged processes not having permissions to > read/search in shared directory. And I guess I have to. Even if > I use "trusted.*" xattrs, what about file data. Unpriviliged > processes can modify file data and that's going to be equally > problematic, isn't it? So why are we so focussed only on > security label part of it. We are concentrating on your proposed clever mapping trick. We are doing so because it won't work, and when it blows up into a full security bruhaha someone is going to try putting the blame on the xattr mechanism. > > You have mentioned that file data is not necessarily > that big a problem because UID 1000 on host and UID 1000 in guest > should be treated same. But I am not sure we can do that. In kata > container model, guest images are untrusted. So they can simply cook > up UID 1000 or UID 0 or any other UID. Now there can be use > cases where we are ready to trust guest and treat UID 1000 > on host and guest in same way. But primary model I am focussing > on is that guest shared directory remains isolated from other > users on host. I don't have the time just now to examine the rest of virtiofs. I am quite afraid that you are making dangerous assumptions on a number of things. It sure doesn't sound like you've thought through all of the implications of sharing between the host and guest. >>> In addition to that, LSMs on host can block setting "user.*" xattrs by >>> virtiofsd domain only for additional protection. >> Try thinking outside the SELinux box briefly, if you possibly can. >> An LSM that implements just Bell & LaPadula isn't going to have a >> "virtiofs domain". Neither is a Smack "3 domain" system. Smack doesn't >> distinguish writing user xattrs from writing other file attributes >> in policy. Your argument requires a fine grained policy a'la SELinux. >> And an application specific SELinux policy at that. > Ok, so does we have to have capability for every LSM to block write > to user xattr. I mean in above example, virtiofsd is relying on DAC so that > unprivileged processes can't modify user xattr labels. If we were to > use "trusted.*" xattr then we are relying on CAP_SYS_ADMIN. > > IOW, it will be nice if one or more LSMs can provide mechanism fine > grained enough to block write to user xattr by unwanted application > and that provides extra level of security. But should it be mandatory? No. The behavior of user xattrs is just fine the way it is. It works as designed. > >>> If LSMs are not configured, >>> then hiding the directory is the solution. >> It's not a solution at all. It's wishful thinking that >> some admin is going to do absolutely everything right, will >> never make a mistake and will never, ever, read the mount(2) >> man page. > I agree that its easy to make mistakes. But making mistakes is already > disastrous. So lets say and admin makes mistake and unprivileged > processes can open/read/write files in shared directory on host. > Assume I am using "trusted" xattrs to store guest's security labels. > > Now file's data can be modified on host in unwated way and be very > problematic. > > Guest can drop a setuid root binary in shared directory and unprivileged > process on host can run it and take over the system. > > Isn't that even bigger problem. So to me making sure shared directory > is not reachable by unprivileged processes is absolute must requirement > for virtiofsd (when running as root). If we break that assumption, > we already have much bigger problems. I can't help but think you probably do already have much bigger problems. >>> So why that's not a solution and only relying on CAP_SYS_ADMIN is the >>> solution. I don't understand that part. >> It comes back to your design, which is fundamentally flawed. You >> can't store system security information in an attribute that can >> be manipulated by untrusted entities. > You are assuming untrusted entities can have access to the shared dir. But > assumption in this model is, shared directory is not reachable by > unprivileged entities. If we break that requirement, there are much > bigger issues to deal with then just security attributes. That's a bad assumption. You have all sorts of issues. > >> That's why we have system.* >> xattrs. You want to have an attribute on the host that maps to a >> security attribute on the guest. The host has to protect the attribute >> on the guest with mechanisms of comparable strength as the guest's >> mechanisms. Otherwise you can't trust the guest with host data. > Ok, I understand your desire that security xattrs as seen by guest kernel > should be protected by something stronger than simply user xattr. > >> It's a real shame that CAP_SYS_ADMIN is so scary. The capability >> mechanism as implemented today won't scale to the hundreds of individual >> capabilities it would need to break CAP_SYS_ADMIN up. Maybe someday. >> I'm not convinced that there isn't a way to accomplish what you're >> trying to do without privilege, but this isn't it, and I don't know >> what is. Sorry. >> >>> Also if directory is not hidden, unprivileged users can change file >>> data and other metadata. >> I assumed that you've taken that into account. Are you saying that >> isn't going to be done correctly either? > I am relying on shared directory not accessible to unprivileged processes. > If that assumption is broken, I guess, all bets are off. Until and unless > one designs SELinux (or other LSM) policy in such a way so that > only virtiofsd can read/write to these shared directories. > > I think Dan Walsh has got SELinux policy written for atleast kata > containers case. > > So I guess we will rely on MAC to block unwanted access if users > make a mistake? Is that the idea? I guess that's why you want > a stronger mechansim to store guest security xattrs on host. If > users make a mistake then we have a fallback path? > >>> Why that's not a concern and why there is >>> so much of focus only security xattr. >> As with an NFS mount, the assumption is that UID 567 (or its magically >> mapped equivalent) has the same access rights on both the server/host >> and client/guest. I'm not worried about the mode bits because they are >> presented consistently on both machines. If, on the other hand, an >> attribute used to determine access is security.esprit on the guest and >> user.security.esprit on the host, the unprivileged user on the host >> can defeat the privilege requirements on the guest. That's why. > Hmm..., so if a user id 1000 inside guest can't modify security > xattrs then same user on host should be allowed to bypass that > by modifying user xattr. I agree with that. > > Just that I am relying on shared directory not being accessible > to uid 1000 on host and you don't want to rely on that because > users can easily make mistake. And that's why want to stronger > form of mechanism to store security xattrs. > >>> If you were to block modification >>> of file then you will have rely on LSMs. >> No. We're talking about the semantics of the xattr namespaces. >> LSMs can further constrain access to xattrs, but the basic rules >> of access to the user.* and security.* attributes are different >> in any case. This is by design. >> >>> And if LSMs are not configured, >>> then we will rely on shared directory not being visible. >> LSMs are not the problem. LSMs use security.* xattrs, which is why >> they come up in the discussion. >> >>> Can you please help me understand why hiding shared directory from >>> unprivileged users is not a solution >> Maybe you can describe the mechanism you use to "hide" a shared directory >> on the host. If the filesystem is mounted on the host it seems unlikely >> that you can provide a convincing argument for sufficient protection. > I am relying on changing direcotry permissions to allow read/write/search > permission to root only. Then you have a real mess of problems coming your way. Sorry. > > Thanks > Vivek >