On Mon, Dec 9, 2024 at 10:16 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote: > > On Mon, Dec 09, 2024 at 09:58:58AM +0100, Amir Goldstein wrote: > > On Mon, Dec 9, 2024 at 8:49 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote: > > > > > > On Sat, Dec 07, 2024 at 09:49:02AM +0100, Amir Goldstein wrote: > > > > > /* file handles can be used by a process on another node */ > > > > > #define EXPORT_OP_ALLOW_REMOTE_NODES (...) > > > > > > > > This has a sound of security which is incorrect IMO. > > > > The fact that we block nfsd export of cgroups does not prevent > > > > any type of userland file server from exporting cgroup file handles. > > > > > > So what is the purpose of the flag? Asking for a coherent name and > > > description was the other bigger ask for me. > > > > > > > Maybe opt-out of nfsd export is a little less safer than opt-in, but > > > > 1. opt-out is and will remain the rare exception for export_operations > > > > 2. at least the flag name EXPORT_OP_LOCAL_FILE_HANDLE > > > > is pretty clear IMO > > > > > > Even after this thread I have absolutely no idea what problem it tries > > > to solve. Maybe that's not just the flag names fault, and not of opt-in > > > vs out, but both certainly don't help. > > > > > > > Plus, as I wrote in another email, the fact that pidfs is SB_NOUSER, > > > > so userspace is not allowed to mount it into the namespace and > > > > userland file servers cannot export the filesystem itself. > > > > That property itself (SB_NOUSER), is therefore a good enough indication > > > > to deny nfsd export of this fs. > > > > > > So check SB_NOUSER in nfsd and be done with it? > > > > > > > That will work for the new user (pidfs). > > > > I think SB_KERNMOUNT qualifies as well, because it signifies > > a mount that does not belong to any user's mount namespace. > > > > For example, tmpfs (shmem) can be exported via nfs, but trying to > > export an anonymous memfd should fail. > > > > To be clear, exporting pidfs or internal shmem via an anonymous fd is > > probably not possible with existing userspace tools, but with all the new > > mount_fd and magic link apis, I can never be sure what can be made possible > > to achieve when the user holds an anonymous fd. > > > > The thinking behind adding the EXPORT_OP_LOCAL_FILE_HANDLE flag > > was that when kernfs/cgroups was added exportfs support with commit > > aa8188253474 ("kernfs: add exportfs operations"), there was no intention > > to export cgroupfs over nfs, only local to uses, but that was never enforced, > > so we thought it would be good to add this restriction and backport it to > > stable kernels. > > > > [CC patch authors] > > > > I tried to look for some property of cgroupfs that will make it not eligible > > for nfs such as the SB_KERNMOUNT and SB_NOUSER indicators, but > > as far as I can see cgroups looks like any other non-blockdev filesystem. > > > > Maybe we were wrong about the assumption that cgroupfs should be treated > > specially and deny export cgroups over nfs?? > > Please don't export any of the "fake" kernel filesystems (configfs, > cgroups, sysfs, debugfs, proc, etc) over nfs please. That way lies > madness and makes no sense. Agreed. The problem is that when looking in vfs for a clear definition of "fake" kernel filesystems, I cannot find an objective criteria, especially for those "fake" fs that you listed. The "pseudo" filesystems (that call init_pseudo()) which are clearly marked with SB_NOUSER (except for nsfs) The "internal" filesystems that use kern_mount() internal mount ns are clearly marked with SB_KERNMOUNT. One commonality that I found among most fs that have a known sysfs mount point is that they use get_tree_single() because they are "singleton" filesystems. However, a singleton filesystem may not be fake (efivarfs, openpromfs), so I am not sure this is a good indication for no nfs export and in any case, this it not true for procfs, sysfs, cgroups (kernfs). We are left with the fact that there is no clear criteria with the list of "fake" filesystems that you mentioned and among that list, only cgroupfs has implemented file handles, so I see two options: 1. Explicitly opt-out of nfs export for cgroups as the proposed patch set does 2. Clearly mark all "fake" filesystems, for whatever "fake" means with SB_<WHATEVER> and then let nfsd query the "fake" fs flag #1 is pretty straightforward. #2 means this discussion will go on for some time and that I may eventually need to submit a "What is a fake filesystem?" topic to LSFMM ;) Thanks, Amir.