> On Thu, 24 Jun 2021, J. Bruce Fields wrote:
> > On Thu, Jun 24, 2021 at 08:04:57AM +1000, NeilBrown wrote:
> > > On Thu, 24 Jun 2021, J. Bruce Fields wrote:
> >
> > One other thing I'm not sure about: how do cold cache lookups of
> > filehandles for (possibly not-yet-mounted) subvolumes work?
>
> Ahhhh... that's a good point. Filehandle lookup depends on the target
> filesystem being mounted. NFS exporting filesystems which are auto-mounted
> on demand would be ... interesting.
>
> That argues in favour of nfsd treating a btrfs filesystem as a single filesystem
> and gaining some knowledge about different subvolumes within a filesystem.
>
> This has implications for NFS re-export. If a filehandle is received for an NFS
> filesystem that needs to be automounted, I expect it would fail.
>
> Or do we want to introduce a third level in the filehandle: filesystem, subvol,
> inode. So just the "filesystem" is used to look things up in /proc/mounts, but
> "filesystem+subvol" is used to determine the fsid.
>
> Maybe another way to state this is that the filesystem could identify a number of
> bytes from the fs-local part of the filehandle that should be mixed in to the fsid.
> That might be a reasonably clean interface.

Hmm, an interesting problem I hadn't considered for nfs-ganesha. Ganesha
treats subvols as filesystems. At startup we scan mnttab and the btrfs
subvol list and add any filesystems belonging to the configured exports.
A lookup into a filesystem that was not mounted when we started is
handled by re-scanning mnttab and the btrfs subvol list.

But suppose Ganesha restarts while a filesystem that a client holds a
handle for is not mounted, and the filesystem is mounted by the time the
client tries to use the handle... That would be easy for us to fix: if a
handle specifies an unknown fsid, trigger a filesystem rescan.
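A minimal sketch of that rescan-on-unknown-fsid idea, written in Python
purely for illustration (Ganesha itself is C). FsTable, scan, and resolve
are hypothetical names, not Ganesha APIs; scan stands in for re-reading
mnttab and the btrfs subvol list.

```python
class FsTable:
    """Hypothetical fsid -> path table, filled from a scan callable."""

    def __init__(self, scan):
        # scan() is assumed to return {fsid: path} for every filesystem
        # (including btrfs subvols) that is mounted right now.
        self._scan = scan
        self._known = dict(scan())
        self.rescans = 0

    def resolve(self, fsid):
        path = self._known.get(fsid)
        if path is None:
            # Unknown fsid in a handle: rescan once before giving up.
            self._known = dict(self._scan())
            self.rescans += 1
            path = self._known.get(fsid)
        if path is None:
            # Still unknown after the rescan: treat the handle as stale.
            raise LookupError("stale handle: unknown fsid %r" % (fsid,))
        return path


# Usage: fsid 2 gets mounted only after the table was built.
mounted = {1: "/export"}
table = FsTable(lambda: mounted)
mounted[2] = "/export/subvol"
assert table.resolve(2) == "/export/subvol"
```

A real server would probably also want to limit how often a rescan can
be triggered, since any client presenting a bogus handle would otherwise
force one.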
> > > All we really need is:
> > > 1/ someone to write the code
> > > 2/ someone to review the code
> > > 3/ someone to accept the code
> >
> > Hah. Still, the special exceptions for btrfs seem to be accumulating.
> > I wonder if that's happening outside nfs as well.
>
> I have some colleagues who work on btrfs and based on my occasional
> discussions, I think that: yes, btrfs is a bit "special". There are a number of
> corner-cases where it doesn't quite behave how one would hope.
> This is probably inevitable given the way it is pushing the boundaries of
> functionality. It can be a challenge to determine if that "hope" is actually
> reasonable, and to figure out a good solution that meets the need cleanly
> without imposing performance burdens elsewhere.

What other special cases does btrfs have that cause NFS servers pain? I
know their handle is big, but the only special-case code nfs-ganesha has
at the moment is listing the subvols as part of the filesystem scan.

Frank