On Tue, 20 Jul 2021, Christoph Hellwig wrote:
> On Tue, Jul 20, 2021 at 09:54:44AM +1000, NeilBrown wrote:
> > Do you have any pointers to other breakage caused by this particular
> > behaviour of btrfs? It would to have all requirements clearly on the
> > table while designing a solution.
>
> A quick google find:
>
> https://lore.kernel.org/linux-btrfs/b5e7e64a-741c-baee-bc4d-cd51ca9b3a38@xxxxxxxxx/T/
> https://savannah.gnu.org/bugs/?50859
> https://github.com/coreos/bugs/issues/301
> https://bugs.kde.org/show_bug.cgi?id=317127
> https://github.com/borgbackup/borg/issues/4009
> https://bugs.python.org/issue37339
> http://mail.openjdk.java.net/pipermail/nio-dev/2017-June/004292.html
>
> and that is just the first 2 or three pages of trivial search results.
>

Thanks a lot for these!  Very helpful.

The details vary, but the core problem seems to be that the device
number found in /proc/self/mountinfo is the same for all mounts from a
given btrfs filesystem, no matter which subvol happens to be found at or
beneath that mountpoint.  So it can even be that 'stat' on a mountpoint
returns different numbers to what is found for that mountpoint in
/proc/self/mountinfo.

To address these issues we would need to:
 1/ make every btrfs subvol which is not already a mountpoint into an
    automount point which mounts the subvol (similar to the use of
    automount in NFS).
 2/ either give each subvol a separate 'struct super_block' (which is
    apparently a bad idea) or change show_mountinfo() to allow an
    alternate dev_t to be used.  e.g. some new s_op which is given
    mnt->mnt_root and returns a dev_t.  If the new s_op is not
    available, sb->s_dev is used.

For nfsd to be able to work with this, those automount points need to
have an inode in the parent filesystem with a distinct inode number, and
the mount must be marked in some way that nfsd can tell that it is
"internal".
Possibly a helper function that tests if mnt_parent has the
same mnt.mnt_sb would be sufficient, though it might be nice to export
this fact to user-space somehow.

Also exportfs_decode_fh() needs to be enhanced, probably to return a
'struct path'.

Does anything there seem unreasonable to you?

Thanks,
NeilBrown
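
P.S. The mismatch described above (the device number listed in
/proc/self/mountinfo versus what 'stat' reports for the same mountpoint)
can be checked from user-space.  A rough sketch in Python follows; the
names unescape(), parse_mountinfo_line() and find_mismatches() are made
up for illustration and are not part of any proposed interface.  Note
that it will also flag mountpoints that have been over-mounted by some
other filesystem, so a hit is only a hint, not proof of the btrfs
behaviour.

```python
import os
import re


def unescape(field):
    # mountinfo writes space, tab, newline and backslash as \ooo octal escapes.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_mountinfo_line(line):
    # Layout (see proc(5)): id parent major:minor root mount_point opts
    # [optional fields...] - fstype source super_opts
    left, right = line.rstrip("\n").split(" - ", 1)
    fields = left.split(" ")
    major, minor = fields[2].split(":")
    return {
        "dev": (int(major), int(minor)),
        "root": unescape(fields[3]),
        "mount_point": unescape(fields[4]),
        "fstype": right.split(" ")[0],
    }


def find_mismatches(path="/proc/self/mountinfo"):
    # Return entries whose stat() device differs from the mountinfo device.
    mismatches = []
    with open(path, errors="surrogateescape") as f:
        for line in f:
            entry = parse_mountinfo_line(line)
            try:
                st = os.stat(entry["mount_point"])
            except OSError:
                continue  # not reachable from this process; skip it
            stat_dev = (os.major(st.st_dev), os.minor(st.st_dev))
            if stat_dev != entry["dev"]:
                mismatches.append(
                    (entry["mount_point"], entry["fstype"], entry["dev"], stat_dev)
                )
    return mismatches


if __name__ == "__main__":
    for mount_point, fstype, mi_dev, st_dev in find_mismatches():
        print(f"{mount_point} ({fstype}): mountinfo={mi_dev} stat={st_dev}")
```

Running it on a machine with a btrfs subvol mounted should list that
mountpoint, if the behaviour is as described above.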