On Tue, Aug 03, 2021 at 07:59:30AM +1000, NeilBrown wrote:
> On Tue, 03 Aug 2021, J. Bruce Fields wrote:
> > On Tue, Aug 03, 2021 at 07:10:44AM +1000, NeilBrown wrote:
> > > On Mon, 02 Aug 2021, J. Bruce Fields wrote:
> > > > On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
> > > > > For btrfs, the "location" is root.objectid ++ file.objectid.  I think
> > > > > the inode should become (file.objectid ^ swab64(root.objectid)).  This
> > > > > will provide numbers that are unique until you get very large subvols,
> > > > > and very many subvols.
> > > >
> > > > If you snapshot a filesystem, I'd expect, at least by default, that
> > > > inodes in the snapshot to stay the same as in the snapshotted
> > > > filesystem.
> > >
> > > As I said: we need to challenge and revise user-space (and meat-space)
> > > expectations.
> >
> > The example that came to mind is people that export a snapshot, then
> > replace it with an updated snapshot, and expect that to be transparent
> > to clients.
> >
> > Our client will error out with ESTALE if it notices an inode number
> > changed out from under it.
>
> Will it?

See fs/nfs/inode.c:nfs_check_inode_attributes():

	if (nfsi->fileid != fattr->fileid) {
		/* Is this perhaps the mounted-on fileid? */
		if ((fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) &&
		    nfsi->fileid == fattr->mounted_on_fileid)
			return 0;
		return -ESTALE;
	}

--b.

> If the inode number changed, then the filehandle would change.
> Unless the filesystem were exported with subtreecheck, the old filehandle
> would continue to work (unless the old snapshot was deleted).  File-name
> lookups from the root would find new files...
>
> "replace with an updated snapshot" is no different from "replace with an
> updated directory tree".  If you delete the old tree, then
> currently-open files will break.  If you don't you get a reasonably
> clean transition.
>
> >
> > I don't know if there are other such cases.  It seems like surprising
> > behavior to me, though.
>
> If you refuse to risk breaking anything, then you cannot make progress.
> Providing people can choose when things break, and have advanced
> warning, they often cope remarkably well.
>
> Thanks,
> NeilBrown
>
>
> >
> > --b.
> >
> > > In btrfs, you DO NOT snapshot a FILESYSTEM.  Rather, you effectively
> > > create a 'reflink' for a subtree (only works on subtrees that have been
> > > correctly created with the poorly named "btrfs subvolume" command).
> > >
> > > As with any reflink, the original has the same inode number that it did
> > > before, the new version has a different inode number (though in current
> > > BTRFS, half of the inode number is hidden from user-space, so it looks
> > > like the inode number hasn't changed).
> >
>
>
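
For anyone following along, the numbering scheme quoted at the top of the
thread, (file.objectid ^ swab64(root.objectid)), can be sketched in
userspace C roughly as below.  This is only an illustration, not btrfs
code: swab64_sketch() is a stand-in for the kernel's swab64(),
combined_ino() is a hypothetical helper name, and the object ids used in
the note afterwards are made-up values.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's swab64(): reverse the byte order
 * of a 64-bit value.  (Assumption: written out by hand here so the
 * sketch does not depend on kernel headers.)
 */
static inline uint64_t swab64_sketch(uint64_t x)
{
	/* swap adjacent bytes */
	x = ((x & 0x00ff00ff00ff00ffULL) << 8) |
	    ((x >> 8) & 0x00ff00ff00ff00ffULL);
	/* swap adjacent 16-bit words */
	x = ((x & 0x0000ffff0000ffffULL) << 16) |
	    ((x >> 16) & 0x0000ffff0000ffffULL);
	/* swap the two 32-bit halves */
	return (x << 32) | (x >> 32);
}

/*
 * Hypothetical helper: the combined inode number proposed in the thread.
 * Byte-swapping the root id moves its low-order (most used) bits to the
 * top of the word, away from the low-order bits of the file id.
 */
static inline uint64_t combined_ino(uint64_t root_objectid,
				    uint64_t file_objectid)
{
	return file_objectid ^ swab64_sketch(root_objectid);
}
```

With illustrative ids, combined_ino(256, 260) and combined_ino(257, 260)
come out different, so the same file objectid in two subvolumes gets two
distinct numbers.  The numbers only collide once a file objectid is large
enough to reach the bits occupied by the swapped root id, e.g.
combined_ino(256, 0x0100000000000104) equals combined_ino(257, 260),
which is the "very large subvols, and very many subvols" caveat.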