On Tue, Dec 12, 2023 at 01:13:07PM +1100, NeilBrown wrote:
> On Tue, 12 Dec 2023, Kent Overstreet wrote:
> > On Tue, Dec 12, 2023 at 11:59:51AM +1100, NeilBrown wrote:
> > > On Tue, 12 Dec 2023, Kent Overstreet wrote:
> > > > NFSv4 specs that for the maximum size? That is pretty hefty...
> > >
> > > It is - but it needs room to identify the filesystem and it needs to be
> > > stable across time. That need is more than a local filesystem needs.
> > >
> > > NFSv2 allowed 32 bytes which is enough for a 16 byte filesys uuid, 8
> > > byte inum and 8 byte generation num. But only just.
> > >
> > > NFSv3 allowed 64 bytes which was likely plenty for (nearly?) every
> > > situation.
> > >
> > > NFSv4 doubled it again because .... who knows. "why not" I guess.
> > > Linux nfsd typically uses 20 or 28 bytes plus whatever the filesystem
> > > wants. (28 when the export point is not the root of the filesystem).
> > > I suspect this always fits within an NFSv3 handle except when
> > > re-exporting an NFS filesystem. NFS re-export is an interesting case...
> >
> > Now I'm really curious - i_generation wasn't enough? Are we including
> > filesystem UUIDs?
>
> i_generation was invented so that it could be inserted into the NFS
> filehandle.
>
> The NFS filehandle is opaque. It likely contains an inode number, a
> generation number, and a filesystem identifier. But it is not possible
> to extract those from the handle.
>
> >
> > I suppose if we want to be able to round trip this stuff we do need to
> > allocate space for it, even if a local filesystem would never include
> > it.
> >
> > > I suggest:
> > >
> > >  STATX_ATTR_INUM_NOT_UNIQUE - it is possible that two files have the
> > >                               same inode number
> > >
> > >
> > >  __u64 stx_vol     Volume identifier. Two files with same stx_vol and
> > >                    stx_ino MUST be the same. Exact meaning of volumes
> > >                    is filesys-specific
> >
> > NFS reexport that you mentioned previously makes it seem like this
> > guarantee is impossible to provide in general (so I'd leave it out
> > entirely, it's just something for people to trip over).
>
> NFS would not set stx_vol and would not return STATX_VOL in stx_mask.
> So it would not attempt to provide that guarantee.
>
> Maybe we don't need to explicitly make this guarantee.
>
> >
> > But we definitely want stx_vol in there. Another thing that people ask
> > for is a way to ask "is this a subvolume root?" - we should make sure
> > that's clearly specified, or can we just include a bit for it?
>
> The standard way to test for a filesystem root - or mount point at least -
> is to stat the directory in question and its parent (..) and see if they
> have the same st_dev or not.

It depends. If you want to figure out whether it's a different
filesystem or a different btrfs subvolume then yes, this generally works
because the device ids change. But it doesn't work for bind-mounts as
they don't change device numbers. But maybe you and I are using "mount
point" differently here.

> Applying the same logic to volumes means that a single stx_vol number is
> sufficient.

Yes, that would generally work.

>
> I'm not strongly against a STATX_ATTR_VOL_ROOT flag providing everyone
> agrees what it means and that we cannot imagine any awkward corner-cases
> (like a 'root' being different from a 'mount point').

I feel like you might have missed my previous mails where I strongly
argued for the addition of STATX_ATTR_SUBVOLUME_ROOT:

https://lore.kernel.org/linux-btrfs/20231108-herleiten-bezwangen-ffb2821f539e@brauner

The concept of a subvolume root and of a mount point should be kept
separate. Christoph tried mapping subvolumes to vfsmounts, something
that I (and Al) vehemently oppose for various reasons outlined in that
and other long threads.

I still think that we should start with exposing subvolume id first.
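
For reference, a rough userspace sketch of the st_dev comparison
discussed above (stat the directory and its parent, compare st_dev).
This is only an illustration in Python; the function name
crosses_device_boundary() is made up here and is not an existing API.
As noted, a bind mount of a directory from the same filesystem keeps
the same device number, so this check would not flag it.

```python
import os


def crosses_device_boundary(path):
    """Return True if `path` and its parent ("..") report different st_dev.

    Hypothetical helper for illustration only. A True result suggests
    that `path` is the root of a different filesystem (or, on btrfs, of
    a different subvolume). A False result does NOT prove that `path`
    is not a mount point: bind mounts within the same filesystem do not
    change st_dev.
    """
    here = os.stat(path)
    parent = os.stat(os.path.join(path, ".."))
    return here.st_dev != parent.st_dev


if __name__ == "__main__":
    # "/.." resolves to "/" itself, so the root directory never differs
    # from its "parent" under this test.
    print(crosses_device_boundary("/"))
```

The "/" case also shows a second limit of the approach: the top of the
namespace is obviously a filesystem root, yet the comparison cannot say
so, which is one more reason an explicit id or flag is easier to reason
about than inferring the boundary from device numbers.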