On Tue, 12 Dec 2023, Kent Overstreet wrote:
> On Tue, Dec 12, 2023 at 11:59:51AM +1100, NeilBrown wrote:
> > On Tue, 12 Dec 2023, Kent Overstreet wrote:
> > > NFSv4 specs that for the maximum size? That is pretty hefty...
> >
> > It is - but it needs room to identify the filesystem and it needs to be
> > stable across time. That need is more than a local filesystem needs.
> >
> > NFSv2 allowed 32 bytes which is enough for a 16 byte filesys uuid, 8
> > byte inum and 8byte generation num. But only just.
> >
> > NFSv3 allowed 64 bytes which was likely plenty for (nearly?) every
> > situation.
> >
> > NFSv4 doubled it again because .... who knows. "why not" I guess.
> > Linux nfsd typically uses 20 or 28 bytes plus whatever the filesystem
> > wants. (28 when the export point is not the root of the filesystem).
> > I suspect this always fits within an NFSv3 handle except when
> > re-exporting an NFS filesystem. NFS re-export is an interesting case...
>
> Now I'm really curious - i_generation wasn't enough? Are we including
> filesystem UUIDs?

i_generation was invented so that it could be inserted into the NFS
filehandle.

The NFS filehandle is opaque. It likely contains an inode number, a
generation number, and a filesystem identifier. But it is not possible
to extract those from the handle.

>
> I suppose if we want to be able to round trip this stuff we do need to
> allocate space for it, even if a local filesystem would never include
> it.
>
> > I suggest:
> >
> >  STATX_ATTR_INUM_NOT_UNIQUE - it is possible that two files have the
> >                               same inode number
> >
> >
> >  __u64 stx_vol     Volume identifier. Two files with same stx_vol and
> >                    stx_ino MUST be the same. Exact meaning of volumes
> >                    is filesys-specific
>
> NFS reexport that you mentioned previously makes it seem like this
> guarantee is impossible to provide in general (so I'd leave it out
> entirely, it's just something for people to trip over).

NFS would not set stx_vol and would not return STATX_VOL in stx_mask.
So it would not attempt to provide that guarantee.
Maybe we don't need to explicitly make this guarantee.

>
> But we definitely want stx_vol in there. Another thing that people ask
> for is a way to ask "is this a subvolume root?" - we should make sure
> that's clearly specified, or can we just include a bit for it?

The standard way to test for a filesystem root - or mount point at
least - is to stat the directory in question and its parent (..) and see
if they have the same st_dev or not.
Applying the same logic to volumes means that a single stx_vol number is
sufficient.

I'm not strongly against a STATX_ATTR_VOL_ROOT flag providing everyone
agrees what it means and that we cannot imagine any awkward corner-cases
(like a 'root' being different from a 'mount point').

>
> >  STATX_VOL         Want stx_vol
> >
> >   __u8 stx_handle_len  Length of stx_handle if present
> >   __u8 stx_handle[128] Unique stable identifier for this file. Will
> >                        NEVER be reused for a different file.
> >                        This appears AFTER __statx_pad2, beyond
> >                        the current 'struct statx'.
> >  STATX_HANDLE      Want stx_handle_len and stx_handle. Buffer for
> >                    receiving statx info has at least
> >                    sizeof(struct statx)+128 bytes.
> >
> > I think both the handle and the vol can be useful.
> > NFS can provide stx_handle but not stx_vol. It is the thing
> > to use for equality testing, but it is only needed if
> > STATX_ATTR_INUM_NOT_UNIQUE is set.
> > stx_vol is useful for "du -x" or maybe "du --one-volume" or similar.
> >
> >
> > Note that we *could* add stx_vol to NFSv4.2. It is designed for
> > incremental extension. I suspect we wouldn't want to rush into this,
> > but to wait to see if different volume-capable filesystems have other
> > details of volumes that are common and can usefully be exported by statx
>
> Sounds reasonable
>

Thanks,
NeilBrown
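
P.S. For reference, the stat-and-compare test for a filesystem root
discussed in this thread can be sketched in C roughly as follows. This is
only a minimal sketch, not code from any existing tool: the function name
is_fs_boundary and the 4096-byte path buffer are made up for illustration,
and the extra st_ino comparison (so that "/" counts as a root, since "/.."
resolves to "/" itself) is an assumption that goes beyond the plain st_dev
check. It will not notice a bind mount of the same filesystem, because the
two sides of such a mount share st_dev.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if 'dir' looks like a filesystem root
 * (or mount point), 0 if it does not, and -1 on error.
 */
static int is_fs_boundary(const char *dir)
{
    char parent[4096];          /* illustrative size, not a kernel limit */
    struct stat st, pst;
    int n;

    /* Build "<dir>/.." so we can stat the parent directory. */
    n = snprintf(parent, sizeof(parent), "%s/..", dir);
    if (n < 0 || (size_t)n >= sizeof(parent))
        return -1;

    if (stat(dir, &st) != 0 || stat(parent, &pst) != 0)
        return -1;

    /* Different st_dev: the directory and its parent are on different
     * filesystems, so 'dir' is a mount point. */
    if (st.st_dev != pst.st_dev)
        return 1;

    /* Same st_dev and same st_ino: 'dir' is its own parent, i.e. "/".
     * (Assumption added here; not part of the st_dev test itself.) */
    return st.st_ino == pst.st_ino;
}
```

With stx_vol the same shape of test would apply, comparing stx_vol of
the directory and of its parent instead of st_dev, which is why a single
volume number would be sufficient.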