On Tue, May 10, 2022 at 02:45:39PM +0200, Florian Weimer wrote:
> * Dave Chinner:
>
> > IOWs, what Linux really needs is a listxattr2() syscall that works
> > the same way that getdents/XFS_IOC_ATTRLIST_BY_HANDLE work. With the
> > list function returning value sizes and being able to iterate
> > effectively, every problem that listxattr() causes goes away.
>
> getdents has issues of its own because it's unspecified what happens if
> the list of entries is modified during iteration.  Few file systems add
> another tree just to guarantee stable iteration.

The filesystem I care about (XFS) guarantees stable iteration and
stable seekdir/telldir cookies. It's not that hard to do, but it
requires the filesystem designer to understand that this is a
necessary feature before they start designing the on-disk directory
format and lookup algorithms....

> Maybe that's different for xattrs because they are supposed to be small
> and can just be snapshotted with a full copy?

It's different for xattrs because we directly control the API
specification for XFS_IOC_ATTRLIST_BY_HANDLE, not POSIX. We can
define the behaviour however we want. Stable iteration is what
listing keys needs.

The cursor is defined as 16 bytes of opaque data, enabling us to
encode exactly where in the hashed name btree index we have
traversed to:

/*
 * Kernel-internal version of the attrlist cursor.
 */
struct xfs_attrlist_cursor_kern {
	__u32	hashval;	/* hash value of next entry to add */
	__u32	blkno;		/* block containing entry (suggestion) */
	__u32	offset;		/* offset in list of equal-hashvals */
	__u16	pad1;		/* padding to match user-level */
	__u8	pad2;		/* padding to match user-level */
	__u8	initted;	/* T/F: cursor has been initialized */
};

Hence we have all the information in the cursor we need to reset the
btree traversal index to the exact entry we finished at (even in the
presence of hash collisions in the index).

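As a rough userspace sketch of that idea (this is not the XFS code:
toy_entry, toy_cursor, toy_list_next() and toy_remove() are names made
up for illustration, a sorted array stands in for the hashed name
btree, and the blkno hint is left out), resuming from a
(hashval, offset) pair could look something like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one entry in a hash-ordered attr index. */
struct toy_entry {
	uint32_t	hashval;
	const char	*name;
};

/* Hypothetical cursor: only the hashval/offset/initted idea. */
struct toy_cursor {
	uint32_t	hashval;	/* hash value of next entry to return */
	uint32_t	offset;		/* offset in the run of equal hashvals */
	uint8_t		initted;	/* T/F: cursor has been initialized */
};

/*
 * Copy up to max names into out[], resuming from *cur. ents[] must be
 * sorted by hashval. Returns the number of names copied; 0 means done.
 */
static size_t
toy_list_next(const struct toy_entry *ents, size_t nr,
	      struct toy_cursor *cur, const char **out, size_t max)
{
	size_t		i = 0, n = 0, j;
	uint32_t	skip = 0, h;

	if (cur->initted) {
		/* Seek to the first entry at or after the cursor hash... */
		while (i < nr && ents[i].hashval < cur->hashval)
			i++;
		/* ...then step past equal-hash entries already returned. */
		while (i < nr && ents[i].hashval == cur->hashval &&
		       skip < cur->offset) {
			i++;
			skip++;
		}
	}
	cur->initted = 1;
	if (nr == 0)
		return 0;

	while (i < nr && n < max)
		out[n++] = ents[i++].name;

	/* Record where to restart: hash of the next entry, or of the last. */
	h = ents[i < nr ? i : nr - 1].hashval;
	cur->hashval = h;
	cur->offset = 0;
	for (j = i; j > 0 && ents[j - 1].hashval == h; j--)
		cur->offset++;
	return n;
}

/* Remove ents[idx], keeping the array sorted. */
static void
toy_remove(struct toy_entry *ents, size_t *nr, size_t idx)
{
	assert(idx < *nr);
	memmove(&ents[idx], &ents[idx + 1],
		(*nr - idx - 1) * sizeof(ents[0]));
	(*nr)--;
}
```

With entries hashed 10, 20, 20, 30, 40 and two names returned per
call, the first call leaves the toy cursor at hashval 20, offset 1.
If the entry it points at is then removed, the next call simply lands
on the hash 30 entry. The toy makes no attempt to handle removal of
an already-returned entry from a run of equal hashes, which would
shift the offset.
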
Hence removal of the entry the cursor points to isn't a problem for
us, we just move to the next highest sequential hash index in the
btree and start again from there.

Of course, if this is how we define listxattr2() behaviour (or maybe
we should call it "list_keys()" to make it clear we are treating
this as a key/value store instead of xattrs) then each filesystem
can put what it needs in that cursor to guarantee it can restart key
iteration correctly if the entry the cursor points to has been
removed. We can also make the cursor larger if necessary for other
filesystems to store the information they need.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx