> On May 2, 2023, at 8:12 PM, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Mon, 17 Apr 2023 15:23:10 -0400 Chuck Lever <cel@xxxxxxxxxx> wrote: > >> From: Chuck Lever <chuck.lever@xxxxxxxxxx> >> >> The current cursor-based directory cookie mechanism doesn't work >> when a tmpfs filesystem is exported via NFS. This is because NFS >> clients do not open directories: each READDIR operation has to open >> the directory on the server, read it, then close it. The cursor >> state for that directory, being associated strictly with the opened >> struct file, is then discarded. >> >> Directory cookies are cached not only by NFS clients, but also by >> user space libraries on those clients. Essentially there is no way >> to invalidate those caches when directory offsets have changed on >> an NFS server after the offset-to-dentry mapping changes. >> >> The solution we've come up with is to make the directory cookie for >> each file in a tmpfs filesystem stable for the life of the directory >> entry it represents. >> >> Add a per-directory xarray. shmem_readdir() uses this to map each >> directory offset (an loff_t integer) to the memory address of a >> struct dentry. >> > > How have people survived for this long with this problem? It's less of a problem without NFS in the picture; local applications can hold the directory open, and that preserves the seek cursor. But you can still trigger it. Also, a plurality of applications are well-behaved in this regard. It's just the more complex and more useful ones (like git) that seem to trigger issues. It became less bearable for NFS because of a recent change on the Linux NFS client to optimize directory read behavior: 85aa8ddc3818 ("NFS: Trigger the "ls -l" readdir heuristic sooner") Trond argued that tmpfs directory cookie behavior has always been problematic (eg broken) therefore this commit does not count as a regression. However, it does make tmpfs exports less usable, breaking some tests that have always worked. > It's a lot of new code - I don't feel that this is a lot of new code: include/linux/shmem_fs.h | 2 mm/shmem.c | 213 +++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 201 insertions(+), 14 deletions(-) But I agree it might look a little daunting on first review. I am happy to try to break this single patch up or consider other approaches. We could, for instance, tuck a little more of this into lib/fs. Copying the readdir and directory seeking implementation from simplefs to tmpfs is one reason the insertion count is worrisome. > can we get away with simply disallowing > exports of tmpfs? I think the bottom line is that you /can/ trigger this behavior without NFS, just not as quickly. The threshold is high enough that most use cases aren't bothered by this right now. We'd rather not disallow exporting tmpfs. It's a very good testing platform for us, and disallowing it would be a noticeable regression for some folks. > How can we maintain this? Is it possible to come up with a test > harness for inclusion in kernel selftests? There is very little directory cookie testing that I know of in the obvious place: fstests. That would be where this stuff should be unit tested, IMO. -- Chuck Lever