Due to the constraint that the NFS readdir page cache must contain every entry in cookie order from zero up to the entry of interest, the time or number of operations required to complete a directory listing grows superlinearly with the size of the directory if the client is unable to keep the pagecache stable. Each invalidation forces the client to refill the cache from cookie zero before it can make further progress. The pagecache can be invalidated by a changing directory, or by memory pressure on the client. This can cause trouble for the NFS client reading large directories over slow connections. We have a heuristic that allows eventual completion, but it only works as long as there are no other readers simultaneously filling the pagecache.

I think we can resolve this problem by implementing per-page validation. By storing the directory's change attribute on the page, and checking for changes to the directory on every READDIR, we can validate pages against each reader's version of entry alignment. Rather than attempting to assemble the entire directory in a consistent manner in the pagecache, we can just retrieve the section we're interested in emitting.

This set is a first pass at implementing this idea. Please help me pound it into acceptable shape or point out problems! Thanks for any feedback.
Here's a small program that does a great job of demonstrating the client's current readdir pagecache performance problem by dropping the directory's pagecache at an interval while trying to emit every entry:

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/syscall.h>

#define BUF_SIZE 1024
#define EVICT_INTERVAL 5	/* seconds between pagecache evictions */

/* Ask the kernel to drop the cached pages for this (directory) file. */
int evict_pagecache(int fd)
{
	return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}

int main(int argc, char **argv)
{
	int dir_fd;
	pid_t pid;
	cpu_set_t cpuset;
	off_t off;
	char buf[BUF_SIZE];

	if (argc < 2) {
		printf("%s <dir>\n", argv[0]);
		return 1;
	}

	dir_fd = open(argv[1], O_RDONLY|O_DIRECTORY|O_CLOEXEC);
	if (dir_fd < 0) {
		printf("cannot open dir\n");
		return 1;
	}

	CPU_ZERO(&cpuset);
	pid = fork();
	if (pid < 0) {
		perror("fork");
		return 1;
	}

	if (pid == 0) {
		/* Child, pinned to CPU 1: periodically evict the directory's pagecache. */
		CPU_SET(1, &cpuset);
		sched_setaffinity(0, sizeof(cpuset), &cpuset);
		do {
			evict_pagecache(dir_fd);
			/* The open file description is shared with the parent,
			 * so this reports the parent's current readdir position. */
			off = lseek(dir_fd, 0, SEEK_CUR);
			printf("currently at %llu\n", (unsigned long long)off);
			sleep(EVICT_INTERVAL);
		} while (1);
	} else {
		/* Parent, pinned to CPU 0: read the whole directory.
		 * getdents64 exists on every architecture; legacy getdents does not. */
		CPU_SET(0, &cpuset);
		sched_setaffinity(0, sizeof(cpuset), &cpuset);
		while (syscall(SYS_getdents64, dir_fd, buf, BUF_SIZE) > 0)
			;
		kill(pid, SIGINT);
	}

	close(dir_fd);
	return 0;
}

Benjamin Coddington (10):
  NFS: save the directory's change attribute on pagecache pages
  NFSv4: Send GETATTR with READDIR
  NFS: Add a struct to track readdir pagecache location
  NFS: Keep the readdir pagecache cursor updated
  NFS: readdir per-page cache validation
  NFS: stash the readdir pagecache cursor on the open directory context
  NFS: Support headless readdir pagecache pages
  NFS: Reset pagecache cursor on llseek
  NFS: Remove nfs_readdir_dont_search_cache()
  NFS: Revalidate the directory pagecache on every nfs_readdir()

 fs/nfs/dir.c              | 210 +++++++++++++++++++++++++++-----------
 fs/nfs/nfs42proc.c        |   2 +-
 fs/nfs/nfs4proc.c         |  27 +++--
 fs/nfs/nfs4xdr.c          |   6 ++
 include/linux/nfs_fs.h    |   8 +-
 include/linux/nfs_fs_sb.h |   5 +
 include/linux/nfs_xdr.h   |   2 +
 7 files changed, 188 insertions(+), 72 deletions(-)

-- 
2.25.4