On 31 Jan 2023, at 17:02, Benjamin Coddington wrote:

> On 31 Jan 2023, at 16:15, Chuck Lever III wrote:
>
>> Hi-
>>
>> I upgraded my test client's kernel to v6.2-rc5 and now I get
>> failures during the git regression suite on all NFS versions.
>> I bisected to:
>>
>> 85aa8ddc3818 ("NFS: Trigger the "ls -l" readdir heuristic sooner")
>>
>> The failure looks like:
>>
>> not ok 6 - git am --skip succeeds despite D/F conflict
>> #
>> #       test_when_finished "git -C df_plus_edit_edit clean -f" &&
>> #       test_when_finished "git -C df_plus_edit_edit reset --hard" &&
>> #       (
>> #               cd df_plus_edit_edit &&
>> #
>> #               git checkout f-edit^0 &&
>> #               git format-patch -1 d-edit &&
>> #               test_must_fail git am -3 0001*.patch &&
>> #
>> #               git am --skip &&
>> #               test_path_is_missing .git/rebase-apply &&
>> #               git ls-files -u >conflicts &&
>> #               test_must_be_empty conflicts
>> #       )
>> #
>> # failed 1 among 6 test(s)
>> 1..6
>> make[2]: *** [Makefile:60: t1015-read-index-unmerged.sh] Error 1
>> make[2]: *** Waiting for unfinished jobs....
>>
>> The regression suite is run like this:
>>
>> RESULTS= some random directory under /tmp
>> RELEASE="git-2.37.1"
>>
>> rm -f ${RELEASE}.tar.gz
>> curl --no-progress-meter -O https://mirrors.edge.kernel.org/pub/software/scm/git/${RELEASE}.tar.gz
>> /usr/bin/time tar zxf ${RELEASE}.tar.gz >> ${RESULTS}/git 2>&1
>>
>> cd ${RELEASE}
>> make clean >> ${RESULTS}/git 2>&1
>> /usr/bin/time make -j${THREADS} all doc >> ${RESULTS}/git 2>&1
>>
>> /usr/bin/time make -j${THREADS} test >> ${RESULTS}/git 2>&1
>>
>> On this client, THREADS=12. A single-thread run doesn't seem to
>> trigger a problem. So unfortunately the specific data I have is
>> going to be noisy.
>
> I'll attempt to reproduce this and see what's up.  This is an export of
> tmpfs?  If so, I suspect you might be running into tmpfs' unstable cookie
> problem when two processes race through nfs_do_filldir().. and if so, the
> cached listing of the directory on the client won't match a listing on the
> server.

It doesn't reproduce on ext4, but I can see it on an export of tmpfs.

Unsurprisingly, the pattern is getdents() returning 19 entries (17 for the
first emit, plus "." and ".."), then those entries being unlinked, and the
next getdents() returning 0.

Here's a reproducer which fails on tmpfs but works properly on exports of
ext4 and xfs:

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <string.h>

#define NFSDIR "/mnt/tmpfs/dirtest"
#define BUF_SIZE 4096
#define COUNT 18

int main(int argc, char **argv)
{
	int i, dir_fd, bpos, total = 0;
	ssize_t nread;	/* signed, so the error return (-1) is detectable */

	struct linux_dirent {
		long           d_ino;
		off_t          d_off;
		unsigned short d_reclen;
		char           d_name[];
	};
	struct linux_dirent *d;
	char buf[BUF_SIZE];

	/* create files */
	for (i = 0; i < COUNT; i++) {
		sprintf(buf, NFSDIR "/file_%03d", i);
		close(open(buf, O_CREAT, 0666));	/* octal mode */
		total++;
	}
	printf("created %d total dirents\n", total);

	dir_fd = open(NFSDIR, O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC);
	if (dir_fd < 0) {
		perror("cannot open dir");
		return 1;
	}

	/* drop the first page */
	posix_fadvise(dir_fd, 0, 4096, POSIX_FADV_DONTNEED);

	total = 0;
	while (1) {
		nread = syscall(SYS_getdents, dir_fd, buf, BUF_SIZE);
		if (nread <= 0)	/* end of directory or error */
			break;

		/* unlink every entry in this batch except "." and ".." */
		for (bpos = 0; bpos < nread;) {
			d = (struct linux_dirent *) (buf + bpos);
			if (d->d_name[0] != '.') {
				printf("%s\n", d->d_name);
				unlinkat(dir_fd, d->d_name, 0);
				total++;
			}
			bpos += d->d_reclen;
		}
	}
	printf("found and deleted %d dirents\n", total);
	close(dir_fd);

	printf("rmdir returns %d\n", rmdir(NFSDIR));
	return 0;
}

The client is doing uncached_readdir looking for cookie 19, but tmpfs has
re-ordered the last file into cookie 3 on the second READDIR.

I think this is another case of the unstable readdir cookie problems
discussed during the last round of directory cache improvements, but since
we're now returning after 17 entries, the problem is exposed on a directory
containing 18 files rather than 128.

Working on a fix..

Ben