Re: git regression failures with v6.2-rc NFS client

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 31 Jan 2023, at 17:02, Benjamin Coddington wrote:

> On 31 Jan 2023, at 16:15, Chuck Lever III wrote:
>
>> Hi-
>>
>> I upgraded my test client's kernel to v6.2-rc5 and now I get
>> failures during the git regression suite on all NFS versions.
>> I bisected to:
>>
>> 85aa8ddc3818 ("NFS: Trigger the "ls -l" readdir heuristic sooner")
>>
>> The failure looks like:
>>
>> not ok 6 - git am --skip succeeds despite D/F conflict
>> #
>> #               test_when_finished "git -C df_plus_edit_edit clean -f" &&
>> #               test_when_finished "git -C df_plus_edit_edit reset --hard" &&
>> #               (
>> #                       cd df_plus_edit_edit &&
>> #
>> #                       git checkout f-edit^0 &&
>> #                       git format-patch -1 d-edit &&
>> #                       test_must_fail git am -3 0001*.patch &&
>> #
>> #                       git am --skip &&
>> #                       test_path_is_missing .git/rebase-apply &&
>> #                       git ls-files -u >conflicts &&
>> #                       test_must_be_empty conflicts
>> #               )
>> #
>> # failed 1 among 6 test(s)
>> 1..6
>> make[2]: *** [Makefile:60: t1015-read-index-unmerged.sh] Error 1
>> make[2]: *** Waiting for unfinished jobs....
>>
>> The regression suite is run like this:
>>
>> RESULTS= some random directory under /tmp
>> RELEASE="git-2.37.1"
>>
>> rm -f ${RELEASE}.tar.gz
>> curl --no-progress-meter -O https://mirrors.edge.kernel.org/pub/software/scm/git/${RELEASE}.tar.gz
>> /usr/bin/time tar zxf ${RELEASE}.tar.gz >> ${RESULTS}/git 2>&1
>>
>> cd ${RELEASE}
>> make clean >> ${RESULTS}/git 2>&1
>> /usr/bin/time make -j${THREADS} all doc >> ${RESULTS}/git 2>&1
>>
>> /usr/bin/time make -j${THREADS} test >> ${RESULTS}/git 2>&1
>>
>> On this client, THREADS=12. A single-thread run doesn't seem to
>> trigger a problem. So unfortunately the specific data I have is
>> going to be noisy.
>
> I'll attempt to reproduce this and see what's up.  This is an export of
> tmpfs?  If so, I suspect you might be running into tmpfs' unstable cookie
> problem when two processes race through nfs_do_filldir().. and if so, the
> cached listing of the directory on the client won't match a listing on the
> server.

It doesn't reproduce on ext4, but I can see it on an export of tmpfs.

Unsurprisingly the pattern is getdents() returning 19 entries (17 for the
first emit and "." and ".."), then unlinking those and the next getdents()
returning 0.

Here's a reproducer which fails on tmpfs but works properly on exports of
ext4 and xfs:

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <string.h>

#define NFSDIR "/mnt/tmpfs/dirtest"
#define BUF_SIZE 4096
#define COUNT 18

int main(int argc, char **argv)
{
    int i, dir_fd, bpos, total = 0;
    size_t nread;
    struct linux_dirent {
        long           d_ino;
        off_t          d_off;
        unsigned short d_reclen;
        char           d_name[];
    };
    struct linux_dirent *d;
    char buf[BUF_SIZE];

    /* create files */
    for (i = 0; i < COUNT; i++) {
        sprintf(buf, NFSDIR "/file_%03d", i);
        close(open(buf, O_CREAT, 666));
	total++;
    }
    printf("created %d total dirents\n", total);

    dir_fd = open(NFSDIR, O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC);
    if (dir_fd < 0) {
        perror("cannot open dir");
        return 1;
    }

    /* drop the first page */
    posix_fadvise(dir_fd, 0, 4096, POSIX_FADV_DONTNEED);
    total = 0;

    while (1) {
        nread = syscall(SYS_getdents, dir_fd, buf, BUF_SIZE);
        if (nread == 0 || nread == -1)
            break;
        for (bpos = 0; bpos < nread;) {
            d = (struct linux_dirent *) (buf + bpos);

	    if (d->d_name[0] != '.') {
		    printf("%s\n", d->d_name);
		    unlinkat(dir_fd, d->d_name, 0);
		    total++;
	    }
            bpos += d->d_reclen;
        }
    }
    printf("found and deleted %d dirents\n", total);
    close(dir_fd);

    printf("rmdir returns %d\n", rmdir(NFSDIR));
    return 0;
}

The client is doing uncached_readdir looking for cookie 19, but tmpfs has
re-ordered the last file into cookie 3 on the second READDIR.

I think this is a different case of the problems discussed about unstable
readdir cookies on the last round of directory cache improvements, but since
we're now returning after 17 entries the problem is exposed on a directory
containing 18 files, rather than 128.

Working on a fix..

Ben





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux