On 9 Mar 2022, at 15:01, Benjamin Coddington wrote:
On 27 Feb 2022, at 18:12, trondmy@xxxxxxxxxx wrote:
From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
Instead of using a linear index to address the pages, use the cookie of the first entry, since that is what we use to match the page anyway.

This allows us to avoid re-reading the entire cache on a seekdir() type of operation. The latter is very common when re-exporting NFS, and is a major performance drain.
The change does affect our duplicate cookie detection, since we can no longer rely on the page index as a linear offset for detecting whether we looped backwards. However, since we no longer do a linear search through all the pages on each call to nfs_readdir(), this is less of a concern than it was previously.
The other downside is that invalidate_mapping_pages() no longer can use the page index to avoid clearing pages that have been read. A subsequent patch will restore the functionality this provides to the 'ls -l' heuristic.
I didn't realize the approach was to also hash out the linearly-cached entries. I thought we'd do something like flag the context for hashed page indexes after a seekdir event, and if there are collisions with the linear entries, they'll get fixed up when found.
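For reference, here's roughly how I'm reading the cookie-based indexing, as a standalone sketch -- the helper name and the 18-bit hash width are my guesses (working back from the 262144-bucket figure below), not the actual code in the patch:

#include <stdint.h>
#include <stdio.h>

#define COOKIE_HASH_BITS 18	/* guess: 2^18 = 262144 possible page indexes */

/* Map the 64-bit cookie of a page's first entry to a page-cache index
 * by hashing it, instead of handing out linearly increasing indexes.
 * The multiplier is the golden-ratio constant used by the kernel's
 * hash_64(). */
static uint64_t cookie_to_page_index(uint64_t cookie)
{
	return (cookie * 0x61C8864680B583EBULL) >> (64 - COOKIE_HASH_BITS);
}

int main(void)
{
	int i;
	uint64_t cookies[] = { 0, 1, 2, 127, 128 };

	for (i = 0; i < 5; i++)
		printf("cookie %llu -> page index %llu\n",
		       (unsigned long long)cookies[i],
		       (unsigned long long)cookie_to_page_index(cookies[i]));
	return 0;
}

Note how neighbouring cookies land on completely unrelated indexes, which is what drives the page-reuse worry below.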
Doesn't that mean that with this approach seekdir() only hits the same pages when the entry offset is page-aligned? That's 1 in 127 odds.
It also means we're amplifying the pagecache's usage for slightly changing directories - rather than re-using the same pages, we're scattering our usage across the index. Eh, maybe not a big deal if we just expect the page cache's LRU to do the work.
I don't have a better idea, though.. have you tested this performance?
..
maybe.. the hash divided the u64 cookie space into 262144 buckets, each being a page the cookie could fall into. So cookies 1 - 70368744177663 map into page 1.. bah. That won't work.
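(Checking that arithmetic: 2^64 cookies / 2^18 buckets = 2^46 = 70368744177664 cookies per bucket, which is where the 70368744177663 upper bound above comes from.)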
I was worried that I was wrong about this, but this program shows the problem by requiring a full READDIR for each entry if we walk the entries one-by-one with lseek(). I don't understand how the re-export seekdir() case is helped by this unless you're hitting the exact same offsets all the time.
I think that a hash of the page index for seekdir is no better than picking an arbitrary offset, or just using the lowest pages in the cache.
Ben
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <string.h>

#define NFSDIR "/mnt/fedora/127_dentries"

int main(int argc, char **argv)
{
	int i, dir_fd, bpos, total = 0;
	ssize_t nread;
	struct linux_dirent {
		long		d_ino;
		off_t		d_off;
		unsigned short	d_reclen;
		char		d_name[];
	};
	struct linux_dirent *d;
	/* buffer sized to hold roughly one entry, so each getdents()
	 * call forces a fresh READDIR from the current offset */
	int buf_size = sizeof(struct linux_dirent) + sizeof("file_000");
	char buf[buf_size];
	char path[sizeof(NFSDIR) + sizeof("/file_000")];

	/* create files: */
	for (i = 0; i < 127; i++) {
		snprintf(path, sizeof(path), NFSDIR "/file_%03d", i);
		close(open(path, O_CREAT, 0666));
	}

	dir_fd = open(NFSDIR, O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC);
	if (dir_fd < 0) {
		perror("cannot open dir");
		return 1;
	}

	/* no cache pls */
	posix_fadvise(dir_fd, 0, 0, POSIX_FADV_DONTNEED);

	/* walk the directory one entry at a time, seeking back to each
	 * entry's d_off before reading the next one */
	while (1) {
		nread = syscall(SYS_getdents, dir_fd, buf, buf_size);
		if (nread == 0 || nread == -1)
			break;

		for (bpos = 0; bpos < nread;) {
			d = (struct linux_dirent *) (buf + bpos);
			printf("%s offset %ld\n", d->d_name, (long)d->d_off);
			lseek(dir_fd, 0, SEEK_SET);
			lseek(dir_fd, d->d_off, SEEK_SET);
			total++;
			bpos += d->d_reclen;
		}
	}

	printf("Listing 1: %d total dirents\n", total);
	close(dir_fd);
	return 0;
}
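(The program builds on Linux with plain cc; NFSDIR is hard-coded above, so point it at a small scratch directory on an NFS mount before running.)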