Re: [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index

On 9 Mar 2022, at 15:01, Benjamin Coddington wrote:

On 27 Feb 2022, at 18:12, trondmy@xxxxxxxxxx wrote:

From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>

Instead of using a linear index to address the pages, use the cookie of
the first entry, since that is what we use to match the page anyway.

This allows us to avoid re-reading the entire cache on a seekdir()-type
operation. Such operations are very common when re-exporting NFS, and are
a major performance drain.
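
A rough userspace model of the difference, assuming sequential cookies
1, 2, 3, ... and 127 entries per page (both are illustrative assumptions;
real servers hand out opaque cookies). `linear_pages_read()` and
`cookie_to_index()` are hypothetical names, and the multiplicative hash
is only a stand-in for whatever hash the patch actually uses:

```c
#include <stddef.h>
#include <stdint.h>

#define ENTRIES_PER_PAGE 127 /* assumed entries per readdir page */
#define INDEX_BITS 18        /* assumed index width: 2^18 = 262144 pages */

/*
 * Linear index (old scheme, modelled): to find the page holding 'cookie',
 * start at page 0 and walk forward. Returns how many pages had to be
 * read, including the one that matched. Page n holds cookies
 * n * ENTRIES_PER_PAGE + 1 .. (n + 1) * ENTRIES_PER_PAGE.
 */
static size_t linear_pages_read(uint64_t cookie)
{
    size_t pages = 1;
    uint64_t first = 1; /* first cookie on the current page */

    while (cookie >= first + ENTRIES_PER_PAGE) {
        first += ENTRIES_PER_PAGE;
        pages++;
    }
    return pages;
}

/*
 * Cookie-keyed index (new scheme, modelled): the page index is derived
 * from the first cookie on the page, so a lookup is a single probe.
 * Stand-in Fibonacci multiplicative hash; cookie 0 (start of the
 * directory) is pinned to index 0.
 */
static uint64_t cookie_to_index(uint64_t cookie)
{
    if (cookie == 0)
        return 0;
    return (cookie * UINT64_C(0x9E3779B97F4A7C15)) >> (64 - INDEX_BITS);
}
```

In this model a seekdir() to cookie 12701 costs 101 page reads under the
linear scheme, versus one probe at cookie_to_index(12701) if the page
starting at that cookie is cached.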

The change does affect our duplicate cookie detection, since we can no
longer rely on the page index as a linear offset for detecting whether
we looped backwards. However, since we no longer do a linear search
through all the pages on each call to nfs_readdir(), this is less of a
concern than it was previously.

The other downside is that invalidate_mapping_pages() can no longer use
the page index to avoid clearing pages that have been read. A subsequent
patch will restore the functionality this provides to the 'ls -l'
heuristic.

I didn't realize the approach was to also hash out the linearly-cached
entries. I thought we'd do something like flag the context for hashed page
indexes after a seekdir event, and that any collisions with the linear
entries would get fixed up when found.

Doesn't that mean that with this approach seekdir() only hits the same pages
when the entry offset is page-aligned? Those are 1-in-127 odds.
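
The 1-in-127 figure falls out of a simple model: assume sequential
cookies and 127 entries per page (illustrative assumptions, as are the
helper names), so a cookie-keyed cache can only satisfy a seek whose
target cookie happens to begin a page. Counting hits over a walk of
every offset, as the lseek() program below does:

```c
#include <stdint.h>

#define ENTRIES_PER_PAGE 127 /* assumed entries per readdir page */

/*
 * Toy model (assumption): entry cookies are 1, 2, 3, ... and page n
 * begins with cookie n * ENTRIES_PER_PAGE + 1. A seek is served from
 * the cookie-keyed cache only if the target cookie begins some page.
 */
static int seek_hits_cache(uint64_t cookie)
{
    return cookie >= 1 && (cookie - 1) % ENTRIES_PER_PAGE == 0;
}

/* Seek to every cookie in 1..n_entries and count the cache hits. */
static unsigned count_hits(uint64_t n_entries)
{
    unsigned hits = 0;

    for (uint64_t c = 1; c <= n_entries; c++)
        hits += seek_hits_cache(c);
    return hits;
}
```

For a 508-entry directory (four full pages) count_hits(508) is 4: one
hit per page, and every other seek misses.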

It also means we're amplifying the page cache's usage for directories that
change slightly: rather than reusing the same pages, we're scattering our
usage across the index. Eh, maybe not a big deal if we just expect the page
cache's LRU to do the work.

I don't have a better idea, though. Have you tested the performance of this?

..

Maybe... if the hash divided the u64 cookie space into 262144 buckets, each
being a page the cookie could fall into, then cookies 1 - 70368744177663
would all map into page 1... bah. That won't work.
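
Working the arithmetic through for that hypothetical range-division
mapping (an illustration of the idea being rejected here, not the patch's
hash): 2^64 / 2^18 = 2^46 = 70368744177664 cookies per bucket, so every
cookie below 2^46 lands in the first bucket. range_bucket() is a
hypothetical helper, with buckets numbered from 0:

```c
#include <stdint.h>

#define INDEX_BITS 18 /* 2^18 = 262144 buckets, as in the mail */

/*
 * Hypothetical range-division mapping (NOT the patch's hash): split the
 * 64-bit cookie space into 2^INDEX_BITS equal, contiguous ranges.
 */
static uint64_t range_bucket(uint64_t cookie)
{
    return cookie >> (64 - INDEX_BITS);
}

/* Number of cookies per bucket: 2^64 / 2^18 = 2^46. */
static uint64_t bucket_width(void)
{
    return UINT64_C(1) << (64 - INDEX_BITS);
}
```

With small cookie values (say, sequential offsets), a whole directory
would collapse onto a single page index.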

I was worried that I was wrong about this, but this program shows the
problem: walking the entries one-by-one with lseek() requires a full READDIR
for each entry. I don't understand how the re-export seekdir() case is
helped by this unless you're hitting the exact same offsets all the time.

I think that a hash of the page index for seekdir is no better than picking
an arbitrary offset, or just using the lowest pages in the cache.

Ben

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/syscall.h>

#define NFSDIR "/mnt/fedora/127_dentries"

/* Record layout returned by the legacy getdents(2) syscall */
struct linux_dirent {
        unsigned long  d_ino;
        off_t          d_off;      /* offset (NFS cookie) of the next entry */
        unsigned short d_reclen;
        char           d_name[];
};

int main(void)
{
    int i, fd, dir_fd, total = 0;
    long nread, bpos;
    struct linux_dirent *d;
    /* separate path buffer: the full path does not fit in buf */
    char path[sizeof(NFSDIR "/file_000")];
    /* only one directory record fits, so each getdents() returns one entry */
    char buf[sizeof(struct linux_dirent) + sizeof("file_000")];

    /* create files: */
    for (i = 0; i < 127; i++) {
        snprintf(path, sizeof(path), NFSDIR "/file_%03d", i);
        fd = open(path, O_CREAT | O_WRONLY, 0666);
        if (fd < 0) {
            perror("cannot create file");
            return 1;
        }
        close(fd);
    }

    dir_fd = open(NFSDIR, O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC);
    if (dir_fd < 0) {
            perror("cannot open dir");
            return 1;
    }

    /* no cache pls */
    posix_fadvise(dir_fd, 0, 0, POSIX_FADV_DONTNEED);

    while (1) {
        nread = syscall(SYS_getdents, dir_fd, buf, sizeof(buf));
        if (nread == -1)
            perror("getdents");
        if (nread <= 0)     /* 0: end of directory, -1: error */
            break;
        for (bpos = 0; bpos < nread;) {
            d = (struct linux_dirent *) (buf + bpos);
            printf("%s offset %lld\n", d->d_name, (long long) d->d_off);

            /* rewind, then seek straight to the next entry's cookie */
            lseek(dir_fd, 0, SEEK_SET);
            lseek(dir_fd, d->d_off, SEEK_SET);
            total++;
            bpos += d->d_reclen;
        }
    }
    printf("Listing 1: %d total dirents\n", total);

    close(dir_fd);
    return 0;
}
