On 9 Mar 2022, at 15:01, Benjamin Coddington wrote:
On 27 Feb 2022, at 18:12, trondmy@xxxxxxxxxx wrote:
From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
Instead of using a linear index to address the pages, use the cookie of the first entry, since that is what we use to match the page anyway.

This allows us to avoid re-reading the entire cache on a seekdir() type of operation. The latter is very common when re-exporting NFS, and is a major performance drain.
The change does affect our duplicate cookie detection, since we can no longer rely on the page index as a linear offset for detecting whether we looped backwards. However, since we no longer do a linear search through all the pages on each call to nfs_readdir(), this is less of a concern than it was previously.
The other downside is that invalidate_mapping_pages() no longer can use the page index to avoid clearing pages that have been read. A subsequent patch will restore the functionality this provides to the 'ls -l' heuristic.
I didn't realize the approach was to also hash out the linearly-cached entries. I thought we'd do something like flag the context for hashed page indexes after a seekdir event, and if there are collisions with the linear entries, they'll get fixed up when found.
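For reference, here's roughly how I'm reading the cookie-based indexing, as a standalone sketch -- the helper name and the 18-bit hash width are my guesses (working back from the 262144-bucket figure below), not the actual code in the patch:

#include <stdint.h>
#include <stdio.h>

#define COOKIE_HASH_BITS 18	/* guess: 2^18 = 262144 possible page indexes */

/* Map the 64-bit cookie of a page's first entry to a page-cache index
 * by hashing it, instead of handing out linearly increasing indexes.
 * The multiplier is the golden-ratio constant used by the kernel's
 * hash_64(). */
static uint64_t cookie_to_page_index(uint64_t cookie)
{
	return (cookie * 0x61C8864680B583EBULL) >> (64 - COOKIE_HASH_BITS);
}

int main(void)
{
	int i;
	uint64_t cookies[] = { 0, 1, 2, 127, 128 };

	for (i = 0; i < 5; i++)
		printf("cookie %llu -> page index %llu\n",
		       (unsigned long long)cookies[i],
		       (unsigned long long)cookie_to_page_index(cookies[i]));
	return 0;
}

Note how neighbouring cookies land on completely unrelated indexes, which is what drives the page-reuse worry below.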
Doesn't that mean that with this approach seekdir() only hits the same pages when the entry offset is page-aligned? That's 1 in 127 odds.
It also means we're amplifying the pagecache's usage for slightly changing directories - rather than re-using the same pages, we're scattering our usage across the index. Eh, maybe not a big deal if we just expect the page cache's LRU to do the work.
I don't have a better idea, though.. have you tested this performance?
..
maybe.. the hash divided the u64 cookie space into 262144 buckets, each being a page the cookie could fall into. So cookies 1 - 70368744177663 map into page 1.. bah. That won't work.
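(Checking that arithmetic: 2^64 cookies / 2^18 buckets = 2^46 = 70368744177664 cookies per bucket, which is where the 70368744177663 upper bound above comes from.)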
I was worried that I was wrong about this, but this program shows the problem by requiring a full READDIR for each entry if we walk the entries one-by-one with lseek(). I don't understand how the re-export seekdir() case is helped by this unless you're hitting the exact same offsets all the time.
I think that a hash of the page index for seekdir is no better than picking an arbitrary offset, or just using the lowest pages in the cache.
Ben
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <string.h>

#define NFSDIR "/mnt/fedora/127_dentries"

int main(int argc, char **argv)
{
	int i, dir_fd, bpos, total = 0;
	ssize_t nread;
	struct linux_dirent {
		long		d_ino;
		off_t		d_off;
		unsigned short	d_reclen;
		char		d_name[];
	};
	struct linux_dirent *d;
	/* buffer sized to hold roughly one entry, so each getdents()
	 * call forces a fresh READDIR from the current offset */
	int buf_size = sizeof(struct linux_dirent) + sizeof("file_000");
	char buf[buf_size];
	char path[sizeof(NFSDIR) + sizeof("/file_000")];

	/* create files: */
	for (i = 0; i < 127; i++) {
		snprintf(path, sizeof(path), NFSDIR "/file_%03d", i);
		close(open(path, O_CREAT, 0666));
	}

	dir_fd = open(NFSDIR, O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC);
	if (dir_fd < 0) {
		perror("cannot open dir");
		return 1;
	}

	/* no cache pls */
	posix_fadvise(dir_fd, 0, 0, POSIX_FADV_DONTNEED);

	/* walk the directory one entry at a time, seeking back to each
	 * entry's d_off before reading the next one */
	while (1) {
		nread = syscall(SYS_getdents, dir_fd, buf, buf_size);
		if (nread == 0 || nread == -1)
			break;

		for (bpos = 0; bpos < nread;) {
			d = (struct linux_dirent *) (buf + bpos);
			printf("%s offset %ld\n", d->d_name, (long)d->d_off);
			lseek(dir_fd, 0, SEEK_SET);
			lseek(dir_fd, d->d_off, SEEK_SET);
			total++;
			bpos += d->d_reclen;
		}
	}

	printf("Listing 1: %d total dirents\n", total);
	close(dir_fd);
	return 0;
}
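(The program builds on Linux with plain cc; NFSDIR is hard-coded above, so point it at a small scratch directory on an NFS mount before running.)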