Re: [PATCH v7 05/21] NFS: Store the change attribute in the directory page cache

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2022-02-25 at 15:41 -0500, Trond Myklebust wrote:
> On Fri, 2022-02-25 at 15:23 -0500, Benjamin Coddington wrote:
> > On 25 Feb 2022, at 10:34, Benjamin Coddington wrote:
> > > Ok, so I'm reading that further proof is required, and I'm happy
> > > to
> > > do 
> > > the
> > > work.  Thanks for the replies here and elsewhere.
> > 
> > Here's an example of this problem on a tmpfs export using v8 of
> > your
> > patchset with the fix to set the change_attr in
> > nfs_readdir_page_init_array().
> > 
> > I'm using tmpfs, because it reliably orders cookies in reverse
> > order
> > of
> > creation (or perhaps sorted by name).
> > 
> > The program drives both the client-side and server-side - so on
> > this
> > one
> > system, /exports/tmpfs is:
> > tmpfs /exports/tmpfs tmpfs rw,seclabel,relatime,size=102400k 0 0
> > 
> > and /mnt/localhost is:
> > localhost:/exports/tmpfs /mnt/localhost/tmpfs nfs4 
> > rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,pr
> > ot
> > o=tcp,timeo=600,retrans=2,sec=sys,clientaddr=127.0.0.1,local_lock=n
> > on
> > e,addr=127.0.0.1 
> > 0 0
> > 
> > The program creates 256 files on the server, walks through them
> > once
> > on 
> > the
> > client, deletes the last 127 on the server, drops the first page
> > from
> > the
> > pagecache, and walks through them again on the client.
> > 
> > The second listing produces 124 duplicate entries.
> > 
> > I just have to say again: this behavior is _new_ (but not new to
> > me),
> > and it
> > is absolutely going to crop up on our customer's systems that are 
> > walking
> > through millions of directory entries on loaded servers under
> > memory
> > pressure.  The directory listings as a whole become very likely to
> > be
> > nonsense at random times.  I realize they are not /supposed/ to be 
> > coherent,
> > but what we're getting here is going to be far far less coherent,
> > and
> > its
> > going to be a mess.
> > 
> > There are other scenarios that are worse when the cookies aren't 
> > ordered,
> > you can end up with EOF, or get into repeating patterns.
> > 
> > Please compare this with v3, and before this patchset, and tell me
> > if
> > I'm
> > not justified playing chicken little.
> > 
> > Here's what I do to run this:
> > 
> > mount -t tmpfs -osize=100M tmpfs /exports/tmpfs/
> > exportfs -ofsid=0 *:/exports
> > exportfs -ofsid=1 *:/exports/tmpfs
> > mount -t nfs -ov4.1,sec=sys localhost:/exports /mnt/localhost
> > ./getdents2
> > 
> > Compare "Listing 1" with "Listing 2".
> > 
> > I would also do a "rm -f /export/tmpfs/*" between each run.
> > 
> > Thanks again for your time and work.
> > 
> > Ben
> > 
> > #define _GNU_SOURCE
> > #include <stdio.h>
> > #include <unistd.h>
> > #include <fcntl.h>
> > #include <sched.h>
> > #include <sys/types.h>
> > #include <sys/stat.h>
> > #include <sys/syscall.h>
> > #include <string.h>
> > 
> > #define NFSDIR "/mnt/localhost/tmpfs"
> > #define LOCDIR "/exports/tmpfs"
> > #define BUF_SIZE 4096
> > 
> > int main(int argc, char **argv)
> > {
> >         int i, dir_fd, bpos, total = 0;
> >      size_t nread;
> >         struct linux_dirent {
> >                 long           d_ino;
> >                 off_t          d_off;
> >                 unsigned short d_reclen;
> >                 char           d_name[];
> >         };
> >      struct linux_dirent *d;
> >         char buf[BUF_SIZE];
> > 
> >      /* create files: */
> >      for (i = 0; i < 256; i++) {
> >          sprintf(buf, LOCDIR "/file_%03d", i);
> >          close(open(buf, O_CREAT, 666));
> >      }
> > 
> >         dir_fd = open(NFSDIR,
> > O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC);
> >         if (dir_fd < 0) {
> >                 perror("cannot open dir");
> >                 return 1;
> >         }
> > 
> >         while (1) {
> >                 nread = syscall(SYS_getdents, dir_fd, buf,
> > BUF_SIZE);
> >                 if (nread == 0 || nread == -1)
> >                         break;
> >                 for (bpos = 0; bpos < nread;) {
> >              d = (struct linux_dirent *) (buf + bpos);
> >              printf("%s\n", d->d_name);
> >              total++;
> >              bpos += d->d_reclen;
> >          }
> >      }
> >      printf("Listing 1: %d total dirents\n", total);
> > 
> >      /* rewind */
> >      lseek(dir_fd, 0, SEEK_SET);
> > 
> >      /* drop the first page */
> >      posix_fadvise(dir_fd, 0, 4096, POSIX_FADV_DONTNEED);
> > 
> >      /* delete the last 127 files: */
> >      for (i = 127; i < 256; i++) {
> >          sprintf(buf, LOCDIR "/file_%03d", i);
> >          unlink(buf);
> >      }
> > 
> >      total = 0;
> >         while (1) {
> >                 nread = syscall(SYS_getdents, dir_fd, buf,
> > BUF_SIZE);
> >                 if (nread == 0 || nread == -1)
> >                         break;
> >                 for (bpos = 0; bpos < nread;) {
> >              d = (struct linux_dirent *) (buf + bpos);
> >              printf("%s\n", d->d_name);
> >              total++;
> >              bpos += d->d_reclen;
> >          }
> >      }
> >      printf("Listing 2: %d total dirents\n", total);
> > 
> >         close(dir_fd);
> >         return 0;
> > }
> 
> 
> tmpfs is broken on the server. It doesn't provide stable cookies, and
> knfsd doesn't use the verifier to tell you that the cookie assignment
> has changed.
> 
> 
> Re-export of tmpfs has never worked reliably.

What I mean is that tmpfs is always a poor choice for NFS because
seekdir()/telldir() don't work reliably, and so READDIR cannot work
reliably, since it relies on open()+seekdir() to continue reading the
directory in successive RPC calls.

Anyhow, to get back to your question about whether we should or should
not be detecting that the directory changed when you delete the files
on the server. The answer is no... Nothing in the above guarantees that
the cache is revalidated.

NFS close to open cache consistency means that we guarantee to
revalidate the cached data on open(), and only then. That guarantee
does not extend to lseek() or to the rewinddir/seekdir wrappers.

If your application wants stronger cache consistency, then there are
tricks to enable that. Now that statx() has the AT_STATX_FORCE_SYNC
flag, you could use that to force a revalidation of the directory
attributes on the client. You might also use the posix_fadvise() trick
to try to clear the cache. However note that none of these tricks are
guaranteed to work. They're not reliable now, and that situation is
unlikely to change in the future barring a deliberate (and documented!)
change in kernel policy.
So as of now, the only way to reliably introduce a revalidation point
in your testcase above is to close() and then open().

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux