NFS re-export, READDIR, and large cookies

I've been investigating an issue where an NFSv3 client would receive
NFS3ERR_INVAL in response to a READDIR or READDIRPLUS request when using
a cookie.

The setup uses an intermediate NFS re-export server:
* Source NFS Server: VAST Data
* Proxy NFS Server : Ubuntu 20.04 LTS, Kernel 5.13.18

Several clients were used for testing, including an older 3.10 kernel.
There was no difference between them when mounting the re-export proxy
NFS server. There are differences in behaviour when mounting the source
server directly based upon whether the client's kernel implements
nfs_readdir_use_cookie.

For the test a directory was created on the source NFS server containing
200 files.

While the investigation initially looked at the READDIR issue with a
re-export server it was discovered that the underlying issue can also
cause odd behaviour when the clients mount the source NFS server
directly without the re-export proxy in the middle. The issue can affect
user applications that use telldir, seekdir, lseek, or similar
functions.

When the client running 3.10 accessed the NFS share via the proxy NFS
server the following exchange was observed when the ls command was
executed:

1) Client -> Proxy    : READDIRPLUS (cookie: 0)
2)   Proxy  -> Source : READDIRPLUS (cookie: 0)
3)   Source -> Proxy  : Reply, first 100 files, EOF 0
4)   Proxy  -> Source : READDIRPLUS (cookie: 2551291679986417766)
5)   Source -> Proxy  : Reply, next 100 files, EOF 1
6) Proxy  -> Client   : Reply, all 200 files, EOF 1
7) Client -> Proxy    : READDIRPLUS (cookie: 11500424819426459749)
8) Proxy  -> Client   : NFS3ERR_INVAL

I'm not certain why the client issued a second READDIRPLUS with a cookie,
since the reply to the first request contained the full directory listing,
as indicated by the EOF field.

The cookie in the second request is a valid cookie that was issued by
the source NFS server. The cookie is for a file about half way through
the directory listing. While the cookie is valid for the NFS 3 protocol,
it should be noted that the cookie's value is greater than 2^63-1. When
interpreted as a signed 64 bit integer the cookie would have the value
of -6946319254283091867.
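
To make the sign flip concrete, here is a minimal C sketch of the
reinterpretation. The helper name cookie_as_offset is hypothetical, and
int64_t is a user-space stand-in for loff_t. Converting an out-of-range
unsigned value to a signed type is implementation-defined in C before C23;
the sketch assumes the two's-complement wraparound that GCC and Clang
provide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: view an NFSv3 cookie (unsigned 64 bit) as a
 * signed 64 bit offset, as a plain assignment to loff_t would. */
static int64_t cookie_as_offset(uint64_t cookie)
{
    int64_t offset = (int64_t)cookie;

    /* Sanity check: the offset is negative exactly when the cookie
     * is greater than 2^63-1. */
    assert((offset < 0) == (cookie > (uint64_t)INT64_MAX));
    return offset;
}
```

With the cookie from request 7, cookie_as_offset(11500424819426459749ULL)
yields -6946319254283091867, while small cookies such as 1 and 2 are
unchanged.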

Sample of directory entries captured by tcpdump (only includes the name
and cookie fields for brevity):

    Entry: name .      Cookie: 1
    Entry: name ..     Cookie: 2
    Entry: name 1      Cookie: 848716379849752578
    Entry: name 10     Cookie: 15827834395709931523
    Entry: name 100    Cookie: 16032066584625283076
    Entry: name 101    Cookie: 3137853460930625541
    Entry: name 102    Cookie: 7540226876707438598
    Entry: name 103    Cookie: 4424272775414284295
    Entry: name 104    Cookie: 15750249638323552264
    Entry: name 105    Cookie: 15370663860381941769
    ...

Tracing how nfsd handles the NFS cookie up to the point where the error
is generated, I found the following:

* The cookie is converted to loff_t. This converts from unsigned to
  signed.

  nfsd/nfs3proc.c - nfsd3_proc_readdirplus
      loff_t offset;
      offset = argp->cookie;

* This offset is then passed to nfsd_readdir, where it is used with
  vfs_llseek:

  nfsd/vfs.c - nfsd_readdir
      offset = vfs_llseek(file, offset, SEEK_SET);

* Since the proxy server is re-exporting an NFS volume my assumption is
  that the underlying VFS driver is NFS and the file handle is a
  directory, thus vfs_llseek invokes nfs_llseek_dir.

  nfs/dir.c - nfs_llseek_dir
      switch (whence) {
      case SEEK_SET:
          if (offset < 0)
              return -EINVAL;

* Since offset is < 0, -EINVAL is returned resulting in NFS3ERR_INVAL.
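
The chain traced above can be modelled in user space. This is only a
sketch of the control flow as described in this message, not the kernel
code: the function names model_llseek_dir_set and model_readdir_seek are
hypothetical, int64_t stands in for loff_t, and everything except the
quoted SEEK_SET check is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Models only the SEEK_SET check quoted from nfs_llseek_dir. */
static int64_t model_llseek_dir_set(int64_t offset)
{
    if (offset < 0)
        return -EINVAL;
    return offset;
}

/* Models the path from the READDIRPLUS cookie to the seek result.
 * Returns 0 on success or a negative errno value. */
static int model_readdir_seek(uint64_t cookie)
{
    int64_t offset = (int64_t)cookie;   /* unsigned -> signed */

    offset = model_llseek_dir_set(offset);
    if (offset < 0)
        return (int)offset;             /* -EINVAL for large cookies */

    /* A successful seek leaves the position at the cookie value. */
    assert((uint64_t)offset == cookie);
    return 0;
}
```

In this model the cookie from request 4 (2551291679986417766) seeks
successfully, while the cookie from request 7 (11500424819426459749)
returns -EINVAL, matching the observed exchange.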

Reading further into the nfs/dir.c source, it seems the cookie value is
used extensively as the dir's offset position, often being stored in
ctx->pos.

The issue here is that the dir_context pos field is exposed to user
applications. As a test, the proxy NFS server was removed and the clients
accessed the source NFS server directly. In this configuration READDIRPLUS
worked as expected, but issues with telldir and seekdir were observed.

When printing a directory listing using opendir/readdir negative d_off
values were displayed in the output (left file name, right d_off):

      . - 1
     .. - 2
      1 - 848716379849752578
     10 - -2618909677999620093
    100 - -2414677489084268540
    101 - 3137853460930625541
    102 - 7540226876707438598
    103 - 4424272775414284295
    104 - -2696494435385999352
    105 - -3076080213327609847
    ...
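
A check along these lines can be sketched in C. This is not the program
used for the listing above; count_negative_d_off is a hypothetical helper,
and it relies on the d_off member of struct dirent, which Linux and glibc
provide but POSIX does not require.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>

/* Hypothetical helper: count the entries in a directory whose d_off
 * is negative. Returns -1 if the directory cannot be opened. */
static long count_negative_d_off(const char *path)
{
    DIR *dir;
    struct dirent *entry;
    long negative = 0;

    assert(path != NULL);

    dir = opendir(path);
    if (dir == NULL)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        /* d_off is signed, so cookies above 2^63-1 show up negative. */
        if (entry->d_off < 0)
            negative++;
    }

    closedir(dir);
    return negative;
}
```

A non-zero result on a mount would indicate that the server is handing
out cookies with the top bit set.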

The directory listing was printed a second time, with an added call to
seekdir after opendir. If a non-negative d_off value was chosen the
directory listing would correctly start from that entry. If a negative
d_off value was chosen the directory listing would start from the first
entry.

As seekdir returns void and has no way to indicate an error, it's likely
that the underlying lseek call failed silently. We did not include a test
at the time to clear and check errno, but it's likely this would have
indicated EINVAL.

A similar issue was noted with lseek returning negative positions for
directories on ext4: https://bugzilla.kernel.org/show_bug.cgi?id=200043
It was noted here that the correct behaviour is not well defined.
It seems it's not prohibited to return a negative value, but many
applications tend to interpret negative values as an error. Also lseek
is now documented to return -1 on an error, which is an issue here as -1
is a perfectly valid cookie value.

On the older 3.10 kernel, this was not an issue as the 3.10 kernel uses
the array index position for the offset value instead of the NFS cookie.

--
Chris


