Re: fstests failures with NFSD attribute delegation support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/30/24 10:33 AM, Chuck Lever wrote:
On 12/29/24 8:52 PM, Trond Myklebust wrote:
On Sun, 2024-12-29 at 17:37 -0500, Chuck Lever wrote:
On 12/19/24 3:15 PM, Chuck Lever wrote:
On 12/18/24 4:02 PM, Chuck Lever wrote:
Hi -

I'm testing the NFSD support for attribute delegation, and seeing
these
two new fstests failures: generic/647 and generic/729. Both tests
emit
this error message:

    mmap-rw-fault: pread /media/test/mmap-rw-fault.tmp (O_DIRECT):
0 !=
4096: Bad address

This is 100% reproducible with the new patches applied to the
server,
and 100% not reproducible when they are not applied on the
server.

The failure is due to pread64() (on the client) returning EFAULT.
On
the wire, the passing test does:

SETATTR (size = 0)
WRITE (offset = 4096, len = 4096)
READ (offset = 0, len = 8192)
READ (offset = 4096, len = 4096)
SETATTR (size = 0)
   [ continues until test passes ]

The failing test does:

SETATTR (size = 0)
WRITE (offset = 4096, len = 4096)
   [ the failed pread64 seems to occur here ]
CLOSE

In other words, in the failing case, the client does not emit
READs
to pull in the changed file content.

The test is using O_DIRECT so I function-traced
nfs_direct_read_schedule_iovec(). In the passing case, this
function
generates the usual set of NFS READs on the wire and returns
successfully.

In the failing case, iov_iter_get_pages_alloc2() invokes
get_user_pages_fast(), and that appears to fail immediately:

     mmap-rw-fault-623256 [016] 175303.310394:
funcgraph_entry:
        get_user_pages_fast() {
     mmap-rw-fault-623256 [016] 175303.310395:
funcgraph_entry:
          gup_fast_fallback() {
     mmap-rw-fault-623256 [016] 175303.310395: funcgraph_entry:
0.262
us   |          __pte_offset_map();
     mmap-rw-fault-623256 [016] 175303.310395: funcgraph_entry:
0.142
us   |          __rcu_read_unlock();
     mmap-rw-fault-623256 [016] 175303.310396: funcgraph_entry:
7.824
us   |          __gup_longterm_locked();
     mmap-rw-fault-623256 [016] 175303.310404: funcgraph_exit:
8.967 us
         }
     mmap-rw-fault-623256 [016] 175303.310404: funcgraph_exit:
9.224 us
       }
     mmap-rw-fault-623256 [016] 175303.310404:
funcgraph_entry:
        kvfree() {

My guess is the cached inode file size is still zero.

Confirmed: in the failing case, the read fails because the cached
file size is still zero. In the passing case, the cached file size
is
8192 before the read.

During the test, the client truncates the file, then performs an
NFS
WRITE to the server, extending the size of the file. When an
attribute
delegation is in effect, that size extension isn't reflected in the
cached value of i_size -- the client ensures that INVALID_SIZE is
always clear.

But perhaps the NFS client is relying on the client's VFS to
maintain
i_size...? The NFS client has its own direct I/O implementation, so
perhaps an i_size update is missing there.

Because the client never retrieves the file's size from the server
during either the passing or failing cases, this appears to be a
client
bug.

The bug is in nfs_writeback_update_inode() -- if mtime is delegated,
it
skips the file extension check, and the file size cached on the
client
remains zero after the WRITE completes.

The culprit is commit e12912d94137 ("NFSv4: Add support for delegated
atime and mtime attributes"). If I remove the hunk that this commit
adds to nfs_writeback_update_inode(), both generic/647 and
generic/729
pass.



I'm confused... If O_DIRECT is set on open(), then the NFSv4.x (x>0)
client will set NFS4_SHARE_WANT_NO_DELEG. Furthermore, it should not
set either NFS4_SHARE_WANT_DELEG_TIMESTAMPS or
NFS4_SHARE_WANT_OPEN_XOR_DELEGATION.

Examining wire captures...


In the passing test (done with NFSv4.1 to the same server), there is
indeed an OPEN with OPEN4_SHARE_ACCESS_WANT_NO_DELEG followed by the
WRITE to offset 4096 for 4096 bytes. The client returns this OPEN state
ID immediately (via CLOSE).

Then an OPEN that returns both an OPEN state ID and a WRITE delegation.
The client uses the delegation state ID for reading, enabling the test
to pass.

The above is not correct. Upon closer examination, the delegation state
ID is used for the direct WRITE in this case, even though an OPEN state
ID is available.

But since nfs_have_delegated_mtime() returns false, nfs_writeback_update_inode() proceeds to update the cached file size.


There are three OPENs on the wire during the failing test.

The first two set OPEN4_SHARE_ACCESS_WANT_NO_DELEG. For those, the
server returns an OPEN stateid, delegation type OPEN_DELEGATE_NONE_EXT,
and WND4_NOT_WANTED is set.

The third OPEN appears to request any kind of open. The share_access
field contains the raw value 00300003. The rightmost "3" is SHARE_BOTH.
I assume the leftmost "3" means WANT_DELEG_TIMESTAMPS and OPEN_XOR;
wireshark doesn't currently recognize those bits.

NFSD returns an OPEN_DELEGATE_WRITE_ATTRS_DELEG in response to this
request, with a delegation state ID and no OPEN state ID.

The client uses this delegation state ID for subsequent write
operations. The write completions fail to extend the cached file
size due to the presence of the delegation.

Here again the client presents the delegation state ID during the WRITE.
But since the write delegation also permits delegated time stamps,
nfs_writeback_update_inode() skips the file size update.

In both cases, nfs4_select_rw_stateid() is choosing a delegation
state ID for a direct WRITE. In the this case, it's choosing the
delegation state ID because it has no OPEN state ID (due to
OPEN_XOR_DELEG being set).

nfs4_map_atomic_open_share() seems to be selecting the wrong
bits to enable for this test... ?

--
Chuck Lever




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux