Re: fstests failures with NFSD attribute delegation support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jan 13, 2025 at 9:50 AM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
> On 1/4/25 5:00 PM, Chuck Lever wrote:
> > On 12/30/24 10:33 AM, Chuck Lever wrote:
> >> On 12/29/24 8:52 PM, Trond Myklebust wrote:
> >>> On Sun, 2024-12-29 at 17:37 -0500, Chuck Lever wrote:
> >>>> On 12/19/24 3:15 PM, Chuck Lever wrote:
> >>>>> On 12/18/24 4:02 PM, Chuck Lever wrote:
> >>>>>> Hi -
> >>>>>>
> >>>>>> I'm testing the NFSD support for attribute delegation, and seeing
> >>>>>> these
> >>>>>> two new fstests failures: generic/647 and generic/729. Both tests
> >>>>>> emit
> >>>>>> this error message:
> >>>>>>
> >>>>>>     mmap-rw-fault: pread /media/test/mmap-rw-fault.tmp (O_DIRECT):
> >>>>>> 0 !=
> >>>>>> 4096: Bad address
> >>>>>>
> >>>>>> This is 100% reproducible with the new patches applied to the
> >>>>>> server,
> >>>>>> and 100% not reproducible when they are not applied on the
> >>>>>> server.
> >>>>>>
> >>>>>> The failure is due to pread64() (on the client) returning EFAULT.
> >>>>>> On
> >>>>>> the wire, the passing test does:
> >>>>>>
> >>>>>> SETATTR (size = 0)
> >>>>>> WRITE (offset = 4096, len = 4096)
> >>>>>> READ (offset = 0, len = 8192)
> >>>>>> READ (offset = 4096, len = 4096)
> >>>>>> SETATTR (size = 0)
> >>>>>>    [ continues until test passes ]
> >>>>>>
> >>>>>> The failing test does:
> >>>>>>
> >>>>>> SETATTR (size = 0)
> >>>>>> WRITE (offset = 4096, len = 4096)
> >>>>>>    [ the failed pread64 seems to occur here ]
> >>>>>> CLOSE
> >>>>>>
> >>>>>> In other words, in the failing case, the client does not emit
> >>>>>> READs
> >>>>>> to pull in the changed file content.
> >>>>>>
> >>>>>> The test is using O_DIRECT so I function-traced
> >>>>>> nfs_direct_read_schedule_iovec(). In the passing case, this
> >>>>>> function
> >>>>>> generates the usual set of NFS READs on the wire and returns
> >>>>>> successfully.
> >>>>>>
> >>>>>> In the failing case, iov_iter_get_pages_alloc2() invokes
> >>>>>> get_user_pages_fast(), and that appears to fail immediately:
> >>>>>>
> >>>>>>      mmap-rw-fault-623256 [016] 175303.310394:
> >>>>>> funcgraph_entry:
> >>>>>>>         get_user_pages_fast() {
> >>>>>>      mmap-rw-fault-623256 [016] 175303.310395:
> >>>>>> funcgraph_entry:
> >>>>>>>           gup_fast_fallback() {
> >>>>>>      mmap-rw-fault-623256 [016] 175303.310395: funcgraph_entry:
> >>>>>> 0.262
> >>>>>> us   |          __pte_offset_map();
> >>>>>>      mmap-rw-fault-623256 [016] 175303.310395: funcgraph_entry:
> >>>>>> 0.142
> >>>>>> us   |          __rcu_read_unlock();
> >>>>>>      mmap-rw-fault-623256 [016] 175303.310396: funcgraph_entry:
> >>>>>> 7.824
> >>>>>> us   |          __gup_longterm_locked();
> >>>>>>      mmap-rw-fault-623256 [016] 175303.310404: funcgraph_exit:
> >>>>>> 8.967 us
> >>>>>>>          }
> >>>>>>      mmap-rw-fault-623256 [016] 175303.310404: funcgraph_exit:
> >>>>>> 9.224 us
> >>>>>>>        }
> >>>>>>      mmap-rw-fault-623256 [016] 175303.310404:
> >>>>>> funcgraph_entry:
> >>>>>>>         kvfree() {
> >>>>>>
> >>>>>> My guess is the cached inode file size is still zero.
> >>>>>
> >>>>> Confirmed: in the failing case, the read fails because the cached
> >>>>> file size is still zero. In the passing case, the cached file size
> >>>>> is
> >>>>> 8192 before the read.
> >>>>>
> >>>>> During the test, the client truncates the file, then performs an
> >>>>> NFS
> >>>>> WRITE to the server, extending the size of the file. When an
> >>>>> attribute
> >>>>> delegation is in effect, that size extension isn't reflected in the
> >>>>> cached value of i_size -- the client ensures that INVALID_SIZE is
> >>>>> always clear.
> >>>>>
> >>>>> But perhaps the NFS client is relying on the client's VFS to
> >>>>> maintain
> >>>>> i_size...? The NFS client has its own direct I/O implementation, so
> >>>>> perhaps an i_size update is missing there.
> >>>>
> >>>> Because the client never retrieves the file's size from the server
> >>>> during either the passing or failing cases, this appears to be a
> >>>> client
> >>>> bug.
> >>>>
> >>>> The bug is in nfs_writeback_update_inode() -- if mtime is delegated,
> >>>> it
> >>>> skips the file extension check, and the file size cached on the
> >>>> client
> >>>> remains zero after the WRITE completes.
> >>>>
> >>>> The culprit is commit e12912d94137 ("NFSv4: Add support for delegated
> >>>> atime and mtime attributes"). If I remove the hunk that this commit
> >>>> adds to nfs_writeback_update_inode(), both generic/647 and
> >>>> generic/729
> >>>> pass.
> >>>>
> >>>>
> >>>
> >>> I'm confused... If O_DIRECT is set on open(), then the NFSv4.x (x>0)
> >>> client will set NFS4_SHARE_WANT_NO_DELEG. Furthermore, it should not
> >>> set either NFS4_SHARE_WANT_DELEG_TIMESTAMPS or
> >>> NFS4_SHARE_WANT_OPEN_XOR_DELEGATION.
> >>
> >> Examining wire captures...
> >>
> >>
> >> In the passing test (done with NFSv4.1 to the same server), there is
> >> indeed an OPEN with OPEN4_SHARE_ACCESS_WANT_NO_DELEG followed by the
> >> WRITE to offset 4096 for 4096 bytes. The client returns this OPEN state
> >> ID immediately (via CLOSE).
> >>
> >> Then an OPEN that returns both an OPEN state ID and a WRITE delegation.
> >> The client uses the delegation state ID for reading, enabling the test
> >> to pass.
> >
> > The above is not correct. Upon closer examination, the delegation state
> > ID is used for the direct WRITE in this case, even though an OPEN state
> > ID is available.
> >
> > But since nfs_have_delegated_mtime() returns false,
> > nfs_writeback_update_inode() proceeds to update the cached file size.
> >
> >
> >> There are three OPENs on the wire during the failing test.
> >>
> >> The first two set OPEN4_SHARE_ACCESS_WANT_NO_DELEG. For those, the
> >> server returns an OPEN stateid, delegation type OPEN_DELEGATE_NONE_EXT,
> >> and WND4_NOT_WANTED is set.
> >>
> >> The third OPEN appears to request any kind of open. The share_access
> >> field contains the raw value 00300003. The rightmost "3" is SHARE_BOTH.
> >> I assume the leftmost "3" means WANT_DELEG_TIMESTAMPS and OPEN_XOR;
> >> wireshark doesn't currently recognize those bits.
> >>
> >> NFSD returns an OPEN_DELEGATE_WRITE_ATTRS_DELEG in response to this
> >> request, with a delegation state ID and no OPEN state ID.
> >>
> >> The client uses this delegation state ID for subsequent write
> >> operations. The write completions fail to extend the cached file
> >> size due to the presence of the delegation.
> >
> > Here again the client presents the delegation state ID during the WRITE.
> > But since the write delegation also permits delegated time stamps,
> > nfs_writeback_update_inode() skips the file size update.
> >
> > In both cases, nfs4_select_rw_stateid() is choosing a delegation
> > state ID for a direct WRITE. In the this case, it's choosing the
> > delegation state ID because it has no OPEN state ID (due to
> > OPEN_XOR_DELEG being set).
> >
> > nfs4_map_atomic_open_share() seems to be selecting the wrong
> > bits to enable for this test... ?
> >
>
> The test application opens the target file without O_DIRECT, performs
> one or two chores, then closes the file. It then opens the same file
> with O_DIRECT.
>
> The problem arises because, on close(2), the client caches the open
> state acquired by the first (non-O_DIRECT) open(2). It emits neither a
> CLOSE nor a DELEGRETURN.
>
> For the subsequent O_DIRECT open(2), a fresh OPEN is not emitted. The
> client uses the cached open state for direct WRITEs -- in fact, it uses
> the delegation state ID from that cached open state.

The use of a delegation stateid doesn't seem to be a problem for when
this test is run on 4.1.

I think you had to correct before when the problem was with how the
client deals with updating the attributes (now that it got a delegated
attribute ability).

The problematic chunk in the mentioned patch is the following:

@@ -1514,6 +1516,13 @@ void nfs_writeback_update_inode(struct
nfs_pgio_header *hdr)
        struct nfs_fattr *fattr = &hdr->fattr;
        struct inode *inode = hdr->inode;

+       if (nfs_have_delegated_mtime(inode)) {
+               spin_lock(&inode->i_lock);
+               nfs_set_cache_invalid(inode, NFS_INO_INVALID_BLOCKS);
+               spin_unlock(&inode->i_lock);
+               return;
+       }
+
        spin_lock(&inode->i_lock);
        nfs_writeback_check_extend(hdr, fattr);
        nfs_post_op_update_inode_force_wcc_locked(inode, fattr);


I'm not sure if reverting this chuck is the correct solution but it
does fix the problem.


>
> Note that this happens independent of the minor version. For attribute
> delegation, the client's write completion logic sees the active
> delegation and skips updating the local file size.
>
> Seems like nfs4_opendata_find_nfs4_state() needs to recognize that an
> O_DIRECT OPEN cannot use cached open state that was created by a non
> O_DIRECT_OPEN ?? Or nfs4_set_rw_stateid() needs to recognize that an
> I/O is direct, and choose only an open state ID. But that means there
> needs to be a guarantee that an open state ID is available.
>
> Guidance would be appreciated.
>
>
> --
> Chuck Lever
>





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux