Re: [PATCH] nfs: propagate fileid changed errors back to syscall

On Tue, 2024-12-10 at 12:12 -0500, Nikhil Jha wrote:
> On Tue, Dec 10, 2024 at 07:11:43AM -0500, Benjamin Coddington wrote:
> > On 9 Dec 2024, at 12:39, Nikhil Jha wrote:
> > 
> > > Hello! This is the first kernel patch I have tried to upstream. I'm
> > > following along with the kernel newbies guide but apologies if I got
> > > anything wrong.
> > > 
> > > Currently, if there is a mismatch in the request and response fileids
> > > in an NFS request, the kernel logs an error and attempts to return
> > > ESTALE. However, this error is currently dropped before it makes it
> > > all the way to userspace. This appears to be a mistake, since as far
> > > as I can tell that ESTALE value is never consumed from anywhere.
> > > 
> > > Callstack for async NFS write, at time of error:
> > > 
> > >         nfs_update_inode <- returns -ESTALE
> > >         nfs_refresh_inode_locked
> > >         nfs_writeback_update_inode <- error is dropped here
> > >         nfs3_write_done
> > >         nfs_writeback_done
> > >         nfs_pgio_result <- other errors are collected here
> > >         rpc_exit_task
> > >         __rpc_execute
> > >         rpc_async_schedule
> > >         process_one_work
> > >         worker_thread
> > >         kthread
> > >         ret_from_fork
> > > 
> > > We ran into this issue ourselves, and seeing the -ESTALE in the kernel
> > > source code but not from userspace was surprising.
> > > 
> > > I tested a rebased version of this patch on an el8 kernel (v6.1.114),
> > > and it seems to correctly propagate this error.
> > > 
> > > > 8------------------------------------------------------8<
> > > 
> > > If an NFS server returns a response with a different file id to the
> > > request, the kernel currently prints out an error and attempts to
> > > return -ESTALE. However, this -ESTALE value is never surfaced
> > > anywhere.
> > 
> > Hi Nikhil Jha,
> > 
> > Will this cause us to return -ESTALE to the application even if the
> > WRITE was successful?
> > 
> > Ben
> > 
> 
> Hi Ben,
> 
> Hmm.. I'm not sure how to answer that question exactly. A fileid is only
> mismatched between a request RPC and response RPC when something really
> weird is going on (i.e. a bug in the NFS server), so it's hard to reason
> about if a WRITE was successful or not.
> 
> The `return -ESTALE` was already in the kernel code, but this particular
> codepath seems to have accidentally dropped that error.
> 
> Nikhil
> 

I'm not keen on taking any client changes to paper over what appears to
be a major server bug. We simply can't support broken servers that
reuse filehandles across files.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
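
The dropped-error pattern discussed in this thread can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the kernel's code: every type and function name below (`cached_inode`, `reply_attrs`, `update_inode`, `writeback_update_dropping`, `writeback_update_propagating`, `write_done`) is a hypothetical stand-in, and the real call chain quoted above is considerably more involved. It only shows how a `-ESTALE` produced by an inner check disappears when an intermediate caller discards the return value, and how it would surface if that caller returned it instead.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the attributes carried in a server reply. */
struct reply_attrs {
    uint64_t fileid;
};

/* Hypothetical stand-in for the client's cached view of a file. */
struct cached_inode {
    uint64_t fileid;
    int io_error;   /* first error recorded for the write, 0 if none */
};

/* Inner check: a reply describing a different file yields -ESTALE. */
static int update_inode(struct cached_inode *inode,
                        const struct reply_attrs *attrs)
{
    if (attrs->fileid != inode->fileid)
        return -ESTALE;
    return 0;
}

/* Behaviour described in the report: the result is discarded here. */
static void writeback_update_dropping(struct cached_inode *inode,
                                      const struct reply_attrs *attrs)
{
    (void)update_inode(inode, attrs);
}

/* Behaviour the patch aims for: the result is handed to the caller. */
static int writeback_update_propagating(struct cached_inode *inode,
                                        const struct reply_attrs *attrs)
{
    return update_inode(inode, attrs);
}

/*
 * Completion step that collects errors for the write.
 * 'propagate' selects which of the two variants above is used.
 */
static int write_done(struct cached_inode *inode,
                      const struct reply_attrs *attrs, int propagate)
{
    int err = 0;

    if (propagate)
        err = writeback_update_propagating(inode, attrs);
    else
        writeback_update_dropping(inode, attrs);

    /* Keep only the first error seen, if any. */
    if (err && !inode->io_error)
        inode->io_error = err;
    return inode->io_error;
}
```

Calling `write_done()` with a mismatched fileid returns 0 when `propagate` is 0 and `-ESTALE` when it is 1. The sketch takes no position on whether the write itself succeeded on the server, which is the open question Ben raises above.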





