Re: [PATCH] nfs: propagate fileid changed errors back to syscall

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 10, 2024 at 07:11:43AM -0500, Benjamin Coddington wrote:
> On 9 Dec 2024, at 12:39, Nikhil Jha wrote:
> 
> > Hello! This is the first kernel patch I have tried to upstream. I'm
> > following along with the kernel newbies guide but apologies if I got
> > anything wrong.
> >
> > Currently, if there is a mismatch in the request and response fileids in
> > an NFS request, the kernel logs an error and attempts to return ESTALE.
> > However, this error is currently dropped before it makes it all the way
> > to userspace. This appears to be a mistake, since as far as I can tell
> > that ESTALE value is never consumed from anywhere.
> >
> > Callstack for async NFS write, at time of error:
> >
> >         nfs_update_inode <- returns -ESTALE
> >         nfs_refresh_inode_locked
> >         nfs_writeback_update_inode <- error is dropped here
> >         nfs3_write_done
> >         nfs_writeback_done
> >         nfs_pgio_result <- other errors are collected here
> >         rpc_exit_task
> >         __rpc_execute
> >         rpc_async_schedule
> >         process_one_work
> >         worker_thread
> >         kthread
> >         ret_from_fork
> >
> > We ran into this issue ourselves, and seeing the -ESTALE in the kernel
> > source code but not from userspace was surprising.
> >
> > I tested a rebased version of this patch on an el8 kernel (v6.1.114),
> > and it seems to correctly propagate this error.
> >
> >> 8------------------------------------------------------8<
> >
> > If an NFS server returns a response with a different file id to the
> > response, the kernel currently prints out an error and attempts to
> > return -ESTALE. However, this -ESTALE value is never surfaced anywhere.
> 
> Hi Nikhil Jha,
> 
> Will this cause us to return -ESTALE to the application even if the WRITE
> was successful?
> 
> Ben
> 

Hi Ben,

Hmm.. I'm not sure how to answer that question exactly. A fileid is only
mismatched between a request RPC and response RPC when something really
weird is going on (i.e. a bug in the NFS server), so it's hard to reason
about if a WRITE was successful or not.

The `return -ESTALE` was already in the kernel code, but this particular
codepath seems to have accidentally dropped that error.

Nikhil





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux