On Tue, Dec 10, 2024 at 07:11:43AM -0500, Benjamin Coddington wrote: > On 9 Dec 2024, at 12:39, Nikhil Jha wrote: > > > Hello! This is the first kernel patch I have tried to upstream. I'm > > following along with the kernel newbies guide but apologies if I got > > anything wrong. > > > > Currently, if there is a mismatch in the request and response fileids in > > an NFS request, the kernel logs an error and attempts to return ESTALE. > > However, this error is currently dropped before it makes it all the way > > to userspace. This appears to be a mistake, since as far as I can tell > > that ESTALE value is never consumed from anywhere. > > > > Callstack for async NFS write, at time of error: > > > > nfs_update_inode <- returns -ESTALE > > nfs_refresh_inode_locked > > nfs_writeback_update_inode <- error is dropped here > > nfs3_write_done > > nfs_writeback_done > > nfs_pgio_result <- other errors are collected here > > rpc_exit_task > > __rpc_execute > > rpc_async_schedule > > process_one_work > > worker_thread > > kthread > > ret_from_fork > > > > We ran into this issue ourselves, and seeing the -ESTALE in the kernel > > source code but not from userspace was surprising. > > > > I tested a rebased version of this patch on an el8 kernel (v6.1.114), > > and it seems to correctly propagate this error. > > > >> 8------------------------------------------------------8< > > > > If an NFS server returns a response with a different file id to the > > response, the kernel currently prints out an error and attempts to > > return -ESTALE. However, this -ESTALE value is never surfaced anywhere. > > Hi Nikhil Jha, > > Will this cause us to return -ESTALE to the application even if the WRITE > was successful? > > Ben > Hi Ben, Hmm.. I'm not sure how to answer that question exactly. A fileid is only mismatched between a request RPC and response RPC when something really weird is going on (i.e. a bug in the NFS server), so it's hard to reason about if a WRITE was successful or not. The `return -ESTALE` was already in the kernel code, but this particular codepath seems to have accidentally dropped that error. Nikhil