Re: [PATCH v4 1/1] NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 11/23/20 8:25 AM, J. Bruce Fields wrote:
On Wed, Nov 11, 2020 at 03:02:19PM -0800, Dai Ngo wrote:
On 11/10/20 2:21 PM, J. Bruce Fields wrote:
On Tue, Nov 10, 2020 at 05:08:59PM -0500, Olga Kornievskaia wrote:
On Tue, Nov 10, 2020 at 4:52 PM J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
On Tue, Nov 10, 2020 at 04:07:41PM -0500, Olga Kornievskaia wrote:
On Tue, Nov 10, 2020 at 3:14 PM J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
On Mon, Nov 09, 2020 at 10:46:12PM -0800, Dai Ngo wrote:
On 11/9/20 2:26 PM, Dai Ngo wrote:
On 11/9/20 12:42 PM, J. Bruce Fields wrote:
On Mon, Nov 09, 2020 at 11:34:08AM -0800, Dai Ngo wrote:
On 11/9/20 10:30 AM, J. Bruce Fields wrote:
On Tue, Oct 20, 2020 at 11:34:35AM -0700, Dai Ngo wrote:
On 10/20/20 10:01 AM, J. Bruce Fields wrote:
On Sun, Oct 18, 2020 at 11:42:49PM -0400, Dai Ngo wrote:
NFS_FS=y as dependency of CONFIG_NFSD_V4_2_INTER_SSC still have
build errors and some configs with NFSD=m to get NFS4ERR_STALE
error when doing inter server copy.

Added ops table in nfs_common for knfsd to access NFS
client modules.
OK, looks reasonable to me, applying.  Does this resolve all the
problems you've seen, or is there any bad case left?
Thanks Bruce.

With this patch, I no longer see the NFS4ERR_STALE in any config.

The problem with NFS4ERR_STALE was because of a bug in
nfs42_ssc_open.
When CONFIG_NFSD_V4_2_INTER_SSC is not defined, nfs42_ssc_open
returns NULL which is incorrect allowing the operation to continue
until nfsd4_putfh which does not have the code to handle
nfserr_stale.

With this patch, when CONFIG_NFSD_V4_2_INTER_SSC is not defined the
new nfs42_ssc_open returns ERR_PTR(-EIO) which causes the NFS client
to switch over to the split copying (read src and write to dst).
That sounds reasonable, but I don't see any of the patches you've sent
changing that error return.  Did I overlook something, or did you mean
to append a patch to this message?
Since with the patch, I did not run into the condition where
NFS4ERR_STALE
is returned so I did not fix this return error code. Do you want me to
submit another patch to change the returned error code from
NFS4ERR_STALE
to NFS4ERR_NOTSUPP if it ever runs into that condition?
That would be great, thanks.  (I mean, it is still possible to hit that
case, right?  You just didn't test with !CONFIG_NFSD_V4_2_INTER_SSC ?)
will do. I did tested with (!CONFIG_NFSD_V4_2_INTER_SSC) but did not hit
this case.
I need to qualify this, the copy_file_range syscall did not return
ESTALE in the test.

Because with this patch, when CONFIG_NFSD_V4_2_INTER_SSC is not
defined the new nfs42_ssc_open returns ERR_PTR(-EIO), instead of NULL in
the old code, which causes the NFS client to switch over to the split
copying (read src and write to dst).
This is not the reason why the client switches to generic_copy_file_range.

Returning NULL in the old nfs42_ssc_open is not correct, it allows
the copy
operation to proceed and hits the NFS4ERR_STALE case in the COPY
operation.
I retested with (!CONFIG_NFSD_V4_2_INTER_SSC) and saw NFS4ERR_STALE
returned for the PUTFH of the SRC in the COPY compound. However on the
client nfs42_proc_copy (with commit 7e350197a1c10) replaced the ESTALE
with EOPNOTSUPP causing nfs4_copy_file_range to use generic_copy_file_range
to do the copy.

The ESTALE error is only returned by copy_file_range if the client
does not have commit 7e350197a1c10. So I think there is no need to
make any change on the source server for the NFS4ERR_STALE error.
I don't believe NFS4ERR_STALE is the correct error for the server to
return.  It's nice that the client is able to do the right thing despite
the server returning the wrong error, but we should still try to get
this right on the server.
Hi Bruce,

ERR_STALE is the appropriate error to be returned by the server that
gets a COPY compound when it doesn't support COPY. Since server can't
understand the filehandle so it can't process it so we can't get to
processing COPY opcode where the server could have returned
EOPNOTSUPP.
The case we're discussing is the case where we support COPY but not
server-to-server copy.
My point is still the same, that's an appropriate error for when
server-to-server copy is not supported.
Uh, OK, if it backs up and returns it to the PUTFH, I guess?

Was it really the intention of nfsd4_do_async_copy() that it return
STALE in the case NFS42_ssc_open() returns NULL?  That's pretty
confusing.
In this scenario, the COPY compound fails at the PUTFH op and
NFS4ERR_NOTSUPP is not a valid error code for PUTFH, NFS4ERR_STALE is.
OK, makes sense.

I've lost track of what's left to apply.

I think we're good with this issue.
There is still one inter server copy related patch waiting for
your review:

PATCH] NFSD: Fix 5 seconds delay when doing inter server copy

Thanks,
-Dai


--b.



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux