NeilBrown [neilb@xxxxxxx] wrote: > On Fri, 29 Mar 2013 19:57:06 -0000 malahal naineni <malahal@xxxxxxxxxx> wrote: > > > This bug seems to be present in v2.6.37 or lower versions. The code was > > re-organized in v2.6.38 that eliminated the bug. Current upstream code > > doesn't have this bug. This may be applicable to some longterm releases! > > > > Here are the bug details: > > > > 1. nfs_read_rpcsetup(), args.count and res.count are both set to the > > actual number of bytes to be read. Let us assume that the request is > > for 16K, so arg.count = res.count = 16K > > 2. nfs3_xdr_readres() conditionally sets res.count to to the actual > > number of bytes read. This condition is true for the first response > > as res.count was set to args.count before the first request. Let us > > say the server returned only 4K bytes. res.count=4K > > 3. Another read request is sent for the remaining data. Note that > > res.count is NOT updated. It is still set to the actual amount of > > bytes we got in the first response. The client will send a READ > > request for the remaining 12K. > > This is looks like a real bug, but I think the "NOT" above is the best thing > to fix. No doubt, a real bug! Easily reproduced with an instrumented nfsd. > i.e. when another read request is set, res.count *SHOULD*BE* updated. That > makes it consistent with the original send, and consistency is good! I thought about it, but the resp->count is NOT really used as far as I know. And the current upstream code unconditionally sets the resp->count in xdr function. So I chose the upstream method! I agree, your patch is more consistent with the existing code. > Index: linux-2.6.32-SLE11-SP1-LTSS/fs/nfs/read.c > =================================================================== > --- linux-2.6.32-SLE11-SP1-LTSS.orig/fs/nfs/read.c 2013-03-20 16:24:31.426605189 +1100 > +++ linux-2.6.32-SLE11-SP1-LTSS/fs/nfs/read.c 2013-04-11 11:19:57.670724540 +1000 > @@ -368,6 +368,7 @@ static void nfs_readpage_retry(struct rp > argp->offset += resp->count; > argp->pgbase += resp->count; > argp->count -= resp->count; > + resp->count = argp->count; > nfs4_restart_rpc(task, NFS_SERVER(data->inode)->nfs_client); > return; > out: This patch should fix the bug as well. > This would old affect clients with servers which would sometimes > return partial reads, and I don't think the Linux NFS server does. > What server have you seen this against? We came across under Ganesha development, and I was told the same thing that Linux NFS server doesn't do this. I didn't bother to post then, but now we saw the same thing with linux NFS server, so I decided to post it now. Although linux NFS server doesn't in itself create short reads but it just calls the underlying back-end file system and sends without a short read "check" to NFS clients. In other words, the short read behaviour actually depends on the back-end file system rather than linux NFS server. We saw this bug with our GPFS file system in combination with HSM -- request for reading data on tape would fail until it is brought back to disk. Regards, Malahal. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html