There are no error from vfs_read just the data corruption, we tried it with DIO and still see the problem, so it might not be the NFS client, we are looking again at the memory management of the application. Thanks, Marc. linux-nfs-owner@xxxxxxxxxxxxxxx wrote on 06/13/2016 09:01:38 AM: > From: "J. Bruce Fields" <bfields@xxxxxxxxxxxx> > To: Marc Eshel/Almaden/IBM@IBMUS > Cc: linux-nfs@xxxxxxxxxxxxxxx, Srikanth Srinivasan > <ssrikanth@xxxxxxxxxx>, Trond Myklebust > <trond.myklebust@xxxxxxxxxxxxxxx>, Venkateswara R Puvvada > <vpuvvada@xxxxxxxxxx> > Date: 06/13/2016 09:01 AM > Subject: Re: NFS fixes > Sent by: linux-nfs-owner@xxxxxxxxxxxxxxx > > On Sun, Jun 12, 2016 at 05:34:32PM -0700, Marc Eshel wrote: > > We are seeing a data corruption when putting very high load on the NFS V3 > > client reading multi gigabyte files in parallel. The check-sum on the > > files is showing the corruption, and looking at the data we see data that > > in one block that belongs in another block but it is not the full block. > > The test is done on multiple set of hardware using different type of > > server including kNFS and Ganesha servers with EXT3 or GPFS file system. > > The only common part in all test are NFSv3 client on REHL7.0, 7.1, 7.2. > > > > The question is there anything up stream that might fix data corruption by > > the NFSv3 client, oo do we know if this problem might have been reported > > by other users. > > > > The only fix that I see that might be related is attached, can this > > explain a data corruption? > > It should be pretty easy to check whether there've been any READ/WRITE > errors, and rule this out if not. > > Is the data being read completely static? (So you can rule out e.g. > some subtle violation of close-to-open.) > > Sorry, no special knowledge here. > > --b. > > > > > Thanks, Marc. > > > > > > Author: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > Date: Mon Aug 17 12:57:07 2015 -0500 > > > > NFS: nfs_set_pgio_error sometimes misses errors > > > > We should ensure that we always set the pgio_header's error field > > if a READ or WRITE RPC call returns an error. The current code depends > > on 'hdr->good_bytes' always being initialised to a large value, which > > is not always done correctly by callers. > > When this happens, applications may end up missing important errors. > > > > Cc: stable@xxxxxxxxxxxxxxx > > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > > > diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c > > index 4984bbe..7c5718b 100644 > > --- a/fs/nfs/pagelist.c > > +++ b/fs/nfs/pagelist.c > > @@ -77,8 +77,8 @@ EXPORT_SYMBOL_GPL(nfs_pgheader_init); > > void nfs_set_pgio_error(struct nfs_pgio_header *hdr, int error, loff_t > > pos) > > { > > spin_lock(&hdr->lock); > > - if (pos < hdr->io_start + hdr->good_bytes) { > > - set_bit(NFS_IOHDR_ERROR, &hdr->flags); > > + if (!test_and_set_bit(NFS_IOHDR_ERROR, &hdr->flags) > > + || pos < hdr->io_start + hdr->good_bytes) { > > clear_bit(NFS_IOHDR_EOF, &hdr->flags); > > hdr->good_bytes = pos - hdr->io_start; > > > > \ > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html