On Mon, 26 Feb 2024, NeilBrown wrote:
> On Fri, 23 Feb 2024, Jacek Tomaka wrote:
> > Hello,
> > I ran into an issue where the NFS file ends up being corrupted on
> > disk. We started noticing it on certain, quite old hardware after
> > upgrading the OS from CentOS 6 to Rocky 9.2. We do see it on Rocky 9.3
> > but not on 9.1.
> >
> > After some investigation we have reasons to believe that the change
> > was introduced by the following commit:
> > https://github.com/torvalds/linux/commit/6df25e58532be7a4cd6fb15bcd85805947402d91
>
> Thanks for the report.
> Can you try a change to your kernel?
>
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index bb79d3a886ae..08a787147bd2 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -668,8 +668,10 @@ static int nfs_writepage_locked(struct folio *folio,
>  	int err;
>
>  	if (wbc->sync_mode == WB_SYNC_NONE &&
> -	    NFS_SERVER(inode)->write_congested)
> +	    NFS_SERVER(inode)->write_congested) {
> +		folio_redirty_for_writepage(wbc, folio);
>  		return AOP_WRITEPAGE_ACTIVATE;
> +	}
>
>  	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGE);
>  	nfs_pageio_init_write(&pgio, inode, 0, false,

Actually this is only needed before Linux 6.8, as only nfs_writepage()
can call nfs_writepage_locked() with sync_mode of WB_SYNC_NONE.
So v5.18 through v6.7 might need fixing.

NeilBrown

>
>
> though if your kernel is older than 6.3, that will be
>    redirty_for_writepage(wbc, page);
>
> Thanks,
> NeilBrown
>
>
> >
> > We write a number of files on a single thread. Each file is up to
> > 4GB. Before closing we call fdatasync. Sometimes the file ends up
> > being corrupted. The corruption takes the form of a number (more than
> > 3k pages in one case) of zero-filled pages.
> > When this happens the file cannot be deleted from the client machine
> > which created the file, even when the process which wrote the file
> > completed successfully.
> >
> > The machines have about 128GB of memory, I think, and probably a
> > network that leaves something to be desired.
> >
> > My reproducer is currently tied to our internal software, but I
> > suspect setting the write_congested flag randomly should allow
> > reproducing the issue.
> >
> > Regards.
> > Jacek Tomaka
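
Since the reported corruption shows up as runs of zero-filled pages, a
small userspace check can help confirm whether a given file is affected
before and after trying the patch. The following is only a sketch and is
not taken from the thread: the helper names page_is_zero() and
count_zero_pages() are hypothetical, and the 4096-byte page size used
when calling it is an assumption. A file can legitimately contain zero
pages (sparse regions, padding), so a non-zero count is a hint to compare
against the expected data, not proof of corruption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: true if every byte in buf[0..len) is zero. */
static bool page_is_zero(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != 0)
			return false;
	return true;
}

/*
 * Hypothetical helper: count the page_size-sized chunks of the file at
 * `path` that contain only zero bytes. A short final chunk is checked
 * as well. Returns -1 if the file cannot be opened or read.
 * page_size must be non-zero.
 */
static long count_zero_pages(const char *path, size_t page_size)
{
	FILE *f;
	unsigned char *buf;
	long zero_pages = 0;
	size_t n;

	assert(page_size > 0);

	f = fopen(path, "rb");
	if (!f)
		return -1;

	buf = malloc(page_size);
	if (!buf) {
		fclose(f);
		return -1;
	}

	/* Read one chunk at a time so multi-gigabyte files stay cheap. */
	while ((n = fread(buf, 1, page_size, f)) > 0)
		if (page_is_zero(buf, n))
			zero_pages++;

	if (ferror(f))
		zero_pages = -1;

	free(buf);
	fclose(f);
	return zero_pages;
}
```

Calling count_zero_pages(path, 4096) on a freshly written file, from the
client that wrote it and again from another client or the server, gives
a quick way to see whether the zero-filled pages are present on disk.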