On Fri, 23 Feb 2024, Jacek Tomaka wrote:
> Hello,
> I ran into an issue where an NFS file ends up being corrupted on disk.
> We started noticing it on certain, quite old hardware after upgrading
> the OS from CentOS 6 to Rocky 9.2. We do see it on Rocky 9.3 but not
> on 9.1.
>
> After some investigation we have reason to believe that the change was
> introduced by the following commit:
> https://github.com/torvalds/linux/commit/6df25e58532be7a4cd6fb15bcd85805947402d91

Thanks for the report.
Can you try a change to your kernel?

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index bb79d3a886ae..08a787147bd2 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -668,8 +668,10 @@ static int nfs_writepage_locked(struct folio *folio,
 	int err;
 
 	if (wbc->sync_mode == WB_SYNC_NONE &&
-	    NFS_SERVER(inode)->write_congested)
+	    NFS_SERVER(inode)->write_congested) {
+		folio_redirty_for_writepage(wbc, folio);
 		return AOP_WRITEPAGE_ACTIVATE;
+	}
 
 	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGE);
 	nfs_pageio_init_write(&pgio, inode, 0, false,

though if your kernel is older than 6.3, that added line will be

	redirty_page_for_writepage(wbc, page);

Thanks,
NeilBrown

>
> We write a number of files on a single thread. Each file is up to 4GB.
> Before closing we call fdatasync. Sometimes the file ends up being
> corrupted. The corruption is in the form of a number of zero-filled
> pages (more than 3k pages in one case).
> When this happens the file cannot be deleted from the client machine
> which created it, even when the process which wrote the file completed
> successfully.
>
> The machines have about 128GB of memory, I think, and probably a
> network that leaves something to be desired.
>
> My reproducer is currently tied to our internal software, but I
> suspect setting the write_congested flag randomly should allow the
> issue to be reproduced.
>
> Regards.
> Jacek Tomaka
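
For anyone without access to the internal tooling, below is a minimal
user-space sketch of the write pattern described in the report: a single
thread writes a large file, calls fdatasync() before close(), then
re-reads it looking for unexpectedly zero-filled ranges. The mount path,
file size, chunk size and fill byte are illustrative assumptions, not
taken from the report, and to get a trustworthy verify pass the read
should not be served from the local page cache (drop caches first, or
read the file back from a different client).

/* repro-sketch.c: write-then-verify a large file on an NFS mount. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CHUNK     (1UL << 20)      /* 1 MiB per write() - assumed */
#define FILE_SIZE (4ULL << 30)     /* ~4 GiB, matching "up to 4GB" */

static char wbuf[CHUNK], rbuf[CHUNK];

int main(int argc, char **argv)
{
	/* assumed mount point; pass the real path as argv[1] */
	const char *path = argc > 1 ? argv[1] : "/mnt/nfs/testfile";
	unsigned long long off;
	int fd;

	memset(wbuf, 0xa5, sizeof(wbuf));  /* non-zero pattern so lost pages stand out */

	fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	for (off = 0; off < FILE_SIZE; off += CHUNK) {
		if (write(fd, wbuf, CHUNK) != (ssize_t)CHUNK) {
			perror("write");
			return 1;
		}
	}
	if (fdatasync(fd) < 0) {           /* reporter syncs before close */
		perror("fdatasync");
		return 1;
	}
	close(fd);

	/*
	 * Verify: any chunk that reads back differently (e.g. all zeroes)
	 * matches the corruption described above.  Drop caches before
	 * running this pass so it is not satisfied from local memory.
	 */
	fd = open(path, O_RDONLY);
	if (fd < 0) {
		perror("open (verify)");
		return 1;
	}
	for (off = 0; off < FILE_SIZE; off += CHUNK) {
		if (read(fd, rbuf, CHUNK) != (ssize_t)CHUNK) {
			perror("read");
			return 1;
		}
		if (memcmp(rbuf, wbuf, CHUNK) != 0)
			printf("mismatch at offset %llu\n", off);
	}
	close(fd);
	return 0;
}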