Re: NFS data corruption on congested network

Jacek Tomaka <Jacek.Tomaka@xxxxxxxxx> · Mon, 26 Feb 2024 12:58:16 +0100

Hi NeilBrown, 

> though if your kernel is older than 6.3, that will be
>          redirty_for_writepage(wbc, page);

Things are looking good. I have run it on 15 machines for a good couple of hours and I do not see the problem. Usually I would see it after 1-3 iterations, but now they are reaching 20 iterations without the problem.

Thank you for the fix.
Regards.
Jacek Tomaka

Subject: Re: NFS data corruption on congested network
Date: 2024-02-26 0:19
From: "NeilBrown" <neilb@xxxxxxx>
To: "Jacek Tomaka" <Jacek.Tomaka@xxxxxxxxx>
Cc: trond.myklebust@xxxxxxxxxxxxxxx; anna.schumaker@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx

> 
> On Mon, 26 Feb 2024, NeilBrown wrote:
>> On Fri, 23 Feb 2024, Jacek Tomaka wrote:
>>> Hello,
>>> I ran into an issue where the NFS file ends up being corrupted on
>>> disk. We started noticing it on certain, quite old hardware after upgrading
>>> OS from Centos 6 to Rocky 9.2. We do see it on Rocky 9.3 but not on 9.1.
>>> 
>>> After some investigation we have reasons to believe that the
>>> change was introduced by the following commit:
>>>
>>> https://github.com/torvalds/linux/commit/6df25e58532be7a4cd6fb15bcd85805947402d91
>> 
>> Thanks for the report.
>> Can you try a change to your kernel?
>> 
>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>> index bb79d3a886ae..08a787147bd2 100644
>> --- a/fs/nfs/write.c
>> +++ b/fs/nfs/write.c
>> @@ -668,8 +668,10 @@ static int nfs_writepage_locked(struct folio *folio,
>>  	int err;
>>  
>>  	if (wbc->sync_mode == WB_SYNC_NONE &&
>> -	    NFS_SERVER(inode)->write_congested)
>> +	    NFS_SERVER(inode)->write_congested) {
>> +		folio_redirty_for_writepage(wbc, folio);
>>  		return AOP_WRITEPAGE_ACTIVATE;
>> +	}
>>  
>>  	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGE);
>>  	nfs_pageio_init_write(&pgio, inode, 0, false,
> 
> Actually this is only needed before linux 6.8 as only nfs_writepage()
> can call nfs_writepage_locked() with sync_mode of WB_SYNC_NONE.
> So v5.18 through v6.7 might need fixing.
> 
> NeilBrown
> 
> 
>> 
>> 
>> though if your kernel is older than 6.3, that will be
>>          redirty_for_writepage(wbc, page);
>> 
>> Thanks,
>> NeilBrown
>> 
>> 
>>> 
>>> We write a number of files on a single thread. Each file is up to
>>> 4GB. Before closing we call fdatasync. Sometimes the file ends up being
>>> corrupted. The corruption is in the form of a number (more than 3k pages in
>>> one case) of zero-filled pages.
>>> When this happens the file cannot be deleted from the client
>>> machine which created the file, even when the process which wrote the file
>>> completed successfully.
>>> 
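
The symptom described above (runs of zero-filled pages in a file that should not contain any) can be checked for from user space. The following is only a minimal sketch, not the reproducer used in this thread: the names count_zero_pages, is_all_zero, SCAN_PAGE_SIZE and ZERO_SCAN_MAIN are made up for illustration, the 4096-byte page size is an assumption, and it only makes sense for files whose expected contents never include a whole page of zeros.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed page size (hypothetical constant); adjust for your platform. */
#define SCAN_PAGE_SIZE 4096

/* Return 1 if the first len bytes of buf are all zero, else 0. */
int is_all_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

/*
 * Count page-sized chunks of the file at path that contain only zero bytes.
 * A short final chunk is checked over the bytes that exist.
 * Returns the count, or -1 if the file cannot be opened or read.
 */
long count_zero_pages(const char *path)
{
    unsigned char buf[SCAN_PAGE_SIZE];
    FILE *f = fopen(path, "rb");
    long zero_pages = 0;
    size_t n;

    if (!f)
        return -1;
    while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
        if (is_all_zero(buf, n))
            zero_pages++;
    if (ferror(f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return zero_pages;
}

#ifdef ZERO_SCAN_MAIN
/* Command-line wrapper: build with -DZERO_SCAN_MAIN, pass one file path. */
int main(int argc, char **argv)
{
    long zero_pages;

    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 2;
    }
    zero_pages = count_zero_pages(argv[1]);
    if (zero_pages < 0) {
        perror(argv[1]);
        return 2;
    }
    printf("%ld zero-filled page(s)\n", zero_pages);
    return zero_pages > 0 ? 1 : 0;
}
#endif
```

Built with -DZERO_SCAN_MAIN this gives a small command-line check that exits non-zero when at least one all-zero page is found. It is probably safer to run it on a different machine than the one that wrote the file, on the assumption that the writing client's page cache may still hold the correct data while the copy on the server is corrupted.
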
>>> The machines have about 128GB of memory, I think, and probably a
>>> network that leaves something to be desired.
>>> 
>>> My reproducer is currently tied to our internal software, but I
>>> suspect that setting the write_congested flag randomly should make it
>>> possible to reproduce the issue.
>>> 
>>> Regards.
>>> Jacek Tomaka
>>> 
>> 
>> 
>> 
> 
> 
>