Re: A NFS client partial file corruption problem in recent/current kernels

> On Sep 11, 2018, at 4:00 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Tue, 2018-09-11 at 14:02 -0400, Chris Siebenmann wrote:
>>>> We've found a readily reproducible situation where the current
>>>> NFS client code will provide zero bytes instead of actual data at
>>>> the end of the file (sort of) to user programs. This can result
>>>> in program failure, or permanent file corruption if the program
>>>> reading the file writes the bad data back to the file; otherwise,
>>>> the corruption goes away when the client's cached data is pushed
>>>> out of memory (or explicitly dropped by dropping the pagecache
>>>> through /proc/sys/vm/drop_caches).
>> 
>> [...]
>>> Please see http://nfs.sourceforge.net/#faq_a8
>> 
>> I don't think this is a close to open consistency issue, or if it is
>> I would argue that it is a clear bug on the Linux NFS client. I have
>> a number of reasons for saying this:
>> 
>> - the client clearly sees the new attributes; it knows that the file
>>  has been extended from the previous state that it knew of. My demo
>>  program specifically waits until user-level fstat() returns a
>>  different result, which I believe means that the client kernel has
>>  seen a different GETATTR result and so should have purged its cache
>>  (based on what the FAQ says).
>> 
>>  (Unless the FAQ means that the kernel absolutely refuses to
>>  guarantee anything about file consistency unless you close and then
>>  reopen the file, even if it *knows* that the file has changed on the
>>  server, which isn't clear from how the FAQ is currently written.)
>> 
>> - the client is fetching some new data from the fileserver (data
>>  after the partial 4 KB page at the old end of the file).
>> 
>> - the client isn't writing to the file in my demonstration program;
>>  it's only opening it in read-write mode and then reading it. Also,
>>  this doesn't happen if the client does exactly the same set of
>>  operations but has the file open read-only (with it staying open
>>  throughout).
>> 
>> - this didn't happen in older kernels.
>> 
>> In addition, although I didn't mention it in my original email, this
>> happens on an NFS filesystem mounted 'noac'.
>> 
>> Pragmatically, Alpine used to work with NFS mounted filesystems where
>> email was appended to them from other machines and it no longer does,
>> and the only difference is the kernel version involved on the client.
>> This breakage is actively dangerous.
> 
> Sure, but unless you are locking the file, or you are explicitly using
> O_DIRECT to do uncached I/O, then you are in violation of the
> close-to-open consistency model, and the client is going to behave as
> you describe above. NFS uses a distributed filesystem model, not a
> clustered one.

I would expect Alpine to work if "vers=3,noac" is in use.
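
For readers following the thread, the locking route mentioned in the quoted reply can be sketched roughly as below. This is only an illustration, not code from any poster: the function name `read_new_data` and its `old_size` parameter are hypothetical. It also assumes that the Linux NFS client revalidates its cached data for a file when a POSIX lock is taken on it, which is the behavior the locking advice relies on.

```python
import fcntl
import os


def read_new_data(fd, old_size):
    """Return the bytes found past old_size, read under a shared lock.

    Hypothetical helper for illustration. The fd is expected to be open
    for reading (read-write also works, as in the reported scenario).
    """
    # Take a shared POSIX lock on the whole file. Assumption: on an NFS
    # mount this prompts the client to revalidate its cached pages.
    fcntl.lockf(fd, fcntl.LOCK_SH)
    try:
        new_size = os.fstat(fd).st_size
        chunks = []
        offset = old_size
        # pread() may return fewer bytes than requested, so loop.
        while offset < new_size:
            chunk = os.pread(fd, new_size - offset, offset)
            if not chunk:
                break
            chunks.append(chunk)
            offset += len(chunk)
        return b"".join(chunks)
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
```

A reader in the position of the demo program would call this with the file size it last saw, rather than reading directly after noticing that fstat() changed. Whether this avoids the zero-filled tail on the affected kernels is not something this thread confirms.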


--
Chuck Lever






