Re: A NFS client partial file corruption problem in recent/current kernels


On Tue, 2018-09-11 at 11:59 -0400, Chris Siebenmann wrote:
>  We've found a readily reproducible situation where the current
> NFS client code will provide zero bytes instead of actual data at the
> end of the file (sort of) to user programs. This can result in program
> failure, or permanent file corruption if the program reading the file
> writes the bad data back to the file; otherwise, the corruption goes
> away when the client's cached data is pushed out of memory (or explicitly
> dropped by dropping the pagecache through /proc/sys/vm/drop_caches).
> 
>  The reproduction steps are:
> 
> * on an NFS client, open the file read-write and read to the end of the
>   file (possibly just read the end of the file).
> * hold the file open read-write and wait for the file size to grow.
> 
>   All the bits of these first two steps appear to be required; you must
>   read the end of the file, you must have the file open read-write,
>   and you must hold it open read-write.
> 
> * on either another NFS client or the NFS server, append data to the
>   file.
> 
> * now that your program sees the new file size, try to read the new
>   data (from the old end of the file to the new end of the file).
>   Any data from the old end of file up to the next 4 KB boundary will
>   be zero bytes instead of its actual content; after that, it will be
>   the proper new content.
> 
> I have a demonstration reproduction program here:
> 	https://www.cs.toronto.edu/~cks/vendors/linux-nfs/
> 
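The reader side of the reproduction steps above can be sketched roughly as follows. This is a minimal Python illustration, not the demonstration program linked above; the helper names (`suspect_range`, `zero_prefix_len`, `read_appended`), the polling interval, and the timeout are all assumptions made for the example. The appending side (another NFS client or the server) is not shown.

```python
import os
import time

PAGE_SIZE = 4096  # the 4 KB boundary described in the report


def suspect_range(old_size, new_size, page_size=PAGE_SIZE):
    """Byte range [start, end) that the report says may read back as zeros:
    from the old end of file up to the next page boundary (or the new EOF)."""
    next_boundary = -(-old_size // page_size) * page_size  # round up
    return old_size, min(next_boundary, new_size)


def zero_prefix_len(data):
    """Number of leading zero bytes in data (hypothetical helper). Genuine
    data may also start with zeros, so compare against a known pattern."""
    return len(data) - len(data.lstrip(b"\0"))


def read_appended(path, poll_seconds=1.0, timeout=60.0):
    """Open read-write, read to EOF, hold the file open until it grows,
    then read the appended region. Returns (old_size, new_data)."""
    fd = os.open(path, os.O_RDWR)
    try:
        # Step 1: read to the end of the file.
        while os.read(fd, 65536):
            pass
        old_size = os.lseek(fd, 0, os.SEEK_CUR)

        # Step 2: keep the descriptor open and wait for the size to grow.
        deadline = time.monotonic() + timeout
        while True:
            new_size = os.fstat(fd).st_size
            if new_size > old_size:
                break
            if time.monotonic() >= deadline:
                return old_size, b""
            time.sleep(poll_seconds)

        # Final step: read from the old end of file to the new end.
        chunks = []
        remaining = new_size - old_size
        while remaining > 0:
            chunk = os.read(fd, remaining)
            if not chunk:
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return old_size, b"".join(chunks)
    finally:
        os.close(fd)
```

On an affected client, the bytes of the returned data that fall inside `suspect_range(old_size, old_size + len(data))` would be expected to come back as zeros even though the appender wrote non-zero content; on a local filesystem the sketch simply returns the appended bytes.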
> This issue isn't present in the Ubuntu 16.04 LTS server kernel (labeled
> as '4.4.0', plus years of Ubuntu changes) and is present in the Ubuntu
> 18.04 LTS kernel (labeled 4.15.0) and the Fedora 28 4.17.9 and 4.18.5
> kernels. It happens on both NFSv3 and NFSv4 mounts (both with 'sec=sys'),
> and the NFS fileserver OS and the filesystem type (on Linux) don't
> appear to matter; we initially saw this against OmniOS NFS servers using
> ZFS and have reproduced this against Linux NFS servers on ext4, tmpfs,
> and ZFS (ZFS on Linux) with both Ubuntu 18.04 and Fedora 28 kernels.
> 
>  This bug causes Alpine to fail when accessing your /var/mail inbox
> over NFS if new mail is delivered to it while Alpine has it open. There
> are probably other programs affected, although hopefully not many
> programs hold files open read-write while other programs are appending
> data.
> 
>  I'd be happy to answer any further questions, but we have limited
> ability to try different kernels or kernel changes to see if they change
> the situation (we don't run stock kernels on any machines; they're all
> vendor-based ones).
> 

Please see http://nfs.sourceforge.net/#faq_a8


-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx





