[Centos] possible data corruption when NFS is used

amilivojevic at pbl.ca (Aleksandar Milivojevic) · Fri Mar 18 17:05:43 2005

A couple of days ago, I experienced a nasty problem when doing an mmap of a 
file located on an NFS-mounted partition (exported from a Solaris 9 server). 
The problem manifests itself as data corruption.  I've notified the folks at 
Red Hat through their Bugzilla (after all, the kernel is built from their 
source), and I thought I'd share it with the folks here.

The problem manifests itself like this.  Create an empty file using the 
open64() system call.  Write a single byte at some position in the file 
(basically, this allocates a single block at the end of the file, with 
the rest of the file empty).  In my test I wrote a single byte at a 100KB 
offset using the pwrite64() system call.  Use the mmap() call to map the 
entire file into memory.  Use the memset() library function to fill the 
entire mmapped region with some pattern.  Call munmap() on the mapping, 
and close() the file.

What happens is that on the Linux NFS client, if you do "less filename", 
you'll see the file correctly filled with the pattern.  On the Solaris 9 NFS 
server, doing "less filename" shows that the file is empty.  NFS allows 
for a 30-second gap before changes are flushed from the client to the 
server (in reality, most NFS clients do not wait, and attempt to flush 
changes to the server shortly after they are made).  However, here the 
flush never happens: the changes are never sent to the NFS server (they 
stay cached on the client side indefinitely).  When the client is rebooted, 
the changes are lost.  Doing "du -sk filename" on both client and server 
produces the same result, and the output indicates that the file is sparse. 
This shows an inconsistency on the client: less shows that the file is 
filled with the pattern, so it can't be sparse, yet du -sk reports a size 
that indicates that the file is sparse.

The longest I waited for the client to send the updated file blocks to the 
NFS server was about half an hour.  So it is possible that the changes 
would eventually get flushed after several hours (or days), when the 
kernel attempts to free the pages used to hold the cached copy (I haven't 
tested that scenario).

If the file is updated using the write() or pwrite64() system calls 
(instead of the mmap()/memset()/munmap() combo), the file is updated on 
the NFS server almost instantly.

I am able to reproduce it "every time" with CentOS 4 as the NFS client and 
Solaris 9 as the NFS server.  I haven't tried other combinations.  RHEL4 
as an NFS client should have the same problem, and possibly other Linux 
distributions too (Fedora comes to mind as the most likely candidate, 
because of its close connection to RHEL4).  I also have a small app that 
demonstrates the problem (it should be labeled "one of the most stupid 
uses of mmap"; it is basically an implementation of the Solaris mkfile 
command).

If anybody has experienced hard-to-explain data corruption on NFS-mounted 
file systems, this might be the reason behind it.

-- 
Aleksandar Milivojevic <amilivojevic@xxxxxx>    Pollard Banknote Limited
Systems Administrator                           1499 Buffalo Place
Tel: (204) 474-2323 ext 276                     Winnipeg, MB  R3T 1L7