Re: [Bugme-new] [Bug 11448] New: NFS client has inconsistent write flushing to non-linux servers

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Thu, 28 Aug 2008 13:27:53 -0700

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 28 Aug 2008 11:41:08 -0700 (PDT)
bugme-daemon@xxxxxxxxxxxxxxxxxxx wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11448
> 
>            Summary: NFS client has inconsistent write flushing to non-linux
>                     servers
>            Product: File System
>            Version: 2.5
>      KernelVersion: 2.6.22.15
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: NFS
>         AssignedTo: trond.myklebust@xxxxxxxxxx
>         ReportedBy: doug@xxxxxxx
> 
> 
> Latest working kernel version: N/A (works on 2.6.18 with Linux NFS server, but
> we cannot continue to use that kernel for various reasons)
> Earliest failing kernel version: N/A (2.6.18, 2.6.24, and 2.6.25 are also known
> to fail, according to another party experiencing the same bug against
> non-Linux NFS servers).
> Not currently known to be reproducible against NetApp, but this is not
> authoritative (not having seen the bug does not guarantee its absence).
> Distribution: CentOS 4.6
> Hardware Environment: Supermicro Twin, 2 quad-core Harpertown CPUs, 16 GB RAM.
> Software Environment: CentOS 4.6
> Problem Description: 
> 
> An NFS client writes to a Sun Solaris 10 U4 server.
> At some point, a portion of the writer's output file is empty, with data
> missing (it shows as NUL bytes to another NFS client issuing a tail -f on
> the file being written).
> We confirmed that the file as it exists on the NFS server is sparse, with
> missing bytes (not necessarily a multiple of 512 or 1024: one sample is a gap
> of 3818 bytes, another is 1895 bytes, another is 423 bytes).
> 
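
To pin down exactly where the gaps are, something like the following could
be run against the file. This is only a rough sketch: find_nul_runs and its
parameters are made-up names, and it assumes the real data never contains
long runs of NUL bytes itself. Note that, per the description below, reading
the file on the writing client triggers the flush, so this would need to run
on a different NFS client or on the server.

```python
def find_nul_runs(path, min_len=1, chunk_size=1 << 20):
    """Return a list of (offset, length) for each run of NUL bytes
    that is at least min_len bytes long (hypothetical helper)."""
    runs = []
    run_start = None   # file offset where the current NUL run began
    pos = 0            # file offset of the start of the current chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            i, n = 0, len(chunk)
            while i < n:
                if chunk[i] == 0:
                    if run_start is None:
                        run_start = pos + i
                    # Skip over the rest of the NULs in this chunk.
                    while i < n and chunk[i] == 0:
                        i += 1
                else:
                    if run_start is not None:
                        length = pos + i - run_start
                        if length >= min_len:
                            runs.append((run_start, length))
                        run_start = None
                    # Jump straight to the next NUL byte, if any.
                    k = chunk.find(b"\0", i)
                    i = n if k == -1 else k
            pos += n
    # A run that extends to end-of-file is still open here.
    if run_start is not None and pos - run_start >= min_len:
        runs.append((run_start, pos - run_start))
    return runs
```

Comparing the reported offsets and lengths with the WRITE calls seen on the
wire should show whether the gaps line up with whole application writes.
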
> If you read the entire file from the NFS client doing the writing, the
> unflushed writes are immediately flushed to the server, followed by an NFS3
> commit operation. The data can then be seen on all other NFS clients.
> 
> If you only open the file: no flush.
> If you open and close it: no flush.
> If you open it and read at the beginning of the file (far before the data
> that is outstanding): *usually* no flush (there was one case where it did).
> If you read at another position in the file: no flush (other than as
> indicated above).
> If you read at the offset where the bytes are null, the NFS client issues
> the write and an NFS commit to the server (truss output available).
> 
> The missing blocks may flush themselves after an unpredictable delay, which
> can be hours. Our runs last days.
> 
> Steps to reproduce:
> 
> A chemist running NAMD sees frequent cases of this in his output trajectory
> index files. We don't have an exact sequence of steps to reproduce. After I
> file this ticket I will give the ticket number to another person I know at a
> different company who is experiencing the same problem as described above (to
> the best of my knowledge).
> 

That seems rather ugly.

2.6.22 is getting a bit old though.  It's quite possible that this was
subsequently fixed, in which case upgrading your kernel or hassling the
vendor to backport the fix would be needed.
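
In the meantime, if the writing application can be modified at all, an
explicit fsync() after each record may be worth trying: on the Linux NFS
client, fsync() writes back the file's dirty pages and waits for them to be
committed on the server. Whether that actually sidesteps this particular bug
is an assumption that would need testing. A minimal sketch (append_record is
a made-up helper, not anything from NAMD):

```python
import os

def append_record(path, data):
    """Append data to path and block until fsync() reports it stable.
    Hypothetical helper for illustration only."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        view = memoryview(data)
        # os.write() may write fewer bytes than asked; loop until done.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Force dirty pages for this file out to the server.
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(data)
```

This costs a round trip to the server per record, so it is only sensible for
small, infrequently updated files such as an index.
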