[NOTE: cross-posted to linux-nfs and linux-fsdevel]

I'm seeing some unexpected behavior with NFS and file sizes. The test
cases are from the LTP (Linux Test Project), tests ftest01, ftest05,
and ftest07. I'll concentrate on ftest01 to explain what I've found.

ftest01 fires off 5 subprocesses, each of which opens an empty file
and does the following, repeatedly:

  . lseek to some point in the file
  . read 2048 bytes
  . lseek back to the same point
  . write 2048 bytes

The "point in the file" is determined by a pseudo-random sequence.
All such points are on 2048-byte boundaries.

Occasionally, also driven pseudo-randomly, ftest01 will throw in a
call to ftruncate(), truncate(), sync(), or fstat(). With the fstat()
calls, the returned .st_size is compared with the test's expected size
for the file, and an error is declared if they don't match.

What's happening is that, some way into the test, this fstat() check
is failing. Specifically, the .st_size reported by fstat() is greater
than the computed size.

The sequence of operations leading up to this is:

    lseek 1034240 0
    read 2048
    lseek 0 1
    write 2048
    lseek 638976 0
    (read, lseek, write)
    lseek 708608 0
    (read, lseek, write)
    lseek 708608 0
    (read, lseek, write)
    lseek 679584 0
    (read, lseek, write)
    truncate 266240
    lseek 960512 0
    (read, lseek, write)
    (a bunch of lseek/read/lseek/write ops that do not extend the file)
    fstat

So the expected size of the file is 960512 + 2048 == 962560. But the
fstat reports a size of 1036288.

A look at what's happening on the wire, distilled from the output of
tethereal, is instructive.

    READ Call      638976 4096    (byte offset and size to read)
    READ Reply     4096 995382    (bytes read and current file size)
    SETATTR Call   266240         (this corresponds to the truncate() call)
    WRITE Call     638976 4096    (byte offset and size to write)
    WRITE Call     708608 4096
    WRITE Call     1032192 4096
    SETATTR Reply  266240         (current size of file)
    WRITE Reply    643072         (current size of file after write)
    WRITE Reply    1036288
    WRITE Reply    1036288
    GETATTR                       (initiated internally by NFS code?)
    READ Call      958464 4096
    READ Reply     4096 1036288
    ...                           (a bunch of READ and WRITE ops that do
                                   not extend the file)
    GETATTR Call                  (this corresponds to the fstat() call)
    GETATTR Reply  1036288

So what appears to have happened here is that three of the WRITE
operations that the program issued before the truncate() call have
"bled past" the SETATTR, extending the file further than the SETATTR
did. Since none of the operations issued after SETATTR extends the
file further, by the time we get to the GETATTR, the file is larger
than the test expects.

There are two strange things going on here. The first, identified
above, is that write()s that were initiated before the truncate() call
are being processed after the resulting SETATTR call. The second is
that WRITE operations are being initiated while the SETATTR is
outstanding.

It seems to me that a size-changing SETATTR operation should act
essentially as an I/O barrier. It should wait for all outstanding
read/write requests to complete, then issue the SETATTR, wait for the
reply, and only then re-enable read/write requests. In other words,
SETATTR should be atomic with respect to other I/O operations.

A git bisect indicates that this problem first appeared (or was first
uncovered) with this commit:

    4f8ad65 writeback: Refactor writeback_single_inode()

It continues to the most recent mainline kernels. NFS v3 vs. v4
doesn't seem to matter.

Has anyone else seen this? Any pointers you can provide?

Thanks,
Dan Duval
Oracle Corp.
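
For reference, the size arithmetic above can be checked with a small
model. This is only a minimal sketch in Python, not part of LTP or the
NFS client. The replay() helper and the tuple encoding of operations
are hypothetical. It assumes a write only ever extends the file to
offset + length and that a truncate sets the size outright. The final
4096-byte WRITE at 958464 is an assumption, inferred from the READ at
that offset in the trace.

```python
def replay(ops, size=0):
    """Return the file size after applying ops in the given order."""
    for op in ops:
        if op[0] == "write":
            _, offset, length = op
            # A write extends the file only if it ends past the current EOF.
            size = max(size, offset + length)
        elif op[0] == "truncate":
            # A truncate (SETATTR with a size) sets the size outright.
            size = op[1]
        else:
            raise ValueError("unknown operation: %r" % (op,))
    return size


# Order in which the test program issued its calls.
program_order = [
    ("write", 1034240, 2048),
    ("write", 638976, 2048),
    ("write", 708608, 2048),
    ("write", 708608, 2048),
    ("write", 679584, 2048),
    ("truncate", 266240),
    ("write", 960512, 2048),
]

# Order in which the requests appeared on the wire.
wire_order = [
    ("truncate", 266240),        # SETATTR Call
    ("write", 638976, 4096),     # WRITE Calls issued after the SETATTR
    ("write", 708608, 4096),
    ("write", 1032192, 4096),
    ("write", 958464, 4096),     # assumed: page holding the 960512 write
]

print("expected size:", replay(program_order))
print("size on wire: ", replay(wire_order))
```

Under these assumptions the program order gives 962560 and the wire
order gives 1036288, which matches the size GETATTR returned.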