I'm currently experiencing an issue where reading from a file that is known to only contain ascii characters can result in the process reading that file seeing NULL bytes (0x0) in the file that never existed. This happens for NFSv3 clients when the file is being updated concurrently either by another NFSv3 client or directly on the NFS server. Its possible this issue is already known as I've found the following RHEL knowledge base article talking about it (unfortunately you need to be logged in to view the full content of the following link): https://access.redhat.com/solutions/22717 If this is already a known issue and there is relevant discussion of it, I would appreciate it if someone could point me to it as I couldn't find anything (but, its possible I just missed it). If it isn't a known issue, here are the details of how I've been reproducing: I have reproduced this on RHEL 6.5 (2.6.32.431.1.2.0.1.el6) as well as on Fedora 21 (kernel-3.18.7-200.fc21). In the RHEL 6.5 case, I am using two systems with the aforementioned kernel and an Isilon as the NFS server. In the Fedora 21 case, I am using just one system, acting as both client and server. The basic idea is to touch something like 10 files, then start the attached python program (tail.py) on them "tailing" them as they grow and finally start updating them either from another NFS client or from the server directly. Here is the general set of steps for reproducing using a single system: # mkdir -p /export/test # mkdir -p /mnt/test # echo '/export/test localhost(rw)' > /etc/exports # exportfs -fa (make sure nfsd is running, etc) # mount -t nfs -o vers=3 localhost:/export/test /mnt/test (start the tailers) # touch /mnt/test/file{00..09} # for f in /mnt/test/file{00..09}; do python tail.py ${f} & done && cat >/dev/null (start the updating) # for f in /export/test/file{00..09}; do while true; do echo 'abc' >> ${f}; sleep .1; done & done After a little while, you'll see output like the following from the "tailer" processes: file08: ['61', '62', '63', '0A', '61', '62', '63', '0A', '61', '62', '63', '0A', '61', '62', '63', '0A', '00', '00', '00', '00'] file04: ['61', '62', '63', '0A', '61', '62', '63', '0A', '61', '62', '63', '0A', '61', '62', '63', '0A', '00', '00', '00', '00'] file02: ['61', '62', '63', '0A', '61', '62', '63', '0A', '61', '62', '63', '0A', '61', '62', '63', '0A', '00', '00', '00', '00'] As you can see, there are NULL bytes at the end that never existed. I've done some analysis of this using SystemTap and am happy to share details if necessary. I believe the issue is due to concurrent execution of `do_generic_file_read' (by the process making the read syscall) and `nfs_update_inode' by rpciod. rpciod updates the inode size, but before it can invalidate the page cache, our process making the read syscall has already found the the page and is now using the vfs inode's size to determine how much data to copy out of the page back into userspace. It seems like since the VFS layer doesn't do any locking of the inode (or at least `do_generic_file_read' doesn't), then the fact that rpciod is updating the inode under the protection of `i_lock' doesn't help. My naive thought was to simply mark the page cache invalid before updating the inode's size in `nfs_update_inode', but I'm unsure if that would have other unwanted effects. Any thoughts or insights appreciated.
#!/usr/bin/env python import os import sys import stat import time class Tailer(object): def __init__(self, path, stat_delay=.5, bufsize=4096): self.path = path self.stat_delay = stat_delay self.bufsize = bufsize self.pos = 0 def size(self): with open(self.path, 'r') as f: fd = f.fileno() return os.fstat(fd)[stat.ST_SIZE] def read(self): with open(self.path, 'r') as f: fd = f.fileno() os.lseek(fd, self.pos, os.SEEK_SET) data = "" while True: tmp = os.read(fd, self.bufsize) if tmp == "": break else: data += tmp self.pos += len(tmp) return data def run(self): while True: file_size = self.size() if file_size > self.pos: data = self.read() if "\x00" in data: print "%s: %r" % (self.path, ["%02X" % ord(x) for x in data]) break time.sleep(self.stat_delay) def main(): tailer = Tailer(sys.argv[1], bufsize=32768) tailer.run() if __name__ == "__main__": main()