On Friday February 20, bfields@xxxxxxxxxxxx wrote: > On Sat, Feb 21, 2009 at 06:47:58AM +1100, Neil Brown wrote: > > > > I cannot see how there is a regression here. Subsequent getattrs will > > show all modifications (if you wait at least one second). > > The first gettattr returns '-0.000000001', which is different from any > > previously returned mtime. > > Any subsequent getattr will return 0.999999999 or 1, depending on when > > it arrives. > > Sorry, I guess I misread "smallest difference that can be reported by > the protocol" as "smallest difference supported by the filesystem"! > > The former is currently *always* smaller than the latter, so you're > reporting an mtime that will never arise in any other way. So you're > right, this results in strictly more cache revalidations in every case. > > It may turn out that this mtime-1 case ends up being the typical case, > since a single logical file modification may appear as multiple writes > on the server, and those are likely to come in rapid succession. When you look at a file that is changing, yes. When you look at a file that hasn't changed for a while, you just get normal mtime. > > A "make" that takes less than one second, on an ext3 export, may result > in targets with earlier mtimes than sources. No. If they were both changed in the last second, they will both have 1 nanosecond subtracted from the mtime, so they will still look like they have the same mtime. > > (Why not mtime+1? And why not ctime?) Not "mtime + 1" because mtime should normally be monotonically increasing. If you touch a file into the future, then when that moment comes, its mtime will jump backwards 1 nanosecond, then jump forwards again. But I don't think that is a problem. Why not what ctime? I agree with Trond that we need to apply this adjustment to ctime too. But we cannot set mtime == ctime. That would just be wrong. > > > The only possible regression is that sometimes we will flush the cache > > when previously we didn't. In each case where that changes, the > > client can not possible know whether it needs to or not, so flushing > > rather than not flushing is the safest option. > > > > > > > > By the way, I have one sadly neglected todo here: ext4 has a real nfsv4 > > > changeattribute, which needs to be hooked up to the nfsd code. > > > > Does it? > > I just had a quick look, found that it stores a 64 bit number on disk > > which is stored in inode->i_version. > > And this is incremented for directory operations. But it doesn't seem > > to be changed for file operations. > > > > But maybe I missed something. > > After some mucking around with git and git grep... looks like the > inode_inc_iversion() calls do the job. Note there's an i_version mount > option that's required. Ahh.... Now if only they had called that function "inode_inc_i_version", then my grep would have found it... So in nfsd we can simply do: if (IS_I_VERSION(fhp->fh_dentry->d_inode)) { change_attribute = fhp->fh_dentry->d_inode->i_version; } else { change_attribute = kstat.ctime; } ?? NeilBrown -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html