Pavel Emelyanov <xemul@xxxxxxxxxxxxx> writes: > On 07/04/2012 06:39 PM, Miklos Szeredi wrote: >> Pavel Emelyanov <xemul@xxxxxxxxxxxxx> writes: >> >>> Make fuse think that when writeback is on the inode's i_size is alway >>> up-to-date and not update it with the value received from the >>> userspace. This is done because the page cache code may update i_size >>> without letting the FS know. >> >> Similar rule applies to i_mtime. Except it's even more tricky, since >> you have to flush the mtime together with the data (the flush should not >> update the modification time). And we have other operations which also >> change the mtime, and those also need to be updated. >> >> We should probably look at what NFS is doing. > > Miklos, > > I've looked at how NFS and FUSE manage the i_mtime. > > The NFS (if not looking at the fscache) tries to keep the i_mtime at the > maximum between the local and the remote values. This looks like correct > behavior for FUSE too. > > But, looking at the FUSE code I see that the existing attr_version checks > do implement the same approach even if we turn the writeback cache (this > series) ON. It's not as simple as that. First off, fuse sets S_NOCMTIME which means the kernel won't update i_mtime on writes. Which is fine for write-through, since the time update is always the responsibility of the userspace filesystem. If we are doing buffered writes, then the kernel must update i_mtime on write(2) and must flush that to the userspace filesystem at some point (with a SETTATTR operation). The attr_version thing is about making sure that we don't update the kernel's cached attribute with stale data from the userspace filesystem. E.g. if a GETATTR races with a WRITE then by the time the GETATTR finishes write may have already updated i_size despite the fact that GETATTR is returning i_size before the write. In that case we don't store the i_size we believe is stale but we do use it the stat(2) reply, since it came from the userspace filesystem, which is the authoritative source of information. But that means that if we want stat(2) to work correctly, then we must flush the updated i_mtime to userspace before we do the GETATTR. So basically what we need is a per-inode flag that says that i_mtime has been updated (it is more recent then what userspace has) and we must update i_mtime *only* in write and not other operations which still do the mtime update in the userspace filesystem. Any operation that modifies i_mtime (and hence invalidate the attributes) must clear the flag. Any other operation which updates or invalidates the attributes must first flush the i_mtime to userspace if the flag is set. In addition the userspace fileystem need to implement the policy similar to NFS, in which it only updates mtime if it is greater than the current one. This means that we must differentiate between an mtime update due to a buffered write from an mtime update due to an utime (and friends) system call. I think that would work, but it's a tricky thing and needs to be thought trough. Thanks, Miklos -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html