On 07/13/2012 08:30 PM, Miklos Szeredi wrote: > Pavel Emelyanov <xemul@xxxxxxxxxxxxx> writes: > >> On 07/04/2012 06:39 PM, Miklos Szeredi wrote: >>> Pavel Emelyanov <xemul@xxxxxxxxxxxxx> writes: >>> >>>> Make fuse think that when writeback is on the inode's i_size is alway >>>> up-to-date and not update it with the value received from the >>>> userspace. This is done because the page cache code may update i_size >>>> without letting the FS know. >>> >>> Similar rule applies to i_mtime. Except it's even more tricky, since >>> you have to flush the mtime together with the data (the flush should not >>> update the modification time). And we have other operations which also >>> change the mtime, and those also need to be updated. >>> >>> We should probably look at what NFS is doing. >> >> Miklos, >> >> I've looked at how NFS and FUSE manage the i_mtime. >> >> The NFS (if not looking at the fscache) tries to keep the i_mtime at the >> maximum between the local and the remote values. This looks like correct >> behavior for FUSE too. >> >> But, looking at the FUSE code I see that the existing attr_version checks >> do implement the same approach even if we turn the writeback cache (this >> series) ON. > > It's not as simple as that. > > First off, fuse sets S_NOCMTIME which means the kernel won't update > i_mtime on writes. Which is fine for write-through, since the time > update is always the responsibility of the userspace filesystem. Oops, I've missed that fact :( I wonder why writes did update the mtime in my tests then... > If we are doing buffered writes, then the kernel must update i_mtime on > write(2) and must flush that to the userspace filesystem at some point > (with a SETTATTR operation). Agree. > The attr_version thing is about making sure that we don't update the > kernel's cached attribute with stale data from the userspace filesystem. > E.g. if a GETATTR races with a WRITE then by the time the GETATTR > finishes write may have already updated i_size despite the fact that > GETATTR is returning i_size before the write. In that case we don't > store the i_size we believe is stale but we do use it the stat(2) reply, > since it came from the userspace filesystem, which is the authoritative > source of information. > > But that means that if we want stat(2) to work correctly, then we must > flush the updated i_mtime to userspace before we do the GETATTR. Why? Isn't it enough just to look at the per-inode flag (you're talking below) and take either cached i_mtime value or the user-space verion of it? > So basically what we need is a per-inode flag that says that i_mtime has > been updated (it is more recent then what userspace has) and we must > update i_mtime *only* in write and not other operations which still do > the mtime update in the userspace filesystem. Any operation that > modifies i_mtime (and hence invalidate the attributes) must clear the > flag. Any other operation which updates or invalidates the attributes > must first flush the i_mtime to userspace if the flag is set. > > In addition the userspace fileystem need to implement the policy similar > to NFS, in which it only updates mtime if it is greater than the current > one. This means that we must differentiate between an mtime update due > to a buffered write from an mtime update due to an utime (and friends) > system call. > > I think that would work, but it's a tricky thing and needs to be thought > trough. > > Thanks, > Miklos > . > -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html