Re: [PATCH 6/10] fuse: Trust kernel i_size only

Pavel Emelyanov <xemul@xxxxxxxxxxxxx> · Mon, 16 Jul 2012 07:32:44 +0400

On 07/13/2012 08:30 PM, Miklos Szeredi wrote:
> Pavel Emelyanov <xemul@xxxxxxxxxxxxx> writes:
> 
>> On 07/04/2012 06:39 PM, Miklos Szeredi wrote:
>>> Pavel Emelyanov <xemul@xxxxxxxxxxxxx> writes:
>>>
>>>> Make fuse think that when writeback is on the inode's i_size is alway
>>>> up-to-date and not update it with the value received from the
>>>> userspace. This is done because the page cache code may update i_size
>>>> without letting the FS know.
>>>
>>> Similar rule applies to i_mtime.  Except it's even more tricky, since
>>> you have to flush the mtime together with the data (the flush should not
>>> update the modification time).  And we have other operations which also
>>> change the mtime, and those also need to be updated.
>>>
>>> We should probably look at what NFS is doing.
>>
>> Miklos,
>>
>> I've looked at how NFS and FUSE manage the i_mtime.
>>
>> The NFS (if not looking at the fscache) tries to keep the i_mtime at the
>> maximum between the local and the remote values. This looks like correct
>> behavior for FUSE too.
>>
>> But, looking at the FUSE code I see that the existing attr_version checks
>> do implement the same approach even if we turn the writeback cache (this
>> series) ON.
> 
> It's not as simple as that.
> 
> First off, fuse sets S_NOCMTIME which means the kernel won't update
> i_mtime on writes.  Which is fine for write-through, since the time
> update is always the responsibility of the userspace filesystem.

Oops, I've missed that fact :( I wonder why writes did update the mtime
in my tests then...

> If we are doing buffered writes, then the kernel must update i_mtime on
> write(2) and must flush that to the userspace filesystem at some point
> (with a SETTATTR operation).

Agree.

> The attr_version thing is about making sure that we don't update the
> kernel's cached attribute with stale data from the userspace filesystem.
> E.g. if a GETATTR races with a WRITE then by the time the GETATTR
> finishes write may have already updated i_size despite the fact that
> GETATTR is returning i_size before the write.  In that case we don't
> store the i_size we believe is stale but we do use it the stat(2) reply,
> since it came from the userspace filesystem, which is the authoritative
> source of information.
> 
> But that means that if we want stat(2) to work correctly, then we must
> flush the updated i_mtime to userspace before we do the GETATTR.

Why? Isn't it enough just to look at the per-inode flag (you're talking
below) and take either cached i_mtime value or the user-space verion of it?

> So basically what we need is a per-inode flag that says that i_mtime has
> been updated (it is more recent then what userspace has) and we must
> update i_mtime *only* in write and not other operations which still do
> the mtime update in the userspace filesystem.  Any operation that
> modifies i_mtime (and hence invalidate the attributes) must clear the
> flag.  Any other operation which updates or invalidates the attributes
> must first flush the i_mtime to userspace if the flag is set.
> 
> In addition the userspace fileystem need to implement the policy similar
> to NFS, in which it only updates mtime if it is greater than the current
> one.  This means that we must differentiate between an mtime update due
> to a buffered write from an mtime update due to an utime (and friends)
> system call.
> 
> I think that would work, but it's a tricky thing and needs to be thought
> trough.
> 
> Thanks,
> Miklos
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html