Re: [PATCH 6/10] fuse: Trust kernel i_size only

Pavel Emelyanov <xemul@xxxxxxxxxxxxx> writes:

> On 07/04/2012 06:39 PM, Miklos Szeredi wrote:
>> Pavel Emelyanov <xemul@xxxxxxxxxxxxx> writes:
>> 
>>> Make fuse think that when writeback is on the inode's i_size is always
>>> up-to-date and not update it with the value received from the
>>> userspace. This is done because the page cache code may update i_size
>>> without letting the FS know.
>> 
>> Similar rule applies to i_mtime.  Except it's even more tricky, since
>> you have to flush the mtime together with the data (the flush should not
>> update the modification time).  And we have other operations which also
>> change the mtime, and those also need to be updated.
>> 
>> We should probably look at what NFS is doing.
>
> Miklos,
>
> I've looked at how NFS and FUSE manage the i_mtime.
>
> The NFS (if not looking at the fscache) tries to keep the i_mtime at the
> maximum between the local and the remote values. This looks like correct
> behavior for FUSE too.
>
> But, looking at the FUSE code I see that the existing attr_version checks
> do implement the same approach even if we turn the writeback cache (this
> series) ON.

It's not as simple as that.

First off, fuse sets S_NOCMTIME which means the kernel won't update
i_mtime on writes.  Which is fine for write-through, since the time
update is always the responsibility of the userspace filesystem.

If we are doing buffered writes, then the kernel must update i_mtime on
write(2) and must flush that to the userspace filesystem at some point
(with a SETATTR operation).

The attr_version thing is about making sure that we don't update the
kernel's cached attribute with stale data from the userspace filesystem.
E.g. if a GETATTR races with a WRITE, then by the time the GETATTR
finishes the write may have already updated i_size, even though the
GETATTR reply carries the i_size from before the write.  In that case we
don't store the i_size we believe is stale, but we do use it in the
stat(2) reply, since it came from the userspace filesystem, which is the
authoritative source of information.
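
As a rough illustration of the attr_version check described above, here
is a small userspace model in C.  The struct and function names
(model_conn, model_inode, getattr_begin, ...) are made up for the sketch
and are not the ones in fs/fuse; only the idea of comparing a version
sampled before the request with the inode's current version is taken
from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not the real fs/fuse structures. */
struct model_conn {
	uint64_t attr_version;	/* connection-wide counter */
};

struct model_inode {
	uint64_t attr_version;	/* version of the last local attr change */
	uint64_t i_size;	/* size cached in the kernel */
};

/* Sample the counter before sending GETATTR to userspace. */
static uint64_t getattr_begin(const struct model_conn *fc)
{
	assert(fc);
	return fc->attr_version;
}

/* A write that may extend the file: update i_size and stamp the inode. */
static void write_extend(struct model_conn *fc, struct model_inode *in,
			 uint64_t new_size)
{
	assert(fc && in);
	if (new_size > in->i_size)
		in->i_size = new_size;
	in->attr_version = ++fc->attr_version;
}

/*
 * Handle the GETATTR reply.  The reply is cached only if the inode was
 * not changed locally after the version was sampled.  The value from
 * userspace is returned for the stat(2) reply either way.
 */
static uint64_t getattr_finish(struct model_conn *fc, struct model_inode *in,
			       uint64_t sampled_version, uint64_t reply_size)
{
	assert(fc && in);
	if (in->attr_version <= sampled_version) {
		in->i_size = reply_size;
		in->attr_version = ++fc->attr_version;
	}
	return reply_size;
}
```

In this model a reply that raced with write_extend() is still handed to
the caller, but it does not overwrite the newer cached i_size.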

But that means that if we want stat(2) to work correctly, then we must
flush the updated i_mtime to userspace before we do the GETATTR.

So basically what we need is a per-inode flag that says that i_mtime has
been updated (it is more recent than what userspace has), and we must
update i_mtime *only* in write and not in other operations, which still
do the mtime update in the userspace filesystem.  Any operation that
modifies i_mtime (and hence invalidates the attributes) must clear the
flag.  Any other operation which updates or invalidates the attributes
must first flush the i_mtime to userspace if the flag is set.
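
To make the flag rules concrete, here is a minimal userspace model in C.
It is only a sketch under assumptions: wb_inode, mtime_dirty and the
helper names are hypothetical, and server_mtime stands in for whatever
the userspace filesystem has stored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one inode plus the userspace side's mtime. */
struct wb_inode {
	int64_t i_mtime;	/* mtime cached in the kernel */
	bool mtime_dirty;	/* kernel mtime is newer than userspace's */
	bool attr_valid;	/* cached attributes may be used */
	int64_t server_mtime;	/* stand-in for the userspace filesystem */
	unsigned setattr_count;	/* number of SETATTR flushes sent */
};

/* Send the kernel's mtime to userspace if the flag is set. */
static void flush_mtime(struct wb_inode *in)
{
	assert(in);
	if (!in->mtime_dirty)
		return;
	in->server_mtime = in->i_mtime;	/* models a SETATTR request */
	in->setattr_count++;
	in->mtime_dirty = false;
}

/* Buffered write(2): the only place where the kernel sets i_mtime. */
static void buffered_write(struct wb_inode *in, int64_t now)
{
	assert(in);
	in->i_mtime = now;
	in->mtime_dirty = true;
}

/*
 * An operation whose mtime update is done by the userspace filesystem:
 * the flag is cleared and the cached attributes are invalidated.
 */
static void userspace_mtime_op(struct wb_inode *in, int64_t server_now)
{
	assert(in);
	in->server_mtime = server_now;
	in->mtime_dirty = false;
	in->attr_valid = false;
}

/* GETATTR: flush a pending mtime first, then refresh from userspace. */
static int64_t do_getattr(struct wb_inode *in)
{
	assert(in);
	flush_mtime(in);
	in->i_mtime = in->server_mtime;
	in->attr_valid = true;
	return in->i_mtime;
}
```

With this ordering, a stat(2) after a buffered write sees the new mtime
because the flush happens before the GETATTR.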

In addition, the userspace filesystem needs to implement a policy
similar to NFS, in which it only updates mtime if the new value is
greater than the current one.  This means that we must differentiate
between an mtime update due to a buffered write and an mtime update due
to a utime (and friends) system call.
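
A sketch of what that policy could look like on the userspace side, in
C.  The enum and function below are hypothetical; in particular, how the
origin of the update would be passed to userspace is left open, so
mtime_source is purely an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: where the mtime update came from. */
enum mtime_source {
	MTIME_FROM_WRITEBACK,	/* flushed after a buffered write */
	MTIME_FROM_UTIME,	/* explicit utime (and friends) call */
};

/* Hypothetical per-file state kept by the userspace filesystem. */
struct srv_file {
	int64_t mtime;
};

/*
 * Apply an mtime update.  A writeback flush may only move mtime
 * forward; an explicit utime request is always applied, even if it
 * sets mtime back.  Returns true if the stored mtime was changed.
 */
static bool srv_set_mtime(struct srv_file *f, int64_t new_mtime,
			  enum mtime_source src)
{
	assert(f);
	if (src == MTIME_FROM_WRITEBACK && new_mtime <= f->mtime)
		return false;
	f->mtime = new_mtime;
	return true;
}
```

Without the distinction, a utime call that sets an older timestamp would
be dropped by the "only if greater" rule.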

I think that would work, but it's a tricky thing and needs to be thought
through.

Thanks,
Miklos