Re: [PATCH] fuse: update size attr before doing IO

Sweet Tea Dorminy <sweettea-kernel@xxxxxxxxxx> · Tue, 12 Mar 2024 19:08:09 -0400

On 3/12/24 17:55, Bernd Schubert wrote:


On 3/12/24 19:18, Sweet Tea Dorminy wrote:


On 3/11/24 06:01, Miklos Szeredi wrote:
On Thu, 7 Mar 2024 at 16:10, Sweet Tea Dorminy
<sweettea-kernel@xxxxxxxxxx> wrote:

All calls into generic vfs functions need to make sure that the inode
attributes used by those functions are up to date, by calling
fuse_update_attributes() as appropriate.

generic_write_checks() accesses inode size in order to get the
appropriate file offset for files opened with O_APPEND. Currently, in
some cases, fuse_update_attributes() is not called before
generic_write_checks(), potentially resulting in corruption/overwrite of
previously appended data if i_size is out of date in the cached inode.

While this all sounds good, I don't think it makes sense.

Why?  Because doing cached O_APPEND writes without any sort of
exclusion with remote writes is just not going to work.

Either the server ignores the current size and writes at the offset
that the kernel supplied (which will be the cached size of the file)
and executes the write at that position, or it appends the write to
the current EOF.  In the former case the cache will be consistent, but
append semantics are not observed, while in the latter case the append
semantics are observed, but the cache will be inconsistent.

Solution: either exclude remote writes or don't use the cache.

Updating the file size before the write does not prevent the race,
only makes the window smaller.

Definitely agree with you.

The usecase at hand is a sort of NFS-like network filesystem, where
there's exclusion of remote writes while the file is open, but no
problem with remote writes while the file is closed.

The alternative we considered was to add a fuse_update_attributes() call
to open.

We thought about doing so during d_revalidate/lookup_fast(). But as far
as I understand, lookup_fast() is not just called during open, and will
use the cached inode if the dentry timeout hasn't expired. We tried
setting dentry timeout to 0, but that lost too much performance. So that
didn't seem to work.

But updating attributes after giving the filesystem a chance to
invalidate them during open() would work, I think?

You mean something like this?

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index d19cbf34c634..2723270323d9 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -204,7 +204,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
         if (inode && fuse_is_bad(inode))
                 goto invalid;
         else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
-                (flags & (LOOKUP_EXCL | LOOKUP_REVAL | LOOKUP_RENAME_TARGET))) {
+                (flags & (LOOKUP_EXCL | LOOKUP_REVAL | LOOKUP_RENAME_TARGET | LOOKUP_OPEN))) {
                 struct fuse_entry_out outarg;
                 FUSE_ARGS(args);
                 struct fuse_forget_link *forget;


I think this would make sense and could be caught by the atomic-open/revalidate
(once I get back to it).



That's an idea I like a lot, although I think that leaves open a window 
for a race and I can't see how to fix it easily.

If we passed the lookup_open flag down to the filesystem daemon, for it 
to exclusively lock the file, then we'd need a call to 
un-exclusively-lock it on an error later in the open process.

But if we don't do that and exclusively locked during fuse_send_open(), 
the window between lookup and fuse_send_open() would allow stat updates 
still before the file was exclusively locked.

Thanks,

Sweet Tea