On 3/12/24 19:18, Sweet Tea Dorminy wrote: > > > On 3/11/24 06:01, Miklos Szeredi wrote: >> On Thu, 7 Mar 2024 at 16:10, Sweet Tea Dorminy >> <sweettea-kernel@xxxxxxxxxx> wrote: >>> >>> All calls into generic vfs functions need to make sure that the inode >>> attributes used by those functions are up to date, by calling >>> fuse_update_attributes() as appropriate. >>> >>> generic_write_checks() accesses inode size in order to get the >>> appropriate file offset for files opened with O_APPEND. Currently, in >>> some cases, fuse_update_attributes() is not called before >>> generic_write_checks(), potentially resulting in corruption/overwrite of >>> previously appended data if i_size is out of date in the cached inode. >> >> While this all sounds good, I don't think it makes sense. >> >> Why? Because doing cached O_APPEND writes without any sort of >> exclusion with remote writes is just not going to work. >> >> Either the server ignores the current size and writes at the offset >> that the kernel supplied (which will be the cached size of the file) >> and executes the write at that position, or it appends the write to >> the current EOF. In the former case the cache will be consistent, but >> append semantics are not observed, while in the latter case the append >> semantics are observed, but the cache will be inconsistent. >> >> Solution: either exclude remote writes or don't use the cache. >> >> Updating the file size before the write does not prevent the race, >> only makes the window smaller. > > Definitely agree with you. > > The usecase at hand is a sort of NFS-like network filesystem, where > there's exclusion of remote writes while the file is open, but no > problem with remote writes while the file is closed. > > The alternative we considered was to add a fuse_update_attributes() call > to open. > > We thought about doing so during d_revalidate/lookup_fast(). But as far > as I understand, lookup_fast() is not just called during open, and will > use the cached inode if the dentry timeout hasn't expired. We tried > setting dentry timeout to 0, but that lost too much performance. So that > didn't seem to work. > > But updating attributes after giving the filesystem a chance to > invalidate them during open() would work, I think? You mean something like this? diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index d19cbf34c634..2723270323d9 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -204,7 +204,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags) if (inode && fuse_is_bad(inode)) goto invalid; else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) || - (flags & (LOOKUP_EXCL | LOOKUP_REVAL | LOOKUP_RENAME_TARGET))) { + (flags & (LOOKUP_EXCL | LOOKUP_REVAL | LOOKUP_RENAME_TARGET | LOOKUP_OPEN))) { struct fuse_entry_out outarg; FUSE_ARGS(args); struct fuse_forget_link *forget; I think this would make sense and could be caught by the atomic-open/revalidate (once I get back to it). > > That would also conveniently fix the issue where copy_file_range() > currently checks the size before calling into fuse at all, which I'd > been building a more elaborate changeset for. > > How does that sound? > > Thanks! > > Sweet Tea >