On Tue, Feb 18 2025, Jan Kara wrote: > On Mon 17-02-25 11:47:09, Luis Henriques wrote: >> On Mon, Feb 17 2025, Bernd Schubert wrote: >> > On 2/17/25 11:07, Luis Henriques wrote: >> >> On Mon, Feb 17 2025, Bernd Schubert wrote: >> >> >> >>> On 2/16/25 17:50, Luis Henriques wrote: >> >>>> Currently userspace is able to notify the kernel to invalidate the cache >> >>>> for an inode. This means that, if all the inodes in a filesystem need to >> >>>> be invalidated, then userspace needs to iterate through all of them and do >> >>>> this kernel notification separately. >> >>>> >> >>>> This patch adds a new option that allows userspace to invalidate all the >> >>>> inodes with a single notification operation. In addition to invalidate >> >>>> all the inodes, it also shrinks the sb dcache. >> >>>> >> >>>> Signed-off-by: Luis Henriques <luis@xxxxxxxxxx> >> >>>> --- >> >>>> fs/fuse/inode.c | 33 +++++++++++++++++++++++++++++++++ >> >>>> include/uapi/linux/fuse.h | 3 +++ >> >>>> 2 files changed, 36 insertions(+) >> >>>> >> >>>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c >> >>>> index e9db2cb8c150..01a4dc5677ae 100644 >> >>>> --- a/fs/fuse/inode.c >> >>>> +++ b/fs/fuse/inode.c >> >>>> @@ -547,6 +547,36 @@ struct inode *fuse_ilookup(struct fuse_conn *fc, u64 nodeid, >> >>>> return NULL; >> >>>> } >> >>>> >> >>>> +static int fuse_reverse_inval_all(struct fuse_conn *fc) >> >>>> +{ >> >>>> + struct fuse_mount *fm; >> >>>> + struct inode *inode; >> >>>> + >> >>>> + inode = fuse_ilookup(fc, FUSE_ROOT_ID, &fm); >> >>>> + if (!inode || !fm) >> >>>> + return -ENOENT; >> >>>> + >> >>>> + /* Remove all possible active references to cached inodes */ >> >>>> + shrink_dcache_sb(fm->sb); >> >>>> + >> >>>> + /* Remove all unreferenced inodes from cache */ >> >>>> + invalidate_inodes(fm->sb); >> >>>> + >> >>>> + return 0; >> >>>> +} >> >>>> + >> >>>> +/* >> >>>> + * Notify to invalidate inodes cache. It can be called with @nodeid set to >> >>>> + * either: >> >>>> + * >> >>>> + * - An inode number - Any pending writebacks within the rage [@offset @len] >> >>>> + * will be triggered and the inode will be validated. To invalidate the whole >> >>>> + * cache @offset has to be set to '0' and @len needs to be <= '0'; if @offset >> >>>> + * is negative, only the inode attributes are invalidated. >> >>>> + * >> >>>> + * - FUSE_INVAL_ALL_INODES - All the inodes in the superblock are invalidated >> >>>> + * and the whole dcache is shrinked. >> >>>> + */ >> >>>> int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, >> >>>> loff_t offset, loff_t len) >> >>>> { >> >>>> @@ -555,6 +585,9 @@ int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, >> >>>> pgoff_t pg_start; >> >>>> pgoff_t pg_end; >> >>>> >> >>>> + if (nodeid == FUSE_INVAL_ALL_INODES) >> >>>> + return fuse_reverse_inval_all(fc); >> >>>> + >> >>>> inode = fuse_ilookup(fc, nodeid, NULL); >> >>>> if (!inode) >> >>>> return -ENOENT; >> >>>> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h >> >>>> index 5e0eb41d967e..e5852b63f99f 100644 >> >>>> --- a/include/uapi/linux/fuse.h >> >>>> +++ b/include/uapi/linux/fuse.h >> >>>> @@ -669,6 +669,9 @@ enum fuse_notify_code { >> >>>> FUSE_NOTIFY_CODE_MAX, >> >>>> }; >> >>>> >> >>>> +/* The nodeid to request to invalidate all inodes */ >> >>>> +#define FUSE_INVAL_ALL_INODES 0 >> >>>> + >> >>>> /* The read buffer is required to be at least 8k, but may be much larger */ >> >>>> #define FUSE_MIN_READ_BUFFER 8192 >> >>>> >> >>> >> >>> >> >>> I think this version might end up in >> >>> >> >>> static void fuse_evict_inode(struct inode *inode) >> >>> { >> >>> struct fuse_inode *fi = get_fuse_inode(inode); >> >>> >> >>> /* Will write inode on close/munmap and in all other dirtiers */ >> >>> WARN_ON(inode->i_state & I_DIRTY_INODE); >> >>> >> >>> >> >>> if the fuse connection has writeback cache enabled. >> >>> >> >>> >> >>> Without having it tested, reproducer would probably be to run >> >>> something like passthrough_hp (without --direct-io), opening >> >>> and writing to a file and then sending FUSE_INVAL_ALL_INODES. >> >> >> >> Thanks, Bernd. So far I couldn't trigger this warning. But I just found >> >> that there's a stupid bug in the code: a missing iput() after doing the >> >> fuse_ilookup(). >> >> >> >> I'll spend some more time trying to understand how (or if) the warning you >> >> mentioned can triggered before sending a new revision. >> >> >> > >> > Maybe I'm wrong, but it calls >> > >> > invalidate_inodes() >> > dispose_list() >> > evict(inode) >> > fuse_evict_inode() >> > >> > and if at the same time something writes to inode page cache, the >> > warning would be triggered? >> > There are some conditions in evict, like inode_wait_for_writeback() >> > that might protect us, but what is if it waited and then just >> > in the right time the another write comes and dirties the inode >> > again? >> >> Right, I have looked into that too but my understanding is that this can >> not happen because, before doing that wait, the code does: >> >> inode_sb_list_del(inode); >> >> and the inode state will include I_FREEING. >> >> Thus, before writing to it again, the inode will need to get added back to >> the sb list. Also, reading the comments on evict(), if something writes >> into the inode at that point that's likely a bug. But this is just my >> understanding, and I may be missing something. > > Yes. invalidate_inodes() checks i_count == 0 and sets I_FREEING. Once > I_FREEING is set nobody can acquire inode reference until the inode is > fully destroyed. So nobody should be writing to the inode or anything like > that. Awesome, it's good to have that confirmed. Thank you, Jan! Cheers, -- Luís