On 2/17/25 11:07, Luis Henriques wrote: > On Mon, Feb 17 2025, Bernd Schubert wrote: > >> On 2/16/25 17:50, Luis Henriques wrote: >>> Currently userspace is able to notify the kernel to invalidate the cache >>> for an inode. This means that, if all the inodes in a filesystem need to >>> be invalidated, then userspace needs to iterate through all of them and do >>> this kernel notification separately. >>> >>> This patch adds a new option that allows userspace to invalidate all the >>> inodes with a single notification operation. In addition to invalidate >>> all the inodes, it also shrinks the sb dcache. >>> >>> Signed-off-by: Luis Henriques <luis@xxxxxxxxxx> >>> --- >>> fs/fuse/inode.c | 33 +++++++++++++++++++++++++++++++++ >>> include/uapi/linux/fuse.h | 3 +++ >>> 2 files changed, 36 insertions(+) >>> >>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c >>> index e9db2cb8c150..01a4dc5677ae 100644 >>> --- a/fs/fuse/inode.c >>> +++ b/fs/fuse/inode.c >>> @@ -547,6 +547,36 @@ struct inode *fuse_ilookup(struct fuse_conn *fc, u64 nodeid, >>> return NULL; >>> } >>> >>> +static int fuse_reverse_inval_all(struct fuse_conn *fc) >>> +{ >>> + struct fuse_mount *fm; >>> + struct inode *inode; >>> + >>> + inode = fuse_ilookup(fc, FUSE_ROOT_ID, &fm); >>> + if (!inode || !fm) >>> + return -ENOENT; >>> + >>> + /* Remove all possible active references to cached inodes */ >>> + shrink_dcache_sb(fm->sb); >>> + >>> + /* Remove all unreferenced inodes from cache */ >>> + invalidate_inodes(fm->sb); >>> + >>> + return 0; >>> +} >>> + >>> +/* >>> + * Notify to invalidate inodes cache. It can be called with @nodeid set to >>> + * either: >>> + * >>> + * - An inode number - Any pending writebacks within the rage [@offset @len] >>> + * will be triggered and the inode will be validated. To invalidate the whole >>> + * cache @offset has to be set to '0' and @len needs to be <= '0'; if @offset >>> + * is negative, only the inode attributes are invalidated. >>> + * >>> + * - FUSE_INVAL_ALL_INODES - All the inodes in the superblock are invalidated >>> + * and the whole dcache is shrinked. >>> + */ >>> int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, >>> loff_t offset, loff_t len) >>> { >>> @@ -555,6 +585,9 @@ int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, >>> pgoff_t pg_start; >>> pgoff_t pg_end; >>> >>> + if (nodeid == FUSE_INVAL_ALL_INODES) >>> + return fuse_reverse_inval_all(fc); >>> + >>> inode = fuse_ilookup(fc, nodeid, NULL); >>> if (!inode) >>> return -ENOENT; >>> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h >>> index 5e0eb41d967e..e5852b63f99f 100644 >>> --- a/include/uapi/linux/fuse.h >>> +++ b/include/uapi/linux/fuse.h >>> @@ -669,6 +669,9 @@ enum fuse_notify_code { >>> FUSE_NOTIFY_CODE_MAX, >>> }; >>> >>> +/* The nodeid to request to invalidate all inodes */ >>> +#define FUSE_INVAL_ALL_INODES 0 >>> + >>> /* The read buffer is required to be at least 8k, but may be much larger */ >>> #define FUSE_MIN_READ_BUFFER 8192 >>> >> >> >> I think this version might end up in >> >> static void fuse_evict_inode(struct inode *inode) >> { >> struct fuse_inode *fi = get_fuse_inode(inode); >> >> /* Will write inode on close/munmap and in all other dirtiers */ >> WARN_ON(inode->i_state & I_DIRTY_INODE); >> >> >> if the fuse connection has writeback cache enabled. >> >> >> Without having it tested, reproducer would probably be to run >> something like passthrough_hp (without --direct-io), opening >> and writing to a file and then sending FUSE_INVAL_ALL_INODES. > > Thanks, Bernd. So far I couldn't trigger this warning. But I just found > that there's a stupid bug in the code: a missing iput() after doing the > fuse_ilookup(). > > I'll spend some more time trying to understand how (or if) the warning you > mentioned can triggered before sending a new revision. > Maybe I'm wrong, but it calls invalidate_inodes() dispose_list() evict(inode) fuse_evict_inode() and if at the same time something writes to inode page cache, the warning would be triggered? There are some conditions in evict, like inode_wait_for_writeback() that might protect us, but what is if it waited and then just in the right time the another write comes and dirties the inode again? Thanks, Bernd