On Mon 17-02-25 11:47:09, Luis Henriques wrote: > On Mon, Feb 17 2025, Bernd Schubert wrote: > > On 2/17/25 11:07, Luis Henriques wrote: > >> On Mon, Feb 17 2025, Bernd Schubert wrote: > >> > >>> On 2/16/25 17:50, Luis Henriques wrote: > >>>> Currently userspace is able to notify the kernel to invalidate the cache > >>>> for an inode. This means that, if all the inodes in a filesystem need to > >>>> be invalidated, then userspace needs to iterate through all of them and do > >>>> this kernel notification separately. > >>>> > >>>> This patch adds a new option that allows userspace to invalidate all the > >>>> inodes with a single notification operation. In addition to invalidate > >>>> all the inodes, it also shrinks the sb dcache. > >>>> > >>>> Signed-off-by: Luis Henriques <luis@xxxxxxxxxx> > >>>> --- > >>>> fs/fuse/inode.c | 33 +++++++++++++++++++++++++++++++++ > >>>> include/uapi/linux/fuse.h | 3 +++ > >>>> 2 files changed, 36 insertions(+) > >>>> > >>>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > >>>> index e9db2cb8c150..01a4dc5677ae 100644 > >>>> --- a/fs/fuse/inode.c > >>>> +++ b/fs/fuse/inode.c > >>>> @@ -547,6 +547,36 @@ struct inode *fuse_ilookup(struct fuse_conn *fc, u64 nodeid, > >>>> return NULL; > >>>> } > >>>> > >>>> +static int fuse_reverse_inval_all(struct fuse_conn *fc) > >>>> +{ > >>>> + struct fuse_mount *fm; > >>>> + struct inode *inode; > >>>> + > >>>> + inode = fuse_ilookup(fc, FUSE_ROOT_ID, &fm); > >>>> + if (!inode || !fm) > >>>> + return -ENOENT; > >>>> + > >>>> + /* Remove all possible active references to cached inodes */ > >>>> + shrink_dcache_sb(fm->sb); > >>>> + > >>>> + /* Remove all unreferenced inodes from cache */ > >>>> + invalidate_inodes(fm->sb); > >>>> + > >>>> + return 0; > >>>> +} > >>>> + > >>>> +/* > >>>> + * Notify to invalidate inodes cache. It can be called with @nodeid set to > >>>> + * either: > >>>> + * > >>>> + * - An inode number - Any pending writebacks within the rage [@offset @len] > >>>> + * will be triggered and the inode will be validated. To invalidate the whole > >>>> + * cache @offset has to be set to '0' and @len needs to be <= '0'; if @offset > >>>> + * is negative, only the inode attributes are invalidated. > >>>> + * > >>>> + * - FUSE_INVAL_ALL_INODES - All the inodes in the superblock are invalidated > >>>> + * and the whole dcache is shrinked. > >>>> + */ > >>>> int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, > >>>> loff_t offset, loff_t len) > >>>> { > >>>> @@ -555,6 +585,9 @@ int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, > >>>> pgoff_t pg_start; > >>>> pgoff_t pg_end; > >>>> > >>>> + if (nodeid == FUSE_INVAL_ALL_INODES) > >>>> + return fuse_reverse_inval_all(fc); > >>>> + > >>>> inode = fuse_ilookup(fc, nodeid, NULL); > >>>> if (!inode) > >>>> return -ENOENT; > >>>> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h > >>>> index 5e0eb41d967e..e5852b63f99f 100644 > >>>> --- a/include/uapi/linux/fuse.h > >>>> +++ b/include/uapi/linux/fuse.h > >>>> @@ -669,6 +669,9 @@ enum fuse_notify_code { > >>>> FUSE_NOTIFY_CODE_MAX, > >>>> }; > >>>> > >>>> +/* The nodeid to request to invalidate all inodes */ > >>>> +#define FUSE_INVAL_ALL_INODES 0 > >>>> + > >>>> /* The read buffer is required to be at least 8k, but may be much larger */ > >>>> #define FUSE_MIN_READ_BUFFER 8192 > >>>> > >>> > >>> > >>> I think this version might end up in > >>> > >>> static void fuse_evict_inode(struct inode *inode) > >>> { > >>> struct fuse_inode *fi = get_fuse_inode(inode); > >>> > >>> /* Will write inode on close/munmap and in all other dirtiers */ > >>> WARN_ON(inode->i_state & I_DIRTY_INODE); > >>> > >>> > >>> if the fuse connection has writeback cache enabled. > >>> > >>> > >>> Without having it tested, reproducer would probably be to run > >>> something like passthrough_hp (without --direct-io), opening > >>> and writing to a file and then sending FUSE_INVAL_ALL_INODES. > >> > >> Thanks, Bernd. So far I couldn't trigger this warning. But I just found > >> that there's a stupid bug in the code: a missing iput() after doing the > >> fuse_ilookup(). > >> > >> I'll spend some more time trying to understand how (or if) the warning you > >> mentioned can triggered before sending a new revision. > >> > > > > Maybe I'm wrong, but it calls > > > > invalidate_inodes() > > dispose_list() > > evict(inode) > > fuse_evict_inode() > > > > and if at the same time something writes to inode page cache, the > > warning would be triggered? > > There are some conditions in evict, like inode_wait_for_writeback() > > that might protect us, but what is if it waited and then just > > in the right time the another write comes and dirties the inode > > again? > > Right, I have looked into that too but my understanding is that this can > not happen because, before doing that wait, the code does: > > inode_sb_list_del(inode); > > and the inode state will include I_FREEING. > > Thus, before writing to it again, the inode will need to get added back to > the sb list. Also, reading the comments on evict(), if something writes > into the inode at that point that's likely a bug. But this is just my > understanding, and I may be missing something. Yes. invalidate_inodes() checks i_count == 0 and sets I_FREEING. Once I_FREEING is set nobody can acquire inode reference until the inode is fully destroyed. So nobody should be writing to the inode or anything like that. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR