On Thu 14-07-22 17:45:32, Ritesh Harjani wrote: > On 22/07/12 12:54PM, Jan Kara wrote: > > Add function mb_cache_entry_delete_or_get() to delete mbcache entry if > > it is unused and also add a function to wait for entry to become unused > > - mb_cache_entry_wait_unused(). We do not share code between the two > > deleting function as one of them will go away soon. > > > > CC: stable@xxxxxxxxxxxxxxx > > Fixes: 82939d7999df ("ext4: convert to mbcache2") > > Signed-off-by: Jan Kara <jack@xxxxxxx> > > --- > > fs/mbcache.c | 66 +++++++++++++++++++++++++++++++++++++++-- > > include/linux/mbcache.h | 10 ++++++- > > 2 files changed, 73 insertions(+), 3 deletions(-) > > > > diff --git a/fs/mbcache.c b/fs/mbcache.c > > index cfc28129fb6f..2010bc80a3f2 100644 > > --- a/fs/mbcache.c > > +++ b/fs/mbcache.c > > @@ -11,7 +11,7 @@ > > /* > > * Mbcache is a simple key-value store. Keys need not be unique, however > > * key-value pairs are expected to be unique (we use this fact in > > - * mb_cache_entry_delete()). > > + * mb_cache_entry_delete_or_get()). > > * > > * Ext2 and ext4 use this cache for deduplication of extended attribute blocks. > > * Ext4 also uses it for deduplication of xattr values stored in inodes. > > @@ -125,6 +125,19 @@ void __mb_cache_entry_free(struct mb_cache_entry *entry) > > } > > EXPORT_SYMBOL(__mb_cache_entry_free); > > > > +/* > > + * mb_cache_entry_wait_unused - wait to be the last user of the entry > > + * > > + * @entry - entry to work on > > + * > > + * Wait to be the last user of the entry. > > + */ > > +void mb_cache_entry_wait_unused(struct mb_cache_entry *entry) > > +{ > > + wait_var_event(&entry->e_refcnt, atomic_read(&entry->e_refcnt) <= 3); > > It's not very intuitive of why we check for refcnt <= 3. > A small note at top of this function might be helpful. > IIUC, it is because by default when anyone creates an entry we start with > a refcnt of 2 (in mb_cache_entry_create. > - Now when the user of the entry wants to delete this, it will try and call > mb_cache_entry_delete_or_get(). If during this function call it sees that the > refcnt is elevated more than 2, that means there is another user of this entry > currently active and hence we should wait before we remove this entry from the > cache. So it will take an extra refcnt and return. > - So then this caller will call mb_cache_entry_wait_unused() for the refcnt to > be <= 3, so that the entry can be deleted. Correct. I will add a comment as you suggest. > Quick qn - > So now is the design like, ext4_evict_ea_inode() will be waiting indefinitely > until the other user of this mb_cache entry releases the reference right? Correct. Similarly for ext4_xattr_release_block(). > And that will not happen until, > - either the shrinker removes this entry from the cache during which we are > checking if the refcnt <= 3, then we call a wakeup event No, shrinker will not touch these entries with active users anymore. > - Or the user removes/deletes the xattr entry No. We hold reference to mbcache entry only while we are trying to reuse it. So functions ext4_xattr_block_cache_find() and ext4_xattr_inode_cache_find() will lookup potential mbcache entry that may have the same contents and get reference to it. Then we do comparisons verifying whether the contents really matches, if yes, we increment on-disk inode/block refcount. Then we drop mbcache entry reference which unblocks waiters in mb_cache_entry_wait_unused(). Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR