On 22/07/14 04:49PM, Jan Kara wrote: > On Thu 14-07-22 17:45:32, Ritesh Harjani wrote: > > On 22/07/12 12:54PM, Jan Kara wrote: > > > Add function mb_cache_entry_delete_or_get() to delete mbcache entry if > > > it is unused and also add a function to wait for entry to become unused > > > - mb_cache_entry_wait_unused(). We do not share code between the two > > > deleting function as one of them will go away soon. > > > > > > CC: stable@xxxxxxxxxxxxxxx > > > Fixes: 82939d7999df ("ext4: convert to mbcache2") > > > Signed-off-by: Jan Kara <jack@xxxxxxx> > > > --- > > > fs/mbcache.c | 66 +++++++++++++++++++++++++++++++++++++++-- > > > include/linux/mbcache.h | 10 ++++++- > > > 2 files changed, 73 insertions(+), 3 deletions(-) > > > > > > diff --git a/fs/mbcache.c b/fs/mbcache.c > > > index cfc28129fb6f..2010bc80a3f2 100644 > > > --- a/fs/mbcache.c > > > +++ b/fs/mbcache.c > > > @@ -11,7 +11,7 @@ > > > /* > > > * Mbcache is a simple key-value store. Keys need not be unique, however > > > * key-value pairs are expected to be unique (we use this fact in > > > - * mb_cache_entry_delete()). > > > + * mb_cache_entry_delete_or_get()). > > > * > > > * Ext2 and ext4 use this cache for deduplication of extended attribute blocks. > > > * Ext4 also uses it for deduplication of xattr values stored in inodes. > > > @@ -125,6 +125,19 @@ void __mb_cache_entry_free(struct mb_cache_entry *entry) > > > } > > > EXPORT_SYMBOL(__mb_cache_entry_free); > > > > > > +/* > > > + * mb_cache_entry_wait_unused - wait to be the last user of the entry > > > + * > > > + * @entry - entry to work on > > > + * > > > + * Wait to be the last user of the entry. > > > + */ > > > +void mb_cache_entry_wait_unused(struct mb_cache_entry *entry) > > > +{ > > > + wait_var_event(&entry->e_refcnt, atomic_read(&entry->e_refcnt) <= 3); > > > > It's not very intuitive of why we check for refcnt <= 3. > > A small note at top of this function might be helpful. > > IIUC, it is because by default when anyone creates an entry we start with > > a refcnt of 2 (in mb_cache_entry_create. > > - Now when the user of the entry wants to delete this, it will try and call > > mb_cache_entry_delete_or_get(). If during this function call it sees that the > > refcnt is elevated more than 2, that means there is another user of this entry > > currently active and hence we should wait before we remove this entry from the > > cache. So it will take an extra refcnt and return. > > - So then this caller will call mb_cache_entry_wait_unused() for the refcnt to > > be <= 3, so that the entry can be deleted. > > Correct. I will add a comment as you suggest. > > > Quick qn - > > So now is the design like, ext4_evict_ea_inode() will be waiting indefinitely > > until the other user of this mb_cache entry releases the reference right? > > Correct. Similarly for ext4_xattr_release_block(). > > > And that will not happen until, > > - either the shrinker removes this entry from the cache during which we are > > checking if the refcnt <= 3, then we call a wakeup event > > No, shrinker will not touch these entries with active users anymore. > > > - Or the user removes/deletes the xattr entry > > No. We hold reference to mbcache entry only while we are trying to reuse > it. So functions ext4_xattr_block_cache_find() and > ext4_xattr_inode_cache_find() will lookup potential mbcache entry that may > have the same contents and get reference to it. Then we do comparisons > verifying whether the contents really matches, if yes, we increment on-disk > inode/block refcount. Then we drop mbcache entry reference which unblocks > waiters in mb_cache_entry_wait_unused(). > ohk, yes. This is where I was a bit confused. Thanks for explaining it. This makes more sense. I did go through the mbcache implementation, but I was missing the info on how the callers are using it. -ritesh > Honza > > -- > Jan Kara <jack@xxxxxxxx> > SUSE Labs, CR