Re: [PATCH v2] vfs: Don't exchange "short" filenames unconditionally.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Sep 28, 2014 at 07:05:56PM +0100, Al Viro wrote:
> On Sun, Sep 28, 2014 at 08:47:47AM +0100, Al Viro wrote:
> 
> > The root cause, of course, is that we delay decrementing the refcount on
> > dentry_free() path...  One variant is to rip freeing these suckers out of
> > __d_free() and have dentry_free() do atomic_dec_and_test + kfree_rcu.
> > That works (and we don't need to set DCACHE_RCUACCESS in copy_name();
> > the interesting part of never-hashed case is for short names anyway), but
> > the cost is twice the number of rcu callbacks on freeing long-name dentries.
> > Is that really painful enough to care?  If not, something like the following
> > would do; if it is... well, we could probably do make free_dentry() mark
> > dentry->d_flags for __d_free() with "free the external name hanging off
> > this one" in case if refcount decrement has ended up with zero.  Doable,
> > but more complicated (and somewhat messy, since ->d_lock is not held
> > at that point; OTOH, we could play with ->d_name instead of ->d_flags...).
> > Anyway, the delta below might suffice.  Comments?
> 
> Now that I've got some sleep...  It's easy to avoid extra call_rcu, of
> course - just have two variants of __d_free(), one freeing the external
> name (always called via call_rcu()) and another just freeing the dentry
> itself.

Assuming that incrementing the external name's reference count is
atomic_add_unless, I could believe this part.  Or if you have some
locking that makes it impossible to increment the reference count
in any case where there is any risk that it might be decremented
to zero, I guess.

Which might well be the pair of write_seqcount_begin() calls in __d_move(),
now that I look more carefully.  So if the name has to be in use somewhere
before it can be copied, then a copy can only be created if there is at
least one copy that is not currently being removed?  If so, OK.

>          So free_dentry() would
> 	* decrement the refcount on external name and if that has reached
> zero - call_rcu the full variant.
> 	* if there had been no external name or if its refcount has not
> reached zero - call_rcu the variant that just frees dentry itself or
> just do a direct call of the same if dentry has never been RCU-visible.
> 
> One thing that worries me is the barriers that might be needed on assignments
> to ->d_name.name.  We should be no worse than we are right now - either RCU
> accessors are careful enough with ACCESS_ONCE() and everything's fine,
> or they are not, in which case we already have a bug in mainline - swapping
> ->d_name followed by dput() and freeing of target is no better than
> copying ->d_name from target to source followed by kfree_rcu() of what
> used to be ->d_name.name of source.  IOW, if RCU lookup could pick
> a value of ->d_name.name that got obsolete by d_move() happening
> before our read_lock_rcu(), we would be in trouble anyway - it might
> already have had its freeing RCU-scheduled and thus not delayed
> by read_lock_rcu() done afterwards.

Yep!  Of course, the usual advice is to wait until the structure is
inaccessible to readers before starting the grace period.

>                                      So I think the patch below doesn't
> introduce new problems of that sort, but I'd really appreciate if RCU
> people would take a look at the situation with barriers in that area.
> Are those ACCESS_ONCE() in dentry_cmp() and prepend_name() enough, or
> do we need some barriers in switch_names() as well?
> 
> Folks, care to review and test the following?
> 
> Allow sharing external names after __d_move()
> 
> * external dentry names get a small structure prepended to them
> (struct ext_name).
> * it contains an atomic refcount, matching the number of struct dentry
> instances that have ->d_name.name pointing to that ext name.  The
> first thing free_dentry() does is decrementing refcount of ext name,
> so the instances between the call of free_dentry() and RCU-delayed
> actual freeing do not contribute.
> * __d_move(x, y, false) makes the name of x equal to the name of y,
> external or not.  If y has an external name, extra reference is grabbed
> and put into x->d_name.name.  If x used to have an external name, the
> reference to it is dropped and, should it reach zero, freeing is scheduled
> via kfree_rcu().
> 
> All non-RCU accesses to dentry ext name are safe wrt freeing since they
> all should happen before free_dentry() is called.  RCU accesses might run
> into a dentry seen by free_dentry() or into an old name that got already
> dropped by __d_move(); however, in both cases dentry must have been
> alive and refer to that name at some point after we'd done rcu_read_lock(),
> which means that any freeing must be still pending.
> 
> Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> ---
> diff --git a/fs/dcache.c b/fs/dcache.c
> index cb25a1a..0cf5aff 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -235,20 +235,44 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
>  	return dentry_string_cmp(cs, ct, tcount);
>  }
> 
> +struct ext_name {
> +	union {
> +		atomic_t count;
> +		struct rcu_head head;
> +	};

The usual trick to simplify handling string lengths is to put the
length in the same structure as the name.  Probably doesn't play well
with your swap operation, though.  I cannot find the code to which
this patch applies, but vaguely recall something about nul termination.

> +	unsigned char name[];
> +};
> +
> +static inline struct ext_name *ext_name(struct dentry *dentry)
> +{
> +	if (likely(!dname_external(dentry)))
> +		return NULL;
> +	return container_of(dentry->d_name.name, struct ext_name, name[0]);
> +}
> +
>  static void __d_free(struct rcu_head *head)
>  {
>  	struct dentry *dentry = container_of(head, struct dentry, d_u.d_rcu);
> 
>  	WARN_ON(!hlist_unhashed(&dentry->d_alias));
> -	if (dname_external(dentry))
> -		kfree(dentry->d_name.name);
> +	kmem_cache_free(dentry_cache, dentry); 
> +}
> +
> +static void __d_free_ext(struct rcu_head *head)
> +{
> +	struct dentry *dentry = container_of(head, struct dentry, d_u.d_rcu);
> +	WARN_ON(!hlist_unhashed(&dentry->d_alias));
> +	kfree(ext_name(dentry));
>  	kmem_cache_free(dentry_cache, dentry); 
>  }
> 
>  static void dentry_free(struct dentry *dentry)
>  {
> +	struct ext_name *p = ext_name(dentry);
> +	if (unlikely(p) && likely(atomic_dec_and_test(&p->count)))
> +		call_rcu(&dentry->d_u.d_rcu, __d_free_ext);
>  	/* if dentry was never visible to RCU, immediate free is OK */
> -	if (!(dentry->d_flags & DCACHE_RCUACCESS))
> +	else if (!(dentry->d_flags & DCACHE_RCUACCESS))
>  		__d_free(&dentry->d_u.d_rcu);
>  	else
>  		call_rcu(&dentry->d_u.d_rcu, __d_free);
> @@ -1438,11 +1462,14 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
>  	 */
>  	dentry->d_iname[DNAME_INLINE_LEN-1] = 0;
>  	if (name->len > DNAME_INLINE_LEN-1) {
> -		dname = kmalloc(name->len + 1, GFP_KERNEL);
> -		if (!dname) {
> +		size_t size = offsetof(struct ext_name, name) + name->len + 1;
> +		struct ext_name *p = kmalloc(size, GFP_KERNEL);
> +		if (!p) {
>  			kmem_cache_free(dentry_cache, dentry); 
>  			return NULL;
>  		}
> +		atomic_set(&p->count, 1);
> +		dname = p->name;
>  	} else  {
>  		dname = dentry->d_iname;
>  	}	
> @@ -2372,11 +2399,10 @@ void dentry_update_name_case(struct dentry *dentry, struct qstr *name)
>  }
>  EXPORT_SYMBOL(dentry_update_name_case);
> 
> -static void switch_names(struct dentry *dentry, struct dentry *target,
> -			 bool exchange)
> +static void swap_names(struct dentry *dentry, struct dentry *target)
>  {
> -	if (dname_external(target)) {
> -		if (dname_external(dentry)) {
> +	if (unlikely(dname_external(target))) {
> +		if (unlikely(dname_external(dentry))) {
>  			/*
>  			 * Both external: swap the pointers
>  			 */
> @@ -2392,7 +2418,7 @@ static void switch_names(struct dentry *dentry, struct dentry *target,
>  			target->d_name.name = target->d_iname;
>  		}
>  	} else {
> -		if (dname_external(dentry)) {
> +		if (unlikely(dname_external(dentry))) {
>  			/*
>  			 * dentry:external, target:internal.  Give dentry's
>  			 * storage to target and make dentry internal
> @@ -2407,12 +2433,6 @@ static void switch_names(struct dentry *dentry, struct dentry *target,
>  			 */
>  			unsigned int i;
>  			BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
> -			if (!exchange) {
> -				memcpy(dentry->d_iname, target->d_name.name,
> -						target->d_name.len + 1);
> -				dentry->d_name.hash_len = target->d_name.hash_len;
> -				return;
> -			}
>  			for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
>  				swap(((long *) &dentry->d_iname)[i],
>  				     ((long *) &target->d_iname)[i]);
> @@ -2422,6 +2442,22 @@ static void switch_names(struct dentry *dentry, struct dentry *target,
>  	swap(dentry->d_name.hash_len, target->d_name.hash_len);
>  }
> 
> +static void copy_name(struct dentry *dentry, struct dentry *target)
> +{
> +	struct ext_name *old_name = ext_name(dentry);
> +	if (unlikely(dname_external(target))) {
> +		atomic_inc(&ext_name(target)->count);
> +		dentry->d_name = target->d_name;
> +	} else {
> +		memcpy(dentry->d_iname, target->d_name.name,
> +				target->d_name.len + 1);
> +		dentry->d_name.name = dentry->d_iname;
> +		dentry->d_name.hash_len = target->d_name.hash_len;
> +	}
> +	if (unlikely(old_name) && likely(atomic_dec_and_test(&old_name->count)))
> +		kfree_rcu(old_name, head);
> +}
> +
>  static void dentry_lock_for_move(struct dentry *dentry, struct dentry *target)
>  {
>  	/*
> @@ -2518,7 +2554,10 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
>  	}
> 
>  	/* Switch the names.. */
> -	switch_names(dentry, target, exchange);
> +	if (exchange)
> +		swap_names(dentry, target);
> +	else
> +		copy_name(dentry, target);
> 
>  	/* ... and switch them in the tree */
>  	if (IS_ROOT(dentry)) {
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux