Re: [PATCH] radix_tree_tag_get() is not as safe as the docs make out [ver #2]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Apr 06, 2010 at 10:36:20PM +0100, David Howells wrote:
> radix_tree_tag_get() is not safe to use concurrently with radix_tree_tag_set()
> or radix_tree_tag_clear().  The problem is that the double tag_get() in
> radix_tree_tag_get():
> 
> 		if (!tag_get(node, tag, offset))
> 			saw_unset_tag = 1;
> 		if (height == 1) {
> 			int ret = tag_get(node, tag, offset);
> 
> may see the value change due to the action of set/clear.  RCU is no protection
> against this as no pointers are being changed, no nodes are being replaced
> according to a COW protocol - set/clear alter the node directly.
> 
> The documentation in linux/radix-tree.h, however, says that
> radix_tree_tag_get() is an exception to the rule that "any function modifying
> the tree or tags (...) must exclude other modifications, and exclude any
> functions reading the tree".
> 
> The problem is that the next statement in radix_tree_tag_get() checks that the
> tag doesn't vary over time:
> 
> 			BUG_ON(ret && saw_unset_tag);

Yep, it's just a simple bug because the pagecache hadn't been using
radix_tree_tag_get.

> 
> This has been seen happening in FS-Cache:
> 
> 	https://www.redhat.com/archives/linux-cachefs/2010-April/msg00013.html
> 
> To this end, remove the BUG_ON() from radix_tree_tag_get() and note in various
> comments that the value of the tag may change whilst the RCU read lock is held,

I was thinking more of adding it to the radix-tree.h comment, because
this is true of all radix-tree functions when called under RCU. I don't
know it is considered "unreliable", it is just that concurrent
modifications may run unless the caller has excluded them. So the
value returned is reliably a value that has been observed.

Thanks for spotting this bug.

> and thus that the return value of radix_tree_tag_get() may not be relied upon
> unless radix_tree_tag_set/clear() and radix_tree_delete() are excluded from
> running concurrently with it.
> 
> Reported-by: Romain DEGEZ <romain.degez@xxxxxxxxxxxx>
> Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
> ---
> 
>  include/linux/radix-tree.h |    7 +++++++
>  lib/radix-tree.c           |   12 ++++++------
>  2 files changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index c5da749..55ca73c 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -121,6 +121,13 @@ do {									\
>   * (Note, rcu_assign_pointer and rcu_dereference are not needed to control
>   * access to data items when inserting into or looking up from the radix tree)
>   *
> + * Note that the value returned by radix_tree_tag_get() may not be relied upon
> + * if only the RCU read lock is held.  Functions to set/clear tags and to
> + * delete nodes running concurrently with it may affect its result such that
> + * two consecutive reads in the same locked section may return different
> + * values.  If reliability is required, modification functions must also be
> + * excluded from concurrency.
> + *
>   * radix_tree_tagged is able to be called without locking or RCU.
>   */
>  
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index 6b9670d..6a75327 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -556,6 +556,10 @@ EXPORT_SYMBOL(radix_tree_tag_clear);
>   *
>   *  0: tag not present or not set
>   *  1: tag set
> + *
> + * Note that the return value of this function may not be relied on, even if
> + * the RCU lock is held, unless tag modification and node deletion are excluded
> + * from concurrency.
>   */
>  int radix_tree_tag_get(struct radix_tree_root *root,
>  			unsigned long index, unsigned int tag)
> @@ -596,12 +600,8 @@ int radix_tree_tag_get(struct radix_tree_root *root,
>  		 */
>  		if (!tag_get(node, tag, offset))
>  			saw_unset_tag = 1;
> -		if (height == 1) {
> -			int ret = tag_get(node, tag, offset);
> -
> -			BUG_ON(ret && saw_unset_tag);
> -			return !!ret;
> -		}
> +		if (height == 1)
> +			return !!tag_get(node, tag, offset);
>  		node = rcu_dereference_raw(node->slots[offset]);
>  		shift -= RADIX_TREE_MAP_SHIFT;
>  		height--;
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux