On Fri, Aug 20, 2010 at 03:22:07PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@xxxxxxxxxx> > > Commit ebf8aa44beed48cd17893a83d92a4403e5f9d9e2 ("radix-tree: > omplement function radix_tree_range_tag_if_tagged") does not safely > set tags on on intermediate tree nodes. The code walks down the tree > setting tags before it has fully resolved the path to the leaf under > the assumption there will be a leaf slot with the tag set in the > range it is searching. > > Unfortunately, this is not a valid assumption - we can abort after > setting a tag on an intermediate node if we overrun the number of > tags we are allowed to set in a batch, or stop scanning because we > we have passed the last scan index before we reach a leaf slot with > the tag we are searching for set. > > As a result, we can leave the function with tags set on intemediate > nodes which can be tripped over later by tag-based lookups. The > result of these stale tags is that lookup may end prematurely or > livelock because the lookup cannot make progress. > > The fix for the problem involves reocrding the traversal path we > take to the leaf nodes, and only propagating the tags back up the > tree once the tag is set in the leaf node slot. We are already > recording the path for efficient traversal, so there is no > additional overhead to do the intermediately node tag setting in > this manner. > > This fixes a radix tree lookup livelock triggered by the new > writeback sync livelock avoidance code introduced in commit > f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement writeback > livelock avoidance using page tagging"). It seems OK to me, thanks. > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > --- > lib/radix-tree.c | 57 +++++++++++++++++++++++++++++++++++++++++------------ > 1 files changed, 44 insertions(+), 13 deletions(-) > > diff --git a/lib/radix-tree.c b/lib/radix-tree.c > index 1014171..e0ee8cb 100644 > --- a/lib/radix-tree.c > +++ b/lib/radix-tree.c > @@ -625,6 +625,13 @@ EXPORT_SYMBOL(radix_tree_tag_get); > * also settag. The function stops either after tagging nr_to_tag items or > * after reaching last_index. > * > + * The tags must be set from the leaf level only and propagated back up the > + * path to the root. We must do this so that we resolve the full path before > + * setting any tags on intermediate nodes. If we set tags as we descend, then > + * we can get to the leaf node and find that the index that has the iftag > + * set is outside the range we are scanning. This reults in dangling tags and > + * can lead to problems with later tag operations (e.g. livelocks on lookups). > + * > * The function returns number of leaves where the tag was set and sets > * *first_indexp to the first unscanned index. > */ > @@ -633,9 +640,13 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root, > unsigned long nr_to_tag, > unsigned int iftag, unsigned int settag) > { > - unsigned int height = root->height, shift; > - unsigned long tagged = 0, index = *first_indexp; > - struct radix_tree_node *open_slots[height], *slot; > + unsigned int height = root->height; > + struct radix_tree_path path[height]; > + struct radix_tree_path *pathp = path; > + struct radix_tree_node *slot; > + unsigned int shift; > + unsigned long tagged = 0; > + unsigned long index = *first_indexp; > > last_index = min(last_index, radix_tree_maxindex(height)); > if (index > last_index) > @@ -655,6 +666,13 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root, > shift = (height - 1) * RADIX_TREE_MAP_SHIFT; > slot = radix_tree_indirect_to_ptr(root->rnode); > > + /* > + * we fill the path from (root->height - 2) to 0, leaving the index at > + * (root->height - 1) as a terminator. Zero the node in the terminator > + * so that we can use this to end walk loops back up the path. > + */ > + path[height - 1].node = NULL; > + > for (;;) { > int offset; > > @@ -663,17 +681,30 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root, > goto next; > if (!tag_get(slot, iftag, offset)) > goto next; > + if (height > 1) { > + /* Go down one level */ > + height--; > + shift -= RADIX_TREE_MAP_SHIFT; > + path[height - 1].node = slot; > + path[height - 1].offset = offset; > + slot = slot->slots[offset]; > + continue; > + } > + > + /* tag the leaf */ > + tagged++; > tag_set(slot, settag, offset); > - if (height == 1) { > - tagged++; > - goto next; > + > + /* walk back up the path tagging interior nodes */ > + pathp = &path[0]; > + while (pathp->node) { > + /* stop if we find a node with the tag already set */ > + if (tag_get(pathp->node, settag, pathp->offset)) > + break; > + tag_set(pathp->node, settag, pathp->offset); > + pathp++; > } > - /* Go down one level */ > - height--; > - shift -= RADIX_TREE_MAP_SHIFT; > - open_slots[height] = slot; > - slot = slot->slots[offset]; > - continue; > + > next: > /* Go to next item at level determined by 'shift' */ > index = ((index >> shift) + 1) << shift; > @@ -687,7 +718,7 @@ next: > * last_index is guaranteed to be in the tree, what > * we do below cannot wander astray. > */ > - slot = open_slots[height]; > + slot = path[height - 1].node; > height++; > shift += RADIX_TREE_MAP_SHIFT; > } > -- > 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html