* Paul E. McKenney (paulmck@xxxxxxxxxxxxxxxxxx) wrote: > On Sun, May 29, 2011 at 01:01:04PM -0400, Mathieu Desnoyers wrote: > > * Mathieu Desnoyers (mathieu.desnoyers@xxxxxxxxxxxx) wrote: > > > * Sasha Levin (levinsasha928@xxxxxxxxx) wrote: > > [...] > > > > Hi Mathieu! > > > > > > > > In tools/kvm/ we use a rb-tree (same one used by the kernel) with the > > > > augmentation feature to support an interval rb-tree - which means that > > > > every update to the tree not only updates the nodes directly related to > > > > the updated node but also all the nodes on the path to the root of the > > > > tree. > > > > > > Cool !! > > > > > > I'm adding in copy Phil Howard who has been working on RCU RB tree for > > > much longer than myself. > > > > > > > I see that in liburcu there is an implementation of a rcu linked list > > > > but no implementation of a rb-tree. > > > > > > > > Are you currently working on one? or maybe I should try writing one and > > > > sending it to you? > > > > > > Actually, I started working on one last year, but had to interrupt my > > > effort before I got it even working right. > > [...] > > > We'd have to see how we can go from this implementation of a standard RB > > > tree to an interval RB tree too. I guess it will depend whether you need > > > the updates from the target node up to the root to be done "all at once" > > > from a reader perspective (then you would probably need to replace a > > > copy of a part of the tree all at once), or if you can allow the update > > > to be done piece-wise on a node-by-node basis as readers go through the > > > tree (from root to leafs). > > > > I've revisited the RCU rbtree implementation this weekend, and it works > > much better now. I reimplemented the whole thing from 0 starting from > > the CLRS chapter 12 algorithms (to get the non-rcu > > (insertion/removal)-only stress-tests working) and incrementally > > RCU-ized the updates and then added read-side tests. All along, I used > > the test_urcu_rbtree test case that does some basic coherency tests by > > searching for some random elements that *should* be there in parellel > > with insertion and removals. The implementation I currently have > > survives the "search for known elements in parallel with updates" stress > > test (so far). (e.g. test_urcu_rbtree 6 2 10 -g 30 : 6 readers, 2 > > writers, 30 known random elements, writers are adding/removing 6 random > > elements, on a 8-core machine) > > > > See: git://git.lttng.org/userspace-rcu.git > > branch : rbtree2 > > > > The key idea I used in this implementation is to "decay" the old nodes > > (AFAIK, I just made this up) : "decaying" a node could be best described > > as creating an exact copy of a node, and putting a pointer to this new > > node into the old node to form a "decay chain". This allowed me to keep > > the algorithm very much similar to CLRS by just walking the decay chains > > whenever needed. The old node "decays" by using call_rcu to free it > > after a grace period passes. This imply that the updates must hold the > > RCU read-side lock in addition to a mutex to make sure the decaying > > nodes stay valid for the duration of their use. > > > > This implementation never requires the read-side to loop, thus > > guaranteeing a wait-free read-side behavior (so search operations will > > always be strictly log(n) without any busy-loop delay). > > > > I have not created stress-tests for next/prev walk of the tree yet. It > > is therefore entirely possible that this does not work as expected. > > > > Comments are welcome, > > Very cool! > > The trick Phil Howard used allowed him to avoid duplicating the nodes > in some cases in the rotations. I might be missing something, but it > looks like you are duplicating in all cases. The duplications I do are (following CLRS 3rd ed. chap 12, 13): - x, y and beta for left and right rotation (p. 313) - v for transplant (p. 296) - the whole branch between z.right and y (inclusive) for lines 9--20 of rb_delete() (p. 324, chap. 13) (at most log(n) items), for the case I call rcu_rbtree_remove_nonil() in my code. > Would using Phil's trick > result in significant performance gain? I just read through Phil's paper at http://www.cs.pdx.edu/pdfs/tr1006.pdf. It looks like we have different targets: Phil's structure of RB tree is heavily tuned to allow RCU search, but it uses a RW lock for in-order traversal. Mine allows both search and in-order traversal to be performed under RCU read-side. One impact of my different goal is that I need to keep pointers to parent nodes (and must know if a node is a left or right child) -- and update both of these atomically. E.g., at least one optimisation done in Phil's work would not work with my scheme (his optimized swap, 4.1.2): it generates an intermediate tree state where in-order traversal could loop between C -> B -> A -> C (trying to do multiple rcu_rbtree_next) for a while which goes against the time guarantees I want to provide. Thanks, Mathieu > > Thanx, Paul > > > Thanks, > > > > Mathieu > > > > > > > > > > Thanks, > > > > > > Mathieu > > > > > > > > > -- > > > Mathieu Desnoyers > > > Operating System Efficiency R&D Consultant > > > EfficiOS Inc. > > > http://www.efficios.com > > > > -- > > Mathieu Desnoyers > > Operating System Efficiency R&D Consultant > > EfficiOS Inc. > > http://www.efficios.com -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html