* Sasha Levin (levinsasha928@xxxxxxxxx) wrote:
> On Sun, 2011-05-29 at 13:01 -0400, Mathieu Desnoyers wrote:
> > * Mathieu Desnoyers (mathieu.desnoyers@xxxxxxxxxxxx) wrote:
> > > * Sasha Levin (levinsasha928@xxxxxxxxx) wrote:
> > [...]
> > > > Hi Mathieu!
> > > >
> > > > In tools/kvm/ we use a rb-tree (same one used by the kernel) with the
> > > > augmentation feature to support an interval rb-tree - which means that
> > > > every update to the tree not only updates the nodes directly related to
> > > > the updated node but also all the nodes on the path to the root of the
> > > > tree.
> > >
> > > Cool !!
> > >
> > > I'm adding in copy Phil Howard, who has been working on RCU RB trees for
> > > much longer than myself.
> > >
> > > > I see that in liburcu there is an implementation of an RCU linked list
> > > > but no implementation of an rb-tree.
> > > >
> > > > Are you currently working on one? Or maybe I should try writing one and
> > > > sending it to you?
> > >
> > > Actually, I started working on one last year, but had to interrupt my
> > > effort before I got it even working right.
> > [...]
> > > We'd have to see how we can go from this implementation of a standard RB
> > > tree to an interval RB tree too. I guess it will depend on whether you
> > > need the updates from the target node up to the root to be done "all at
> > > once" from a reader perspective (then you would probably need to replace
> > > a copy of a part of the tree all at once), or if you can allow the update
> > > to be done piece-wise on a node-by-node basis as readers go through the
> > > tree (from root to leaves).
> >
> > I've revisited the RCU rbtree implementation this weekend, and it works
> > much better now. I reimplemented the whole thing from 0 starting from
> > the CLRS chapter 12 algorithms (to get the non-RCU
> > (insertion/removal)-only stress-tests working) and incrementally
>
> Hi Mathieu!
>
> Can't we use the rbtree provided by the kernel, and just add _rcu
> functions to provide an RCU-protected rbtree?

Typical RB trees expect mutual exclusion between writers and readers, and
between writers. AFAIK, the in-kernel rbtree implementation has this
requirement. "RCU-izing" the RB tree structure requires more than just
adding some functions: its whole structure must be adapted to support
concurrent reader activity while updates are performed.

>
> > RCU-ized the updates and then added read-side tests. All along, I used
> > the test_urcu_rbtree test case, which does some basic coherency tests by
> > searching for some random elements that *should* be there in parallel
> > with insertions and removals. The implementation I currently have
> > survives the "search for known elements in parallel with updates" stress
> > test (so far). (e.g. test_urcu_rbtree 6 2 10 -g 30: 6 readers, 2
> > writers, 30 known random elements, writers adding/removing 6 random
> > elements, on an 8-core machine)
>
> I've grabbed it and it works great for me here too :)
> test_urcu_rbtree does print a lot of mess, making it somewhat hard to
> work with.

I've now removed the "debug" printfs from the rbtree2 head.

>
> >
> > See: git://git.lttng.org/userspace-rcu.git
> > branch: rbtree2
> >
> > The key idea I used in this implementation is to "decay" the old nodes
> > (AFAIK, I just made this up): "decaying" a node could be best described
> > as creating an exact copy of a node, and putting a pointer to this new
> > node into the old node to form a "decay chain". This allowed me to keep
> > the algorithm very similar to CLRS by just walking the decay chains
> > whenever needed. The old node "decays" by using call_rcu to free it
> > after a grace period passes. This implies that the updates must hold the
> > RCU read-side lock in addition to a mutex to make sure the decaying
> > nodes stay valid for the duration of their use.
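For illustration, here is a minimal sketch of what such a decay chain could
look like in C, assuming liburcu's call_rcu()/rcu_dereference() and that the
writer holds the tree mutex plus the RCU read-side lock as described above.
The rcu_rbnode layout and the get_decay()/decay_node()/free_node_rcu()
helpers are made-up names for illustration only and do not necessarily match
the actual rbtree2 code:

#include <stdlib.h>
#include <string.h>
#include <urcu.h>		/* rcu_dereference(), rcu_assign_pointer() */
#include <urcu-call-rcu.h>	/* call_rcu(), struct rcu_head */
#include <urcu/compiler.h>	/* caa_container_of() */

struct rcu_rbnode {
	struct rcu_rbnode *left, *right, *parent;
	int color;
	void *key;
	struct rcu_rbnode *decay;	/* newer copy of this node, or NULL */
	struct rcu_head head;		/* deferred free after a grace period */
};

/* Walk the decay chain to reach the most recent copy of a node. */
static struct rcu_rbnode *get_decay(struct rcu_rbnode *node)
{
	struct rcu_rbnode *next;

	if (!node)
		return NULL;
	while ((next = rcu_dereference(node->decay)) != NULL)
		node = next;
	return node;
}

static void free_node_rcu(struct rcu_head *head)
{
	free(caa_container_of(head, struct rcu_rbnode, head));
}

/*
 * "Decay" a node: publish an exact copy, link the old node to it, and
 * defer freeing the old node until a grace period has elapsed.  The
 * real update would also relink parent/children to the new copy;
 * readers still holding a pointer to the old node follow ->decay.
 */
static struct rcu_rbnode *decay_node(struct rcu_rbnode *old)
{
	struct rcu_rbnode *new = malloc(sizeof(*new));

	if (!new)
		abort();
	memcpy(new, old, sizeof(*new));
	new->decay = NULL;
	rcu_assign_pointer(old->decay, new);
	call_rcu(&old->head, free_node_rcu);
	return new;
}

A CLRS-style helper would then call something like get_decay() each time it
follows a child or parent pointer, so it always lands on the most recent copy
after a bounded number of steps rather than retrying the whole search.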
> >
> > This implementation never requires the read-side to loop, thus
> > guaranteeing wait-free read-side behavior (so search operations will
> > always be strictly O(log n) without any busy-loop delay).
> >
> > I have not created stress-tests for next/prev walk of the tree yet. It
> > is therefore entirely possible that this does not work as expected.

I just added some min + next and max + prev walk stress tests into
test_urcu_rbtree. They seem to work fine so far with 6 readers + 2 writers
running concurrently with 50 random items (it caught a stupid bug in
prev(), which I fixed immediately). The idea behind these stress tests is
to walk forward or backward over the entire tree and flag, in an array of
flags, the position corresponding to each value found in the global array
of values. At the end, it checks that all the values that "must" be there
were indeed flagged.

>
> I've 'forked' tools/kvm/, and started working on integrating urcu into
> it (it will be located in https://github.com/levinsasha/linux-kvm once
> some progress has been made); this should make it easier to review
> work-in-progress. I'll switch to the rbtree2 branch in urcu and work
> with it from now on.

Cool :) Please note that what I currently have is a normal rbtree, not an
interval rbtree. Can you elaborate on your use-case so I can try to figure
out how we could augment it to support the interval rbtree you need?

Thanks,

Mathieu

>
> > Comments are welcome,
> >
> > Thanks,
> >
> > Mathieu
> >
> > >
> > > Thanks,
> > >
> > > Mathieu
> > >
> > > --
> > > Mathieu Desnoyers
> > > Operating System Efficiency R&D Consultant
> > > EfficiOS Inc.
> > > http://www.efficios.com
> >
> > --
> > Sasha.
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html