* Sasha Levin (levinsasha928@xxxxxxxxx) wrote:
> On Thu, 2011-05-26 at 19:09 -0400, Mathieu Desnoyers wrote:
> > * Sasha Levin (levinsasha928@xxxxxxxxx) wrote:
> > > On Thu, 2011-05-26 at 21:21 +0300, Pekka Enberg wrote:
> > > > On Thu, May 26, 2011 at 9:11 PM, Avi Kivity <avi@xxxxxxxxxx> wrote:
> > > > > On 05/26/2011 09:05 PM, Ingo Molnar wrote:
> > > > >> >
> > > > >> > I've added some rwlocks because of what Ingo said yesterday about
> > > > >> > adding/removing devices after the first initialization phase.
> > > > >> >
> > > > >> > Take the MMIO lock for example: since we can now run SMP guests,
> > > > >> > we may have multiple MMIO exits (one from each VCPU thread). Each
> > > > >> > of those exits leads to searching the MMIO rbtree.
> > > > >> >
> > > > >> > We can use a mutex to lock it, but it just means that those
> > > > >> > threads will be blocked there instead of concurrently searching
> > > > >> > the MMIO tree, which makes the search linear instead of parallel.
> > > > >> >
> > > > >> > It's hard to bring 'real' numbers at this stage because the only
> > > > >> > 'real' device we have which uses MMIO is the VESA driver, and we
> > > > >> > can't really simulate many VCPUs writing to it :)
> > > > >>
> > > > >> I'd suggest keeping it simple first - rwlocks are nasty and will
> > > > >> bounce a cacheline just as much.
> > > > >
> > > > > Well, this is the first case where tools/kvm can do better than qemu
> > > > > with its global lock, so I think it's worth it.
> > > > >
> > > > >> If lookup scalability is an issue we can extend RCU to tools/kvm/.
> > > > >
> > > > > Definitely, rcu is a perfect match for mmio dispatch.
> > > >
> > > > Userspace RCU code is here, Sasha, if you feel like tackling this:
> > > >
> > > > http://lttng.org/urcu
> > > >
> > > > :-)
> > > >
> > > > I'm CC'ing Paul and Mathieu as well for urcu.
> > >
> > > Sounds good!
> > > Should be quite an addition, and could be used in more places than
> > > just the MMIO dispatcher.
> >
> > Hi Sasha,
> >
> > Feel free to let me know if you need any help, have questions, or need
> > improvements to liburcu. I'd be interested to know the specifics of
> > your read vs. update workload rate. Also, if you need more thorough
> > information, we have a paper describing all the liburcu flavors; it
> > might help you choose the one best suited to your needs. (If you don't
> > care that much about fine-tuning, my recommendation is to stick with
> > the "urcu.h"/"liburcu" flavor.) A link to the paper preprint can be
> > found at http://www.efficios.com/publications
>
> Hi Mathieu!
>
> In tools/kvm/ we use an rb-tree (the same one used by the kernel) with
> the augmentation feature to support an interval rb-tree - which means
> that every update to the tree updates not only the nodes directly
> related to the updated node, but also all the nodes on the path to the
> root of the tree.

Cool!! I'm adding in copy Phil Howard, who has been working on an RCU RB
tree for much longer than myself.

> I see that liburcu has an implementation of an RCU linked list, but no
> implementation of an rb-tree.
>
> Are you currently working on one? Or should I try writing one and
> sending it to you?

Actually, I started working on one last year, but had to interrupt my
effort before I even got it working right. The state of this (disclaimer:
unfinished!!) work is available in the "rbtree" branch of the urcu
library. That tree has insertions/removals protected by a mutex, and uses
an RCU read lock to protect traversals.

The main problem I was facing when I had to stop working on that code is
that the "nil" node:

	/* Sentinel (bottom nodes).
	 * Don't care about p, left, right, pos and key values */
	struct rcu_rbtree_node rcu_rbtree_nil = {
		.color = COLOR_BLACK,
	};

ended up being written to as a temporary node by the algorithm presented
in CLRS, chap. 12.
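The augmentation Sasha describes — every node caching the maximum interval end of its subtree, so that changing one node forces a recomputation along the whole path to the root — can be sketched as follows. The node layout and function names are hypothetical (this is not the kernel's rbtree code), and rebalancing is omitted; the point is only to show where the write traffic comes from.

```c
/* Sketch of interval-tree augmentation: each node caches the maximum
 * interval end in its subtree, so an update must be propagated from the
 * changed node up to the root. Names are illustrative, not kernel code. */
#include <stddef.h>
#include <stdint.h>

struct itree_node {
	uint64_t start, end;        /* this node's interval [start, end) */
	uint64_t subtree_max;       /* max 'end' anywhere in this subtree */
	struct itree_node *left, *right, *parent;
};

static uint64_t max3(uint64_t a, uint64_t b, uint64_t c)
{
	uint64_t m = a > b ? a : b;
	return m > c ? m : c;
}

/* Recompute one node's augmented value from itself and its children. */
static void itree_update_node(struct itree_node *n)
{
	n->subtree_max = max3(n->end,
			      n->left  ? n->left->subtree_max  : 0,
			      n->right ? n->right->subtree_max : 0);
}

/* After changing a node's interval, walk up to the root. This is the
 * "all the nodes on the path to the root" write traffic that makes
 * concurrent readers of an augmented tree delicate. */
static void itree_propagate(struct itree_node *n)
{
	for (; n; n = n->parent)
		itree_update_node(n);
}
```

A reader racing with `itree_propagate()` can observe some ancestors already updated and others not — which is exactly why the choice between piece-wise and all-at-once update visibility matters below.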
So sharing it as a common node, as proposed in their book, makes sense
only if you assume there is no concurrency, but it seems to break left
and right in the presence of concurrency, and I did not have time to
review their entire algorithm to find out where I should check for
accesses to this nil node.

This implementation tries to think of the RB tree in terms of how each
operation affects the read-side visibility of the tree nodes. It makes
extensive use of the fact that readers only ever go down into the tree.

I'd be glad to help out if someone wants to have a look and try to
complete that work, which should only be considered "work in progress"
level of (in)completeness.

We'd also have to see how we can go from this implementation of a
standard RB tree to an interval RB tree. I guess it will depend on
whether you need the updates from the target node up to the root to be
done "all at once" from a reader's perspective (in which case you would
probably need to replace a copy of a part of the tree all at once), or
whether you can allow the update to be done piece-wise, on a node-by-node
basis, as readers go through the tree (from root to leaves).

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
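The "all at once" option mentioned above can be sketched as the usual RCU copy-and-publish pattern, here reduced to a single node using C11 atomics rather than liburcu. Everything below is illustrative — the struct and function names are invented, the "part of the tree" is shrunk to one node — and a real implementation would free the old copy only after a grace period (`synchronize_rcu()`), which is omitted here.

```c
/* Sketch: instead of updating augmented values in place, the updater
 * copies the affected node, modifies the private copy, and publishes it
 * with one atomic pointer store. A reader then sees either the old or
 * the new version in full, never a half-updated node. */
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct inode_v {                          /* hypothetical node version */
	uint64_t start, end, subtree_max;
};

static _Atomic(struct inode_v *) published;

/* Updater: copy, modify the copy, publish with release ordering so any
 * reader that sees the new pointer also sees the new field values. */
static void publish_update(uint64_t new_end, uint64_t new_max)
{
	struct inode_v *old = atomic_load(&published);
	struct inode_v *copy = malloc(sizeof(*copy));

	*copy = *old;
	copy->end = new_end;
	copy->subtree_max = new_max;
	atomic_store_explicit(&published, copy, memory_order_release);
	/* In real RCU code: synchronize_rcu(); free(old); */
	(void)old;
}

/* Reader: a single acquire load yields an internally consistent snapshot. */
static const struct inode_v *read_snapshot(void)
{
	return atomic_load_explicit(&published, memory_order_acquire);
}
```

The piece-wise alternative avoids the copies but exposes readers to mixed old/new augmented values on the path from root to target, so the choice comes down to what invariants the readers need.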