* Sasha Levin (levinsasha928@xxxxxxxxx) wrote:
> On Thu, 2011-05-26 at 19:09 -0400, Mathieu Desnoyers wrote:
> > * Sasha Levin (levinsasha928@xxxxxxxxx) wrote:
> > > On Thu, 2011-05-26 at 21:21 +0300, Pekka Enberg wrote:
> > > > On Thu, May 26, 2011 at 9:11 PM, Avi Kivity <avi@xxxxxxxxxx> wrote:
> > > > > On 05/26/2011 09:05 PM, Ingo Molnar wrote:
> > > > >> >
> > > > >> > I've added some rwlocks because of what Ingo said yesterday about
> > > > >> > adding/removing devices after the first initialization phase.
> > > > >> >
> > > > >> > Take the MMIO lock for example: since we can now run SMP guests,
> > > > >> > we may have multiple MMIO exits (one from each VCPU thread). Each
> > > > >> > of those exits leads to searching the MMIO rbtree.
> > > > >> >
> > > > >> > We can use a mutex to lock it, but it just means that those
> > > > >> > threads will be blocked there instead of concurrently searching
> > > > >> > the MMIO tree, which makes the search linear instead of parallel.
> > > > >> >
> > > > >> > It's hard to bring 'real' numbers at this stage because the only
> > > > >> > 'real' device we have which uses MMIO is the VESA driver, and we
> > > > >> > can't really simulate many VCPUs writing to it :)
> > > > >>
> > > > >> I'd suggest keeping it simple first - rwlocks are nasty and will
> > > > >> bounce a cacheline just as much.
> > > > >
> > > > > Well, this is the first case where tools/kvm can do better than qemu
> > > > > with its global lock, so I think it's worth it.
> > > > >
> > > > >> If lookup scalability is an issue we can extend RCU to tools/kvm/.
> > > > >
> > > > > Definitely, rcu is a perfect match for mmio dispatch.
> > > >
> > > > Userspace RCU code is here, Sasha, if you feel like tackling this:
> > > >
> > > > http://lttng.org/urcu
> > > >
> > > > :-)
> > > >
> > > > I'm CC'ing Paul and Mathieu as well for urcu.
> > >
> > > Sounds good!
> > > Should be quite an addition, and could be used in more places than
> > > just the MMIO dispatcher.
> >
> > Hi Sasha,
> >
> > Feel free to let me know if you need any help, have questions, or need
> > improvements to liburcu. I'd be interested to know the specifics of
> > your read vs. update workload rate. Also, if you need more thorough
> > information, we have a paper describing all the liburcu flavors; it
> > might help you choose the one best suited to your needs. (If you don't
> > care that much about fine-tuning, my recommendation is to stick with
> > the "urcu.h"/"liburcu" flavor.) A link to the paper preprint can be
> > found at http://www.efficios.com/publications
>
> Hi Mathieu!
>
> In tools/kvm/ we use an rb-tree (the same one used by the kernel) with
> the augmentation feature to support an interval rb-tree - which means
> that every update to the tree updates not only the nodes directly
> related to the updated node, but also all the nodes on the path to the
> root of the tree.

Cool!! I'm adding in copy Phil Howard, who has been working on an RCU RB
tree for much longer than myself.

> I see that liburcu has an implementation of an RCU linked list, but no
> implementation of an rb-tree.
>
> Are you currently working on one? Or should I try writing one and
> sending it to you?

Actually, I started working on one last year, but had to interrupt my
effort before I even got it working right. The state of this (disclaimer:
unfinished!!) work is available in the "rbtree" branch of the urcu
library. That tree has insertions/removals protected by a mutex, and uses
an RCU read lock to protect traversals.

The main problem I was facing when I had to stop working on that code is
that the "nil" node:

	/* Sentinel (bottom nodes).
	 * Don't care about p, left, right, pos and key values */
	struct rcu_rbtree_node rcu_rbtree_nil = {
		.color = COLOR_BLACK,
	};

ended up being written to as a temporary node by the algorithm presented
in CLRS, chap. 12.
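The augmentation Sasha describes — every node caching the maximum interval end of its subtree, so that changing one node forces a recomputation along the whole path to the root — can be sketched as follows. The node layout and function names are hypothetical (this is not the kernel's rbtree code), and rebalancing is omitted; the point is only to show where the write traffic comes from.

```c
/* Sketch of interval-tree augmentation: each node caches the maximum
 * interval end in its subtree, so an update must be propagated from the
 * changed node up to the root. Names are illustrative, not kernel code. */
#include <stddef.h>
#include <stdint.h>

struct itree_node {
	uint64_t start, end;        /* this node's interval [start, end) */
	uint64_t subtree_max;       /* max 'end' anywhere in this subtree */
	struct itree_node *left, *right, *parent;
};

static uint64_t max3(uint64_t a, uint64_t b, uint64_t c)
{
	uint64_t m = a > b ? a : b;
	return m > c ? m : c;
}

/* Recompute one node's augmented value from itself and its children. */
static void itree_update_node(struct itree_node *n)
{
	n->subtree_max = max3(n->end,
			      n->left  ? n->left->subtree_max  : 0,
			      n->right ? n->right->subtree_max : 0);
}

/* After changing a node's interval, walk up to the root. This is the
 * "all the nodes on the path to the root" write traffic that makes
 * concurrent readers of an augmented tree delicate. */
static void itree_propagate(struct itree_node *n)
{
	for (; n; n = n->parent)
		itree_update_node(n);
}
```

A reader racing with `itree_propagate()` can observe some ancestors already updated and others not — which is exactly why the choice between piece-wise and all-at-once update visibility matters below.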
So sharing it as a common node, as proposed in their book, makes sense
only if you assume there is no concurrency, but it seems to break left
and right in the presence of concurrency, and I did not have time to
review their entire algorithm to find out where I should check for
accesses to this nil node.

This implementation tries to think of the RB tree in terms of how each
operation affects the read-side visibility of the tree nodes. It makes
extensive use of the fact that readers only ever go down into the tree.

I'd be glad to help out if someone wants to have a look and try to
complete that work, which should only be considered "work in progress"
level of (in)completeness.

We'd also have to see how we can go from this implementation of a
standard RB tree to an interval RB tree. I guess it will depend on
whether you need the updates from the target node up to the root to be
done "all at once" from a reader's perspective (in which case you would
probably need to replace a copy of a part of the tree all at once), or
whether you can allow the update to be done piece-wise, on a node-by-node
basis, as readers go through the tree (from root to leaves).

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
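The "all at once" option mentioned above can be sketched as the usual RCU copy-and-publish pattern, here reduced to a single node using C11 atomics rather than liburcu. Everything below is illustrative — the struct and function names are invented, the "part of the tree" is shrunk to one node — and a real implementation would free the old copy only after a grace period (`synchronize_rcu()`), which is omitted here.

```c
/* Sketch: instead of updating augmented values in place, the updater
 * copies the affected node, modifies the private copy, and publishes it
 * with one atomic pointer store. A reader then sees either the old or
 * the new version in full, never a half-updated node. */
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct inode_v {                          /* hypothetical node version */
	uint64_t start, end, subtree_max;
};

static _Atomic(struct inode_v *) published;

/* Updater: copy, modify the copy, publish with release ordering so any
 * reader that sees the new pointer also sees the new field values. */
static void publish_update(uint64_t new_end, uint64_t new_max)
{
	struct inode_v *old = atomic_load(&published);
	struct inode_v *copy = malloc(sizeof(*copy));

	*copy = *old;
	copy->end = new_end;
	copy->subtree_max = new_max;
	atomic_store_explicit(&published, copy, memory_order_release);
	/* In real RCU code: synchronize_rcu(); free(old); */
	(void)old;
}

/* Reader: a single acquire load yields an internally consistent snapshot. */
static const struct inode_v *read_snapshot(void)
{
	return atomic_load_explicit(&published, memory_order_acquire);
}
```

The piece-wise alternative avoids the copies but exposes readers to mixed old/new augmented values on the path from root to target, so the choice comes down to what invariants the readers need.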