On Mon, Aug 13, 2012 at 1:20 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > On Tue, 2012-08-07 at 00:25 -0700, Michel Lespinasse wrote: >> a faster worst-case complexity of O(k+log N) for stabbing queries in a >> well-balanced prio tree, vs O(k*log N) for interval trees (where k=number >> of matches, N=number of intervals). Now this sounds great, but in practice >> prio trees don't realize this theorical benefit. First, the additional >> constraint makes them harder to update, so that the kernel implementation >> has to simplify things by balancing them like a radix tree, which is not >> always ideal. > > Not something spending a great deal of time on, but do you have any idea > what the radix like balancing does the the worst case stabbing > complexity? Well, it's not terribly bad as the height of the radix tree is still bounded by (IIRC) log(ULONG_MAX) + log(max pgoff_t in the file). So the complexity ends up being O(k + some constant) where the constant is in practice always larger than log(N). One can still construct cases where a prio tree stabbing query would read less nodes (so touch less cache lines) than with augmented rbtree. I think the best way to explain this is to look at what augmented rbtree does - for a stabbing query returning k matches, it visits all nodes between the root and the k matching nodes (plus one extra path at the end to conclude that there are no further matches). So the worst case there would be if the matching nodes are all close to leaf level in the tree, and if the paths leading to them branch up close to the root level. It's easy to construct such a case by making all N nodes in the tree start to the left of the stabbing query, and having only an arbitrary k among them end to the right of the query. Prio tree achieves a better worst case complexity by placing these k nodes near the top of the tree (due to the heap constraint). But this worst case situation isn't very likely to begin with, and in the typical case the k paths leading to the matching nodes in an augmented rbtree don't branch up so close to the root so typically prio tree doesn't get an advantage there. To sum it up, I would say that both algorithms are really pretty good, with prio tree getting a slightly better theorical worst-case and augmented rbtree getting (IMO) a practical advantage due to lower constant factors and a still very acceptable worst case. > Anyway, I like the thing, that prio-tree code always made my head hurt. Black magic I tell you ;-) -- Michel "Walken" Lespinasse A program is never fully debugged until the last user dies. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>