Hi everyone,

This is another iteration of my skiplists patch, with some bugs fixed
and a few missing parts of the API done.

The biggest change is in the insertion code, where I now link from the
bottom up once I find the proper insertion point.  This makes it
possible to write custom insertion routines that allow duplicate keys.
It also means insertion doesn't have to lock and track all the nodes
from the top as it searches.  In the likely event that we're able to
insert into a free spot in an existing leaf, it only needs to take one
lock.

The IOMMU part of the patch is updated slightly, but still not using
all the new bits in the API.  This is mostly because the IOMMU part is
going to change later on, and I'm really only using it for testing now.

For benchmarking, the IOMMU numbers are slightly higher than last
time, but not by more than 5% or so.  This is because the bottleneck
is still skiplist_insert_hole(), which I haven't really changed in
this round.

Most of my benchmarking now is with the skiplist_test module, which
has expanded to a few new cases.  For random operations, my
performance is slightly slower than rbtree when single threaded.

Random lookups on 10 million items (inserts are similar):

	rbtree       random lookup time 4 s 327 ms
	skiplist-rcu random lookup time 5 s 175 ms

The new API makes it easier to do sequential operations.  Here is a
walk through the 10 million item list:

	skiplist-rcu check time 0 s 79 ms
	rbtree       check time 0 s 295 ms

And sequential insert:

	skiplist-rcu fill time 1 s 599 ms
	rbtree       fill time 1 s 875 ms

The benchmark does rounds of random lookups, deletions, and
insertions.  Most of the operations done are either deletions or
insertions, so I'm not skewing the results with the easy rcu lookup
operation.
Single threaded, rbtree consistently wins across all sizes of lists:

	skiplist-rcu thread 0 time 0 s 287 ms
	rbtree       thread 0 time 0 s 215 ms

But once we go up to two threads, skiplists win:

	skiplist-rcu thread 1 time 0 s 299 ms
	rbtree       thread 1 time 0 s 499 ms

At 8 threads the skiplists don't get linear scaling, but it's still
pretty good.  Since I'm doing random operations on a wide key space,
the locking skiplist variant is also fast:

	skiplist-rcu     thread 7 time 0 s 379 ms
	skiplist-locking thread 7 time 0 s 425 ms
	rbtree           thread 7 time 3 s 711 ms

My test machine has 8 cores, so at 16 threads we're into HT:

	skiplist-rcu     thread 15 time 0 s 583 ms
	skiplist-locking thread 15 time 0 s 559 ms
	rbtree           thread 15 time 7 s 423 ms

It's not all sunshine though.  If I shrink the keyspace down to 1000
keys or fewer, there is more contention on the node locks and we're
tied with (or slightly worse than) rbtree.  In that kind of workload
it makes sense to add a big fat lock around the skiplist, or just
stick with rbtrees.

The skiplists have both locking and rcu variants of the lookup
operations.  The locking ones are built on top of the rcu ones, and
you can use both at the same time.

Patches follow.

-chris