Hi Chris,

* Chris Mason (clmason@xxxxxxxxxxxx) wrote:
> Hi everyone,
>
> This is another iteration of my skiplists patch, with some bugs fixed
> and a few missing parts of the API done. The biggest change is in the
> insertion code, where I now link from the bottom up once I find the
> proper insertion point. This makes it possible to write custom
> insertion routines that allow duplicate keys.
>
> It also means insertion doesn't have to lock and track all the nodes
> from the top as it searches. In the likely event that we're able to
> insert into a free spot in an existing leaf, it only needs to take one
> lock.

Nice, I did pretty much the same for RCU Judy arrays: I made insertion
and deletion take only the localized locks required, nesting from the
bottom up.

> The IOMMU part of the patch is updated slightly, but still not using all
> the new bits in the API. This is mostly because the IOMMU part is going
> to change later on and I'm really only using it for testing now.
>
> For benchmarking, the IOMMU numbers are slightly higher than last time,
> but not more than 5% or so. This is because the bottleneck is still
> skiplist_insert_hole(), which I haven't really changed in this round.
>
> Most of my benchmarking now is with the skiplist_test module, which has
> expanded to a few new cases. For random operations, my performance is
> slightly slower than rbtree when single threaded.
>
> Random lookups on 10 million items: (inserts are similar)
>
> rbtree random lookup time 4 s 327 ms
> skiplist-rcu random lookup time 5 s 175 ms

These benchmarks only consider averages. I'm worried that the skiplist
approach has a chance of creating long chains (the probability is
non-zero). Clearly, the lack of a bound is something the RT guys might
frown upon. I'd be curious to see how long the worst chains encountered
can get on, say, 1000 machines stress-testing this for a week.

FWIW, here is my benchmark of the RCU Judy array with ranges in similar
conditions:

- 32-bit key length.
- I first populated 10M ranges of len = 1, sequentially.
- Then I ran reader threads for 10s, performing random lookups (all
  successful) on keys from 0 to 10M.

The average lookup time is (on a per-core basis):

- single thread: 522 ns/lookup (  19122393 lookups in 10s)
- 2 threads:     553 ns/lookup ( ~36134994 lookups in 10s on 2 cores)
- 8 threads:     696 ns/lookup (~114854104 lookups in 10s on 8 cores)

We get an efficiency of 0.79 when going from 2 to 8 threads (an
efficiency of 1 being linear scalability).

Performed on this HW: 8-core
model name : Intel(R) Xeon(R) CPU E5405 @ 2.00GHz

From your numbers, I gather that a skiplist-rcu lookup takes about
517 ns on average, and an rbtree lookup about 432 ns. I'd be curious to
test all these alternatives in the same test bench.

> The new API makes it easier to do sequential operations. Here is a walk
> through the 10 million item list:
>
> skiplist-rcu check time 0 s 79 ms
> rbtree check time 0 s 295 ms
>
> And sequential insert:
>
> skiplist-rcu fill time 1 s 599 ms
> rbtree fill time 1 s 875 ms
>
> The benchmark does random lookup/deletion/insertion rounds. Most of the
> operations done are either deletion or insertion, so I'm not skewing the
> results with the easy rcu lookup operation.

We could provide helpers for sequential traversal of ranges; they are
not implemented in the RCU Judy array yet, though.
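To make the idea concrete, here is a rough sketch of the kind of
iteration helpers I have in mind. All the names below are made up for
illustration only -- they match neither your skiplist API nor the
current Judy array code:

struct range_index;			/* opaque container */

struct range {
	unsigned long start;
	unsigned long len;
};

/* First range with start >= key, or NULL if none. */
struct range *range_iter_first(struct range_index *idx, unsigned long key);

/* Next range in key order after @cur, or NULL at the end. */
struct range *range_iter_next(struct range_index *idx, struct range *cur);

#define range_for_each_from(pos, idx, key)			\
	for (pos = range_iter_first(idx, key); pos;		\
	     pos = range_iter_next(idx, pos))

/* Example: a sequential "check" pass over the whole structure. */
static unsigned long count_ranges(struct range_index *idx)
{
	struct range *r;
	unsigned long nr = 0;

	rcu_read_lock();
	range_for_each_from(r, idx, 0)
		nr++;
	rcu_read_unlock();

	return nr;
}

Whether the cursor simply follows a next pointer (as a skiplist bottom
level could do) or has to re-lookup the following key (as the Judy
array would likely need) then becomes an implementation detail hidden
behind the helpers.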
> Single threaded rbtree does consistently win across all sizes of lists:
>
> skiplist-rcu thread 0 time 0 s 287 ms
> rbtree thread 0 time 0 s 215 ms

Are the 287 ms and 215 ms numbers you get for skiplist-rcu and rbtree
the total for 10M add/delete operations, a per-operation time, or a
different number of operations altogether? Also, how many items are
present in the data structures on average?

> But once we go up to two threads, skiplists win:
>
> skiplist-rcu thread 1 time 0 s 299 ms
> rbtree thread 1 time 0 s 499 ms
>
> At 8 threads, the skiplists don't get linear scaling, but it's really
> pretty good. Since I'm doing random operations on a wide key space,
> the locking skiplist variant is also fast:
>
> skiplist-rcu thread 7 time 0 s 379 ms
> skiplist-locking thread 7 time 0 s 425 ms
> rbtree thread 7 time 3 s 711 ms

Hopefully my setup is not too different from yours. Here is my
benchmark of RCU Judy array insert operations. Please note that I have
not focused on optimizing the update side: I made design choices aimed
at making lookups as fast as possible, sometimes at the expense of
update overhead.

I used a keyspace of 400M, so that the number of inserts that end up
being a simple RCU lookup is insignificant. I'm still using a 32-bit
key length, and I'm adding ranges of length 1.

Average time taken per insertion (on a per-core basis):

- single thread: 15000 ns/insert (  645474 inserts in 10s)
- 2 threads:     17000 ns/insert (~1188232 inserts in 10s on 2 cores)
- 8 threads:     25000 ns/insert (~3161024 inserts in 10s on 8 cores)

We get an efficiency of 0.66 when going from 2 to 8 threads (an
efficiency of 1 being linear scalability).

> My test machine is 8 cores, so at 16 threads we're into HT:
>
> skiplist-rcu thread 15 time 0 s 583 ms
> skiplist-locking thread 15 time 0 s 559 ms
> rbtree thread 15 time 7 s 423 ms

Skipping this test, since my test machine does not have hyperthreading.

> It's not all sunshine though. If I shrink the keyspace down to 1000
> keys or less, there is more contention on the node locks and we're tied
> with (or slightly worse than) rbtree. In that kind of workload it makes
> sense to add a big fat lock around the skiplist, or just stick with
> rbtrees.

I'd expect similar behavior from RCU Judy arrays with a small keyspace:
a single lock around updates might perform better than locks
distributed within the data structure. However, even with a small
keyspace (and few nodes), I expect that the RCU Judy array and
skiplist-rcu will win over a non-RCU rbtree for lookup speed when there
are concurrent updates and lookups from many cores.

Feedback is welcome,

Thanks!

Mathieu

> The skiplists do have locking and rcu variants of lookup operations.
> The locking ones are built on top of the rcu ones, and you can use them
> both at the same time.
>
> Patches follow.
>
> -chris

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html