* Dave Chinner (david@xxxxxxxxxxxxx) wrote:
> On Wed, Jun 26, 2013 at 10:29:36PM -0400, Mathieu Desnoyers wrote:
> > * Chris Mason (clmason@xxxxxxxxxxxx) wrote:
> > > Quoting Mathieu Desnoyers (2013-06-26 19:02:18)
> > > > FWIW, my benchmark of RCU Judy array with ranges in similar conditions:
> > > >
> > > > - Using 32-bit key length
> > > > - I first populated 10M ranges of len = 1, sequentially.
> > > > - Then, I ran reader threads for 10s, which perform random lookups
> > > >   (all successful) in keys from 0 to 10M.
> > >
> > > Similar, I had 64 bit keys and the lookups were totally random (not all
> > > successful). I doubt it matters too much for these numbers.
> >
> > I'd have to try with 64-bit keys, since it matters for RCU Judy. It
> > means a successful lookup will need to read twice as many cache lines
> > as for 32-bit keys. For my range implementation (on top of Judy),
> > every lookup ends up succeeding, because it finds either an "empty"
> > range or a populated range, so match vs. non-match does not matter
> > much for range lookups.
>
> Yeah, I only care about performance with 64 bit keys, sparse
> keyspace population and random extent lengths. Random lookup
> performance is more important than insert and delete, though I do
> have cases where bulk sequential insert and removal are important,
> too.

One thing I noticed about 64-bit keys in Judy, though: suppose we only
use part of the key space (e.g. the lower 32 bits). Even with a 64-bit
key lookup, we always touch the same cache lines for the top-level
nodes, so the number of cache lines we need to bring in from memory
will be quite close to that of a Judy array with 32-bit keys. I'll
have to confirm this with benchmarks, though.

> > > Also, my benchmarks were not just inserting keys but keys pointing to
> > > things. So a lookup walked the tree and found an object and then
> > > returned the object.
> > > radix can just return a key/value without
> > > dereferencing the value, but that wasn't the case in my runs.
> >
> > In the specific test I ran, I'm looking up the "range" object, which
> > is the dereferenced "value" pointer in terms of the Judy lookup. My
> > Judy array implementation represents items as a linked list of
> > structures matching a given key. This linked list is embedded within
> > the structures, similarly to the linux/list.h API. Then, if the
> > lookup succeeds, I take a mutex on the range and check whether it has
> > been concurrently removed.
>
> Does that mean that each "extent" that is indexed has a list head
> embedded in it? That blows the size of the index out when all I
> might want to store in the tree is a 64 bit value for a block
> mapping...

My implementation currently chains duplicates for genericity. I'm
keeping it simple (no special cases) on purpose until we find out
whether the Judy approach is interesting at all.

We could quite easily create a variant of the RCU Judy array augmented
with range support that has this as its node:

	/* The 64-bit start-of-range value is implicit within the Judy array */
	struct {
		uint64_t end;		/* end of range */
		spinlock_t lock;	/* lock updates on range */
		struct rcu_head head;	/* if call_rcu() is required for reclaim */
		unsigned int flags:2;	/* range is free, allocated, or removed */
	};

Depending on what you are willing to give up in terms of scalability
and RCU reclaim batching, the lock and rcu_head could be removed. This
ends up being a trade-off between update scalability and memory
footprint.
So if you go all the way for low memory footprint, with a single lock
covering all updates, and use synchronize_rcu() to perform reclaim,
you end up with:

	/* The 64-bit start-of-range value is implicit within the Judy array */
	struct {
		uint64_t end;		/* end of range */
		unsigned int flags:2;	/* range is free, allocated, or removed */
	};

We could probably encode the flags into unused low-order bits of the
pointer to the range if needed.

> FWIW, when a bunch of scalability work was done on xfs_repair years
> ago, judy arrays were benchmarked for storing extent lists that
> tracked free/used space. We ended up using a btree, because while it
> was slower than the original bitmap code, it was actually faster
> than the highly optimised judy array library and at the scale we
> needed there was no memory usage advantage to using a judy array,
> either...
>
> So I'm really starting to wonder if it'd be simpler for me just to
> resurrect the old RCU friendly btree code Peter Z wrote years ago
> (http://programming.kicks-ass.net/kernel-patches/vma_lookup/) and
> customise it for the couple of uses I have in XFS....

Balanced tree structures end up with contention near the root if you
want to perform concurrent updates on them. The advantage of RCU skip
lists and Judy arrays over trees is that no rebalancing is required,
so updates can proceed concurrently when they touch different areas of
the key space.

Thanks,

Mathieu

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html