On Fri, May 03, 2013 at 06:45:25AM -0400, Chris Mason wrote: > Quoting Jan Kara (2013-05-03 05:19:24) > > Hi, > > > > On Thu 02-05-13 22:02:11, Chris Mason wrote: > > > I've been working on skiplists for indexing off/on for a little while > > > now. For btrfs, I really need a fast way to manage extents that doesn't > > > bog down in lock contention for reads or writes. Dave Chinner has > > > mentioned this as well, so I've cc'd him. > > But I guess you still need writer-writer exclusion, don't you? Or are you > > able to do some partial locking of the structure to allow parallel > > modifications? > > Hi Jan, > > Yes, insert/delete still lock, but they only take locks for the levels > required for the operation. So if we're inserting at level 0, we'll > walk all the way down to level 0 before taking any locks. > > If an insert happens at level 5, it'll start locking at level 5 and > build a list of nodes we'll have to update in order to do the insert. > Everything in the list is locked until the insert is done. > > There's room for optimization there, since chances are good the > insertion point will have some free room and I won't need all those > locks. The real goal is a little less fine grained, just because all > the locking does slow things down when there's no contention. > > For the iommu code, the biggest difference between skiplists and rbtrees > is the rbtrees are trying to remember a spot they are likely to find > free ranges to hand out. The skiplists pick a random starting point and > hope the locking spread saves us. > > > > > > Liu Bo started this skiplist code, but at the time it didn't make > > > a huge difference. I've reworked things a bit and used the IOMMU > > > to test things. I ran some benchmarks on a single socket > > > server with two SSD cards to get a baseline of how much it is helping. > > > > > > I tried to keep the basic structure of the IOMMU code, and there are probably > > > many better ways to fix the IOMMU contention. Really I'm just trying to > > > compare with rbtrees, not fix the IOMMU. > > There also exist some research papers on RCU friendly RB-trees (or other > > trees). Maybe they would be interesting to try? The basic trick is to use a > > copy-on-write when you need to do a rotation. This is slightly impractical > > because it requires memory allocation (or alternatively doubles memory > > footprint of an rb node) but you need to do a similar stuff in skip lists > > anyway. Just an idea... > > Definitely interested if you know of alternatives. Most of the ones I > saw were RCU for reads but not very fine grained for writes. For btrfs > at least, the indexes will be getting a lot of updates and I need higher > concurrency on writes. I've got two cases I care about. The first is the buffer cache indexes which have a 1000:1 read:modify ratio and I'd really like the lookups to be lockless. The other case is the extent tree, where we do lots of inserts when the extent tree is first read, and after than it's typically 2 lookups for every insert/remove. Having one tree that works for both would be handy... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html