On Tue, Apr 16, 2019 at 10:22:40PM +1000, Dave Chinner wrote:
> On Thu, Apr 11, 2019 at 11:11:17AM +1000, Dave Chinner wrote:
> > On Mon, Apr 08, 2019 at 09:37:09AM -0700, Davidlohr Bueso wrote:
> > > On Mon, 2019-04-08 at 12:33 +0200, Jan Kara wrote:
> > > > On Fri 05-04-19 08:17:30, Dave Chinner wrote:
> > > > > FYI, I'm working on a range lock implementation that should both
> > > > > solve the performance issue and the reader starvation issue at the
> > > > > same time by allowing concurrent buffered reads and writes to
> > > > > different file ranges.
> > > >
> > > > Are you aware of range locks Davidlohr has implemented [1]? It didn't get
> > > > merged because he had no in-tree user at the time (he was more aiming at
> > > > converting mmap_sem which is rather difficult). But the generic lock
> > > > implementation should be well usable.
> > > >
> > > > Added Davidlohr to CC.
.....
> Fio randrw numbers on a single file on a pmem device on a 16p
> machine using 4kB AIO-DIO iodepth 128 w/ fio on 5.1.0-rc3:
>
>                       IOPS read/write (direct IO)
> fio processes         rwsem           rangelock
>  1                    78k /  78k      75k /  75k
>  2                   131k / 131k     123k / 123k
>  4                   267k / 267k     183k / 183k
>  8                   372k / 372k     177k / 177k
> 16                   315k / 315k     135k / 135k
....
> FWIW, I'm not convinced about the scalability of the rb/interval
> tree, to tell you the truth. We got rid of the rbtree in XFS for
> cache indexing because the multi-level pointer chasing was just too
> expensive to do under a spinlock - it's just not a cache efficient
> structure for random index object storage.

Yeah, definitely not convinced an rbtree is the right structure
here. Locking of the tree is the limitation....

> FWIW, I have basic hack to replace the i_rwsem in XFS with a full
> range read or write lock with my XFS range lock implementation so it
> just behaves like a rwsem at this point. It is not in any way
> optimised at this point.
Numbers for same AIO-DIO test are:

Now the stuff I've been working on has the same interface as
Davidlohr's patch, so I can swap and change them without thinking
about it. It's still completely unoptimised, but:

                        IOPS read/write (direct IO)
processes       rwsem           DB rangelock    XFS rangelock
 1               78k /  78k      75k /  75k      72k /  72k
 2              131k / 131k     123k / 123k     133k / 133k
 4              267k / 267k     183k / 183k     237k / 237k
 8              372k / 372k     177k / 177k     265k / 265k
16              315k / 315k     135k / 135k     228k / 228k

It's still substantially faster than the interval tree code.

BTW, if I take away the rwsem serialisation altogether, this test
tops out at just under 500k/500k at 8 threads, and at 16 threads has
started dropping off (~440k/440k). So the rwsem is a scalability
limitation at just 8 threads....

/me goes off and thinks more about adding optimistic lock coupling
to the XFS iext btree to get rid of the need for tree-wide locking
altogether

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
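
For anyone following along who hasn't looked at range locks before,
here is a rough userspace sketch of the semantics being discussed:
shared (read) holders may overlap each other, while an exclusive
(write) holder conflicts with anything that overlaps its byte range.
This is only an illustration under stated assumptions. It is neither
Davidlohr's interval tree code nor the XFS implementation; all names
(range_lock, range_trylock, range_unlock, MAX_HELD) are made up for
the example, it uses a flat array rather than a tree, it only does
trylock (no blocking), and it is not thread-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_HELD 64	/* hypothetical fixed capacity, for illustration only */

struct held_range {
	uint64_t start;		/* first byte covered, inclusive */
	uint64_t end;		/* last byte covered, inclusive */
	bool	 write;		/* true = exclusive, false = shared */
	bool	 in_use;
};

/* Illustrative lock: a flat table of currently held ranges. */
struct range_lock {
	struct held_range held[MAX_HELD];
};

/* Two inclusive ranges overlap if each starts before the other ends. */
static bool ranges_overlap(uint64_t s1, uint64_t e1,
			   uint64_t s2, uint64_t e2)
{
	return s1 <= e2 && s2 <= e1;
}

/*
 * Try to take [start, end] shared or exclusive.
 * Returns a slot index (>= 0) on success, or -1 if an overlapping
 * holder conflicts or the table is full. Not thread-safe.
 */
static int range_trylock(struct range_lock *rl, uint64_t start,
			 uint64_t end, bool write)
{
	int free_slot = -1;

	assert(start <= end);
	for (int i = 0; i < MAX_HELD; i++) {
		struct held_range *h = &rl->held[i];

		if (!h->in_use) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		/* Only overlapping ranges with at least one writer conflict. */
		if (ranges_overlap(start, end, h->start, h->end) &&
		    (write || h->write))
			return -1;
	}
	if (free_slot < 0)
		return -1;

	rl->held[free_slot] = (struct held_range){ start, end, write, true };
	return free_slot;
}

/* Release a range previously returned by range_trylock(). */
static void range_unlock(struct range_lock *rl, int slot)
{
	assert(slot >= 0 && slot < MAX_HELD && rl->held[slot].in_use);
	rl->held[slot].in_use = false;
}
```

With a zero-initialised struct range_lock, two writers on [0, 4095]
and [4096, 8191] both succeed, which is the concurrency a rwsem cannot
give. If every caller instead passes [0, UINT64_MAX], readers still
share and any writer excludes everyone, i.e. it degenerates to rwsem
behaviour, as in the "full range" hack quoted above.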