* Chris Mason (clmason@xxxxxxxxxxxx) wrote:
[...]
> Ouch, ok. In private email yesterday I talked with Mathieu about how
> his current setup can't prevent the concurrent insertion of overlapping
> extents. He does have a plan to address this where the insertion is
> synchronized by keeping placeholders in the tree for the free space. I
> think it'll work, but I'm worried about doubling the cost of the insert.

Hi Chris,

The weekend and early week have been productive on my side. My updated
work is available on this new branch:

git://git.lttng.org/userspace-rcu.git
branch: urcu/rcuja-range

Since last week, I managed to:

- expand the RCU Judy Array API documentation:

  https://git.lttng.org/?p=userspace-rcu.git;a=blob;f=urcu/rcuja.h;h=82e272bd4ede1aec436845aef287754dd1dab8b6;hb=03a50ae89ec4d7f39e91d0d49c4639c4cf6e894c

- create an API for Judy Array Ranges, as discussed privately via email:

  API:
  https://git.lttng.org/?p=userspace-rcu.git;a=blob;f=urcu/rcuja-range.h;h=63035a1660888aa5f9b20548046571dcb54ad193;hb=03a50ae89ec4d7f39e91d0d49c4639c4cf6e894c

  Implementation:
  https://git.lttng.org/?p=userspace-rcu.git;a=blob;f=rcuja/rcuja-range.c;h=7e4585ef942d76f1811f3c958fff3138ac120ca3;hb=03a50ae89ec4d7f39e91d0d49c4639c4cf6e894c

Please keep in mind that this code has only been moderately
stress-tested (with up to 24 cores, on small keyspaces of 3, 5, 10, and
100 keys, so that races occur much more frequently). It should not be
considered production-ready yet.

The test code (and thus example usage) is available here:

https://git.lttng.org/?p=userspace-rcu.git;a=blob;f=tests/test_urcu_ja_range.c;h=12abcc51465b64a7124fb3e48a2150e225e145af;hb=03a50ae89ec4d7f39e91d0d49c4639c4cf6e894c
https://git.lttng.org/?p=userspace-rcu.git;a=blob;f=tests/test_urcu_ja_range.h;h=e9bbdbc3ed7eb8f57e30c26b8789ba609a6bfdd9;hb=03a50ae89ec4d7f39e91d0d49c4639c4cf6e894c
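
To give a quick feel for the intended usage pattern without opening the
test program, here is a minimal read-side sketch. The names I use below
(cds_ja_range_lookup, struct cds_ja_range) are illustrative placeholders
following the cds_ja_* naming convention, not necessarily the exact
declarations; please refer to urcu/rcuja-range.h above for the
authoritative signatures. The point is the locking discipline, which is
the usual RCU one:

	#include <stdint.h>
	#include <urcu.h>		/* liburcu read-side primitives */
	#include <urcu/rcuja-range.h>	/* the range API linked above */

	/*
	 * Look up the range enclosing 'key'. Both the lookup and every
	 * dereference of the returned range must sit between
	 * rcu_read_lock() and rcu_read_unlock(), because a concurrent
	 * updater may remove the range and free it after a grace period.
	 */
	static void range_lookup_example(struct cds_ja *ja, uint64_t key)
	{
		struct cds_ja_range *range;	/* placeholder type name */

		rcu_read_lock();
		range = cds_ja_range_lookup(ja, key);	/* placeholder name */
		if (range) {
			/* ... read the range fields here ... */
		}
		rcu_read_unlock();
	}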

So far, my benchmarks show near-linear read-side scalability (as
expected from RCU). However, early results do not show the scalability
I would have expected for concurrent updates. It's not as bad as, e.g.,
a global lock making performance crawl due to cache-line ping-pong
between processors, but roughly speaking, if I multiply the number of
cores doing updates by e.g. 12, the per-core throughput of the update
stress test gets divided by approximately 12. In other words, the
system-wide update throughput seems to stay constant as we increase the
number of cores. I will try to gather more information as I dig further
into benchmarking, which may point at a memory-throughput bottleneck.
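
The read-side scaling is the expected consequence of the RCU reader
pattern: rcu_read_lock() and rcu_read_unlock() only touch per-thread
state, so reader threads never write to a shared cache line. For
reference, the readers in the stress test boil down to the following
skeleton (the real loop, with the actual lookups, is in
test_urcu_ja_range.c linked above; the stop flag and counter below are
simplifications of mine):

	#include <urcu.h>		/* default liburcu flavor */

	static volatile int test_stop;	/* set by the main thread to end the run */

	/*
	 * Minimal reader skeleton: each iteration enters an RCU read-side
	 * critical section, performs its lookups, and bumps a per-thread
	 * counter. No shared state is written on the read path, which is
	 * why adding reader cores adds throughput almost linearly.
	 */
	static void *reader_thread(void *arg)
	{
		unsigned long long *nr_reads = arg;	/* per-thread counter */

		rcu_register_thread();	/* required once per reader thread */
		while (!test_stop) {
			rcu_read_lock();
			/* perform range lookups here */
			(*nr_reads)++;
			rcu_read_unlock();
		}
		rcu_unregister_thread();
		return NULL;
	}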

I have paused work on the range implementation, as I would rather get
some feedback before I start implementing more complex features such as
RCU-friendly range resize.

Feedback is welcome!

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com