On Wed, Oct 25, 2023 at 10:18:00AM +0200, Rasmus Villemoes wrote:
> On 25/10/2023 09.18, kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed a 3.7% improvement of will-it-scale.per_thread_ops on:
>
> So with that, can we please just finally say "yeah, let's make the
> generic bitmap library functions correct

They are all correct already.

> and usable in more cases"

See below.

> instead of worrying about random micro-benchmarks that just show
> you-win-some-you-lose-some.

With that I agree. I don't worry about either the +2% or the -3%
benchmark, and I don't think that they alone can either justify or rule
out such a radical change as making all find_bit functions volatile and
shutting down a newborn KCSAN.

Keeping that in mind, my best guess is that Jan's and Misrad's test that
shows +2% was run against stable bitmaps, and that what the robot
measured most likely involves heavily concurrent access to some bitmap
in the kernel.

I haven't looked at the sources of either test, but that at least makes
some sense: if GCC correctly optimizes code against properly described
memory, this is exactly what we can expect.

> Yes, users will have to treat results from the find routines carefully
> if their bitmap may be concurrently modified. They do. Nobody wins if
> those users are forced to implement their own bitmap routines for their
> lockless algorithms.

Again, I agree with this point, and I'm trying to address exactly this.

I'm working on a series that introduces lockless find_bit functions
based on the existing FIND_BIT() engine. It's not ready yet, but I hope
to submit it in the next merge window.
https://github.com/norov/linux/commits/find_and_bit

Now that we've got a test that presumably runs faster if the find_bit()
functions are all switched to be volatile, it would be great to get
into the details and understand:
 - which find_bit function or functions give that gain in performance;
 - on what bitmap(s);
 - whether the cause is concurrent memory access (my guess is yes), and
   if so,
 - whether we can refactor the code to use the lockless find_and_bit()
   functions mentioned above;
 - if not, how else we can address this.

If you or someone else has a spare time slot to dig deeper into that,
I'll be really thankful.

Thanks,
Yury