On 10/05/2022 15:34, Jens Axboe wrote:
>> Yes, there is the extra load. I would hope that there would be a low
>> cost, but I agree that we still want to avoid it. So prob no point in
>> testing this more.
> I don't think that's low cost at all. It's the very hot path, and you're
> now not only doing an extra load, it's a dependent load - you need to
> load both to make any progress. On top of that, it's not like it's two
> loads from the same cacheline or even page. The most important thing for
> performance these days is having good cache utilization, the patch as it
> stands very much makes that a lot worse.
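To make the dependent-load point concrete, here is a minimal userspace sketch contrasting two hypothetical layouts. In the first, the words sit in one contiguous array. In the second, there is an array of per-word pointers so that each word could be allocated separately (for example on a different NUMA node). The second layout adds one more load per access, and that load cannot be issued until the previous one completes, because it needs the pointer the previous load returns. The pointer table is also a separate allocation, so it typically lives on a different cacheline from the words. All names here (`struct word`, `flat_map`, `indirect_map`, `flat_get`, `indirect_get`) are illustrative assumptions, not the lib/sbitmap.c code or the RFC patch.

```c
#include <assert.h>

/* Hypothetical stand-in for struct sbitmap_word; the real one also
 * carries a 'cleared' mask and is cacheline aligned. */
struct word {
	unsigned long word;
};

/* Layout A: words live in one contiguous array. */
struct flat_map {
	unsigned int nr;
	struct word *map;
};

/* Layout B: one pointer per word, so words can be allocated anywhere
 * (e.g. per NUMA node). The pointer table is a separate allocation. */
struct indirect_map {
	unsigned int nr;
	struct word **map_ptrs;
};

static unsigned long flat_get(const struct flat_map *m, unsigned int i)
{
	assert(i < m->nr);
	/* Load m->map, then the word at m->map + i. */
	return m->map[i].word;
}

static unsigned long indirect_get(const struct indirect_map *m, unsigned int i)
{
	assert(i < m->nr);
	/* Load m->map_ptrs, then map_ptrs[i] (an extra load that usually
	 * touches another cacheline), and only then the word itself. */
	struct word *w = m->map_ptrs[i];
	return w->word;
}
```

Both accessors return the same values for the same data. The difference is one extra dependent load per access in layout B, which matters on a path as hot as tag allocation.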
Understood. Essentially patch #1/2 points in the wrong direction.
I have to admit that I was a bit blinkered by seeing how much I could
improve the NUMA case.
> Besides, for any kind of performance work like that, it's customary to
> showcase both the situation that is supposedly fixed or improved with
> the change, but also to test that it didn't regress the existing
> common/fast case.
Right, I should have done that.
> It doesn't seem like a good
> approach for the issue, as it pessimizes the normal fast case.
> Spreading the memory out does probably make sense, but we need to retain
> the fast normal case. Making sbitmap support both, selected at init
> time, would be far more likely to be acceptable imho.
I wanted to keep the code changes minimal for an initial RFC to test
the water.
My original approach did not introduce the extra load for normal path
and had some init time selection for a normal word map vs numa word
map, but the code grew and became somewhat unmanageable. I'll revisit
it to see how to improve that.
> Probably just needs some clean refactoring first, so that the actual
> change can be pretty small.
I think that it may just be a case of separating out the handling of
evaluating the sbitmap_word ptr, as that is the common struct which we
deal with.
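One possible reading of that refactoring, as a hypothetical userspace sketch: keep the contiguous layout as the default, choose the spread (per-word pointer) layout once at init time, and route every word access through a single helper so the rest of the code never needs to know which layout is in use. The names (`struct sbm`, `sbm_init`, `sbm_word`, `sbm_set_bit`, `sbm_free`, the `spread` flag) are made up for this illustration and are not the sbitmap API. The helper still branches on `spread` on every access. That flag sits in the same struct as the map pointers and the branch should predict well, but it is still a cost on the fast path and would need measuring against the existing common case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sbitmap_word. */
struct sbm_word {
	unsigned long word;
};

/* Hypothetical bitmap supporting two layouts, chosen once in sbm_init(). */
struct sbm {
	unsigned int map_nr;
	bool spread;                 /* true: per-word pointers (NUMA case) */
	struct sbm_word *map;        /* used when !spread: contiguous words */
	struct sbm_word **map_ptrs;  /* used when spread: one pointer per word */
};

/* The only function that knows about the layout; callers go through it. */
static inline struct sbm_word *sbm_word(const struct sbm *sb, unsigned int i)
{
	assert(i < sb->map_nr);
	if (!sb->spread)
		return &sb->map[i];  /* default path: no pointer-table load */
	return sb->map_ptrs[i];      /* extra dependent load only when spread */
}

/* Init-time selection. A kernel version would allocate the spread words
 * per node; plain calloc() stands in for that here. Returns 0 on success. */
static int sbm_init(struct sbm *sb, unsigned int map_nr, bool spread)
{
	sb->map_nr = map_nr;
	sb->spread = spread;
	sb->map = NULL;
	sb->map_ptrs = NULL;
	if (!spread) {
		sb->map = calloc(map_nr, sizeof(*sb->map));
		return sb->map ? 0 : -1;
	}
	sb->map_ptrs = calloc(map_nr, sizeof(*sb->map_ptrs));
	if (!sb->map_ptrs)
		return -1;
	for (unsigned int i = 0; i < map_nr; i++) {
		sb->map_ptrs[i] = calloc(1, sizeof(**sb->map_ptrs));
		if (!sb->map_ptrs[i]) {
			while (i--)  /* free the words allocated so far */
				free(sb->map_ptrs[i]);
			free(sb->map_ptrs);
			sb->map_ptrs = NULL;
			return -1;
		}
	}
	return 0;
}

static void sbm_free(struct sbm *sb)
{
	if (sb->spread && sb->map_ptrs) {
		for (unsigned int i = 0; i < sb->map_nr; i++)
			free(sb->map_ptrs[i]);
		free(sb->map_ptrs);
	}
	free(sb->map);  /* NULL in spread mode; free(NULL) is a no-op */
	sb->map = NULL;
	sb->map_ptrs = NULL;
}

/* Example caller: layout-agnostic; 'bit' must be < bits per long. */
static void sbm_set_bit(struct sbm *sb, unsigned int i, unsigned int bit)
{
	sbm_word(sb, i)->word |= 1UL << bit;
}
```

With this split, adding or changing the spread layout touches only `sbm_init`, `sbm_free` and `sbm_word`, which keeps the functional change small once the refactoring is in place.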
Thanks,
John