On Wed, Mar 17, 2021 at 3:44 AM Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>
> Argh, because I didn't test small. Sorry I had the BASE_SMALL setting in
> another patch and thought it would be a good idea to mash them together.
> In hindsight probably not even if it did build.

I was going to complain about that code in general.

First complaining about the hash being small, and then adding a config
option to make it ridiculously much *smaller* seemed wrong to begin
with, and didn't make any sense.

So no, please don't smash together.

In fact, I'd like to see this split up, and with more numbers:

 - separate out the bit_waitqueue thing that is almost certainly not
   remotely as critical (and maybe not needed at all)

 - show the profile numbers _after_ the patch(es)

 - explain why you picked the random scaling numbers (21 and 22 for
   the two different cases)

 - give an estimate of how big the array now ends up being for
   different configurations

I think it ends up using that "scale" factor of 21, and basically
being "memory size >> 21" and then rounding up to a power of two.

And honestly, I'm not sure that makes much sense. So for a 1GB machine
we get the same as we used to for the bit waitqueue (twice as many for
the page waitqueue), but if you run on some smaller setup, you
apparently can end up with just a couple of buckets.

So I'd feel a lot better about this if I saw the numbers, and got the
feeling that the patch actually tries to take legacy machines into
account.

And even on a big machine, what's the advantage of scaling perfectly
with memory? If you have a terabyte of RAM, why would you need half a
million hash entries (if I did the math right), and use 4GB of memory
on it? The contention doesn't go up by amount of memory, it goes up
roughly by number of threads, and the two are very seldom really all
that linearly connected.

So honestly, I'd like to see more reasonable numbers. I'd like to see
what the impact of just raising the hash bit size from 8 to 16 is on
that big machine. Maybe still using alloc_large_system_hash(), but
using a low-limit of 8 (our traditional very old number that hasn't
been a problem even on small machines), and a high-limit of 16 or
something.

And if you want even more, I really really want that justified by the
performance / profile numbers.

And does that "bit_waitqueue" really merit updating AT ALL? It's
almost entirely unused these days. I think maybe the page lock code
used to use that, but then realized it had more specialized needs, so
now it's separate. So can we split that bit-waitqueue thing up from
the page waitqueue changes? They have basically nothing in common
except for a history, and I think they should be treated separately
(including the explanation for what actually hits the bottleneck).

              Linus
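
To put concrete numbers on the sizing rule discussed above, here is a
minimal sketch assuming the patch really does "memory size >> scale,
rounded up to a power of two"; the helper name and standalone form are
illustrative, not code from the patch, and the worked examples just
redo the arithmetic from this mail:

	#include <linux/log2.h>

	/*
	 * Sizing rule as described above: buckets = memory size >> scale,
	 * rounded up to a power of two (with at least one bucket).
	 */
	static unsigned long waitqueue_hash_buckets(unsigned long mem_bytes,
						    unsigned int scale)
	{
		unsigned long buckets = mem_bytes >> scale;

		return buckets ? roundup_pow_of_two(buckets) : 1;
	}

	/*
	 * Worked examples with scale = 21 (one bucket per 2MB of memory):
	 *   1GB:  (1UL << 30) >> 21 = 512 buckets (2x the old 2^8 table)
	 *   1TB:  (1UL << 40) >> 21 = 524288 buckets (~half a million)
	 *   64MB: (1UL << 26) >> 21 = 32 buckets ("just a couple")
	 */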
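And a sketch of what the clamped alloc_large_system_hash() approach
suggested above might look like, with a low limit of 2^8 and a high
limit of 2^16 buckets. The table name, function name, and variable
names are illustrative; whether HASH_EARLY is wanted in the flags
depends on where in boot this runs:

	#include <linux/memblock.h>
	#include <linux/wait.h>

	static wait_queue_head_t *page_wait_table;
	static unsigned int page_wait_shift;	/* log2 of bucket count */

	static void __init page_wait_table_init(void)
	{
		unsigned int i;

		page_wait_table = alloc_large_system_hash("page-waitqueue",
					sizeof(wait_queue_head_t),
					0,		/* size from memory */
					21,		/* scale: 1 bucket per 2MB */
					0,		/* flags */
					&page_wait_shift,
					NULL,
					1UL << 8,	/* low limit: 256 buckets */
					1UL << 16);	/* high limit: 64k buckets */

		/* the returned memory is not initialized waitqueues: do each bucket */
		for (i = 0; i < (1U << page_wait_shift); i++)
			init_waitqueue_head(&page_wait_table[i]);
	}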