On Fri, 4 Nov 2016 08:59:15 -0700
Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Fri, Nov 4, 2016 at 12:29 AM, Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
> > Oh, okay, the zone lookup. Well I am of the impression that most of the
> > cache misses are coming from the waitqueue hash table itself.
>
> No.
>
> Nick, stop this idiocy.
>
> NUMBERS, Nick. NUMBERS.
>
> I posted numbers in "page_waitqueue() considered harmful" on linux-mm.

No I understand that, and am in the process of getting numbers. I wasn't
suggesting re-adding it based on "impression", I was musing over your idea
that the zone lookup hurts small systems. I'm trying to find why that is
and measure it! It's no good me finding a vast NUMA system to show some
improvement on if it ends up hurting 1-2 socket systems, is it?

But I can't see 3 cache misses there, and even the loads I can't see how
they match your post. We have:

  page->flags
  pglist_data->node_zones[x].wait_table
  wait_table[x].task_list

Page flags is in cache. wait_table is a dependent load but I'd have thought
it would cache relatively well. About as well as bit_wait_table pointer
load, but even if you count that as a miss, it's 2 cache misses.

Also keep in mind this PG_waiters patch actually reintroduces the
load-after-store stall on x86 because the PG_waiters bit is tested after
the unlock. On my skylake it doesn't seem to matter about the operand size
mismatch because it isn't forwarding the atomic op to the load anyway (which
makes sense, because atomic ops cause a store queue drain). So if we have
this patch, there is no additional stall on the page_zone load there.

> And quite frankly, before _you_ start posting numbers, that zone crap
> IS NEVER COMING BACK.
>
> What's so hard about this concept? We don't add crazy complexity
> without numbers.
> Numbers that I bet you will not be able to provide,
> because quite frankly, even in your handwavy "what about lots of
> concurrent IO from hundreds of threads" situation, that wait-queue
> will NOT BE NOTICEABLE.

That particular handwaving was *not* in the context of the zone waitqueues,
it was in the context of the PG_waiters bit slowpath with waitqueue hash
collisions. Different issue, and per-zone waitqueues don't do anything to
solve it.

>
> So no "impressions". No "what abouts". No "threaded IO" excuses. The
> _only_ thing that matters is numbers. If you don't have them, don't
> bother talking about that zone patch.

I agree with you, and am trying to reproduce your numbers at the moment.

Thanks,
Nick