On Mon, Sep 26, 2016 at 01:58:00PM -0700, Linus Torvalds wrote:
> Why is the page_waitqueue() handling so expensive? Let me count the ways:

> (b) It's cache miss heaven. It takes a cache miss on three different
> things: looking up the zone 'wait_table', then looking up the hash
> queue there, and finally (inside __wake_up_bit) looking up the wait
> queue itself (which will effectively always be NULL).

> Is there really any reason for that incredible indirection? Do we
> really want to make the page_waitqueue() be a per-zone thing at all?
> Especially since all those wait-queues won't even be *used* unless
> there is actual IO going on and people are really getting into
> contention on the page lock.. Why isn't the page_waitqueue() just one
> statically sized array?

I suspect the reason is to have per-node hash tables, just like we get
per-node page-frame arrays with sparsemem.

> Also, if those bitlock ops had a different bit that showed contention,
> we could actually skip *all* of this, and just see that "oh, nobody is
> waiting on this page anyway, so there's no point in looking up those
> wait queues". We don't have that many "__wait_on_bit()" users, maybe
> we could say that the bitlocks do have to have *two* bits: one for the
> lock bit itself, and one for "there is contention".

That would be fairly simple to implement; the difficulty would be
actually getting a page flag to use for this. We're running pretty low
on available bits :/
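
To make the two-bit idea concrete, here is a minimal userspace sketch in C11. It is not kernel code: LOCK_BIT, WAITERS_BIT, try_lock(), mark_contended(), unlock() and slow_wake() are all hypothetical names invented for illustration, and slow_wake() only counts calls where a real implementation would do the wait-queue lookup. The sketch also assumes unlock clears both bits at once and wakes everyone, which is just one possible policy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag layout: one bit for the lock, one for "there is contention". */
#define LOCK_BIT    (1UL << 0)
#define WAITERS_BIT (1UL << 1)

/* Counts how often the expensive path (stand-in for the wait-queue lookup) ran. */
static unsigned long wake_lookups;

/* Stand-in for hashing to the wait queue and waking the sleepers. */
static void slow_wake(atomic_ulong *flags)
{
    (void)flags;
    wake_lookups++;
}

/* Returns true if the caller took the lock. */
static bool try_lock(atomic_ulong *flags)
{
    return !(atomic_fetch_or(flags, LOCK_BIT) & LOCK_BIT);
}

/* A waiter sets this before going to sleep on the (not modelled) wait queue. */
static void mark_contended(atomic_ulong *flags)
{
    atomic_fetch_or(flags, WAITERS_BIT);
}

/* Clear both bits atomically; only touch the wait queue if someone was waiting. */
static void unlock(atomic_ulong *flags)
{
    unsigned long old = atomic_fetch_and(flags, ~(LOCK_BIT | WAITERS_BIT));

    assert(old & LOCK_BIT);    /* unlocking an unlocked bitlock is a bug */
    if (old & WAITERS_BIT)
        slow_wake(flags);
}
```

In the uncontended case unlock() is a single atomic operation on a word the caller already has in cache, and the wait-queue lookup is skipped entirely. The cost is the second bit, which is exactly the scarce resource mentioned above.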