Re: page_waitqueue() considered harmful

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Tue, 27 Sep 2016 09:30:55 +0200

On Mon, Sep 26, 2016 at 01:58:00PM -0700, Linus Torvalds wrote:

> Why is the page_waitqueue() handling so expensive? Let me count the ways:

>  (b) It's cache miss heaven. It takes a cache miss on three different
> things: looking up the zone 'wait_table', then looking up the hash
> queue there, and finally (inside __wake_up_bit) looking up the wait
> queue itself (which will effectively always be NULL).

> Is there really any reason for that incredible indirection? Do we
> really want to make the page_waitqueue() be a per-zone thing at all?
> Especially since all those wait-queues won't even be *used* unless
> there is actual IO going on and people are really getting into
> contention on the page lock.. Why isn't the page_waitqueue() just one
> statically sized array?

I suspect the reason is to have per-node hash tables, just like we get
per-node page-frame arrays with sparsemem.
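
To make the trade-off concrete, here is a rough userspace sketch of
the two lookup shapes being discussed. It is not the kernel code:
the struct layouts, the explicit page->zone pointer, the table size
and hash_ptr_sketch() are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WAIT_TABLE_BITS 8                 /* assumed size, for illustration */
#define WAIT_TABLE_SIZE (1u << WAIT_TABLE_BITS)

/* Stand-in for a wait queue head; NULL means nobody is waiting. */
struct wait_queue_head { void *first_waiter; };

/* Hypothetical zone: the hash table is a separate, per-zone allocation. */
struct zone {
	struct wait_queue_head *wait_table;
	unsigned int wait_table_bits;
};

/* Hypothetical page: a direct zone pointer is an assumption of this sketch. */
struct page {
	unsigned long flags;
	struct zone *zone;
};

/* Simple multiplicative pointer hash (not the kernel's hash_ptr()). */
static unsigned int hash_ptr_sketch(const void *p, unsigned int bits)
{
	uint64_t v = (uint64_t)(uintptr_t)p;

	assert(bits >= 1 && bits <= 32);
	return (unsigned int)((v * 0x9E3779B97F4A7C15ull) >> (64 - bits));
}

/* Per-zone variant: touch the zone, then its table pointer, then a bucket. */
static struct wait_queue_head *page_waitqueue_per_zone(struct page *page)
{
	struct zone *z = page->zone;

	return &z->wait_table[hash_ptr_sketch(page, z->wait_table_bits)];
}

/* Static variant: one global array, so only the bucket itself is touched. */
static struct wait_queue_head static_wait_table[WAIT_TABLE_SIZE];

static struct wait_queue_head *page_waitqueue_static(struct page *page)
{
	return &static_wait_table[hash_ptr_sketch(page, WAIT_TABLE_BITS)];
}
```

The static variant drops the zone and table-pointer loads, but it
also gives up whatever node locality the per-zone tables provide.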

> Also, if those bitlock ops had a different bit that showed contention,
> we could actually skip *all* of this, and just see that "oh, nobody is
> waiting on this page anyway, so there's no point in looking up those
> wait queues". We don't have that many "__wait_on_bit()" users, maybe
> we could say that the bitlocks do have to have *two* bits: one for the
> lock bit itself, and one for "there is contention".

That would be fairly simple to implement; the difficulty would be
actually getting a page flag to use for this. We're running pretty low
on available bits :/
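
For illustration only, a minimal userspace sketch of the two-bit
scheme using C11 atomics. PG_WAITERS, the bit positions and every
*_sketch() helper are hypothetical names, and the wait-queue side is
reduced to a counter so the fast path is visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PG_LOCKED  (1ul << 0)   /* the lock bit itself */
#define PG_WAITERS (1ul << 1)   /* hypothetical "there is contention" bit */

struct page { atomic_ulong flags; };

/* Counts trips into the expensive wait-queue lookup (demo only, not atomic). */
static unsigned long slow_wakeups;

/* Returns true if we took the lock, false if it was already held. */
static bool trylock_page_sketch(struct page *page)
{
	unsigned long old = atomic_fetch_or(&page->flags, PG_LOCKED);

	return !(old & PG_LOCKED);
}

/* A would-be sleeper flags contention before it goes to sleep. */
static void mark_contended_sketch(struct page *page)
{
	atomic_fetch_or(&page->flags, PG_WAITERS);
}

/* Stand-in for the page_waitqueue() + __wake_up_bit() slow path. */
static void wake_waiters_sketch(struct page *page)
{
	(void)page;
	slow_wakeups++;
}

/* Clear both bits at once; only look up wait queues if someone waited. */
static void unlock_page_sketch(struct page *page)
{
	unsigned long old = atomic_fetch_and(&page->flags,
					     ~(PG_LOCKED | PG_WAITERS));

	assert(old & PG_LOCKED);        /* caller must hold the lock */
	if (old & PG_WAITERS)
		wake_waiters_sketch(page);
}
```

An uncontended lock/unlock pair never reaches wake_waiters_sketch().
A real waiter would still have to queue itself, set the bit and then
recheck the lock before sleeping, which this sketch leaves out.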
