On Thu, 3 Nov 2016 08:49:14 -0700 Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Wed, Nov 2, 2016 at 8:46 PM, Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
> >
> > If you don't have that, then a long-waiting waiter for some
> > unrelated page can prevent other pages from getting back to
> > the fastpath.
> >
> > Contention bit is already explicitly not precise with this patch
> > (false positive possible), but in general the next wakeup will
> > clean it up. Without page_match, that's not always possible.
> > Do we care?
>
> The point is, it's rare, and if there are no numbers to say that it's
> an issue, we shouldn't create the complication. Numbers talk,
> handwaving "this might be an issue" walks.

Well, you could have hundreds of waiters on pages with highly threaded
IO (say, a file server), which will cause collisions in the hash table.
I can just try to force that to happen and show up that 2.2% again.

Actually, it would be more than 2.2% with my patch as is, because it no
longer does an unlocked waitqueue_active() check if the waiters bit was
set (because with my approach the lock will always be required, if only
to clear the bit after checking the waitqueue). If we avoid clearing
dangling bits there, we'll then have to reintroduce that test.

> That said, at least it isn't a big complexity that will hurt, and it's
> very localized.

I thought so :)

> >> Also, it would be lovely to get numbers against the plain 4.8
> >> situation with the per-zone waitqueues. Maybe that used to help your
> >> workload, so the 2.2% improvement might be partly due to me breaking
> >> performance on your machine.
> >
> > Oh yeah, that'll hurt a bit. The hash will get spread over non-local
> > nodes now. I think it was only a 2 socket system, but remote memory
> > still takes a latency hit. Hmm, I think keeping the zone waitqueue
> > just for pages would be reasonable, because they're a special case?
>
> HELL NO!
>
> Christ.
> That zone crap may have helped some very few NUMA machines, but it
> *hurt* normal machines.

Oh, I missed why they hurt small systems -- where did you see that
slowdown? I agree that's a serious concern. I'll go back and read the
thread again.

> So no way in hell are we re-introducing that ugly, complex, fragile
> crap that actually slows down the normal case on real loads (not
> microbenchmarks). It was a mistake from the very beginning.

For the generic bit wait stuff, sure. For page waiters you always have
the page, there's no translation, so I don't see the fragility.

> No, the reason I'd like to hear about numbers is that while I *know*
> that removing the crazy zone code helped on normal machines (since I
> could test that case myself), I still am interested in whether the
> zone removal hurt on some machines (probably not two-node ones,
> though: Mel already tested that on x86). I'd like to know what the
> situation is with the contention bit.
>
> I'm pretty sure that with the contention bit, the zone crud is
> entirely immaterial (since we no longer actually hit the waitqueue
> outside of IO), but my "I'm pretty sure" comes back to the "handwaving
> walks" issue.

I do worry about pushing large amounts of IO, not even on huge NUMA
machines, but 2-4 socket. Then again, it *tends* to be that you don't
wait on every single page, but rather on batches of them at a time.

> So numbers would be really good.

I'll try to come up with some.

Thanks,
Nick