Re: [PATCH 2/2] mm: add PageWaiters bit to indicate waitqueue should be checked

On Fri, 4 Nov 2016 13:40:49 +1100
Nicholas Piggin <npiggin@xxxxxxxxx> wrote:

> On Thu, 3 Nov 2016 08:49:14 -0700
> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> 
> > On Wed, Nov 2, 2016 at 8:46 PM, Nicholas Piggin <npiggin@xxxxxxxxx> wrote:  
> > >
> > > If you don't have that, then a long-waiting waiter for some
> > > unrelated page can prevent other pages from getting back to
> > > the fastpath.
> > >
> > > Contention bit is already explicitly not precise with this patch
> > > (false positive possible), but in general the next wakeup will
> > > clean it up. Without page_match, that's not always possible.    
> > 
> > Do we care?
> > 
> > The point is, it's rare, and if there are no numbers to say that it's
> > an issue, we shouldn't create the complication. Numbers talk,
> > handwaving "this might be an issue" walks.  
> 
> Well, you could have hundreds of waiters on pages with highly threaded
> IO (say, a file server), which will cause collisions in the hash table.
> I can just try to force that to happen and make that 2.2% show up again.
> 
> Actually it would be more than 2.2% with my patch as is, because it no
> longer does an unlocked waitqueue_active() check if the waiters bit was
> set (because with my approach the lock will always be required, if only
> to clear the bit after checking the waitqueue). If we avoid clearing
> dangling bits there, we'll then have to reintroduce that test.
> 
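
To make that concrete, the wake side with the waiters bit ends up looking
roughly like the sketch below. This is paraphrased from memory rather than
copied out of the posted diff, so the function name and the
wait_page_key / page_match plumbing shouldn't be read as the literal patch:

struct wait_page_key {
        struct page *page;
        int bit_nr;
        int page_match;         /* set by a woken waiter whose page matched */
};

static void wake_up_page_bit(struct page *page, int bit_nr)
{
        wait_queue_head_t *q = page_waitqueue(page);
        struct wait_page_key key = {
                .page           = page,
                .bit_nr         = bit_nr,
                .page_match     = 0,
        };
        unsigned long flags;

        /* Fastpath: waiters bit clear, nothing to do and no lock taken. */
        if (!PageWaiters(page))
                return;

        /*
         * Note there is no unlocked waitqueue_active() test here: once
         * the waiters bit is set we take the lock regardless, because
         * under the lock is the only safe place to clear a bit left
         * dangling by a hash collision.
         */
        spin_lock_irqsave(&q->lock, flags);
        __wake_up_locked_key(q, TASK_NORMAL, &key);

        /*
         * Clear the bit if the queue is now empty, or if nothing on it
         * was waiting for this page at all (no page_match) -- in that
         * case the bit was a stale leftover from a collision.
         */
        if (!waitqueue_active(q) || !key.page_match)
                ClearPageWaiters(page);

        spin_unlock_irqrestore(&q->lock, flags);
}

The point being: once the bit is set there is no unlocked exit, and if we
stop clearing dangling bits under the lock, the unlocked waitqueue_active()
test has to come back.
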
> > That said, at least it isn't a big complexity that will hurt, and it's
> > very localized.  
> 
> I thought so :)
> 
> >   
> > >> Also, it would be lovely to get numbers against the plain 4.8
> > >> situation with the per-zone waitqueues. Maybe that used to help your
> > >> workload, so the 2.2% improvement might be partly due to me breaking
> > >> performance on your machine.    
> > >
> > > Oh yeah that'll hurt a bit. The hash will get spread over non-local
> > > nodes now. I think it was only a 2 socket system, but remote memory
> > > still takes a latency hit. Hmm, I think keeping the zone waitqueue
> > > just for pages would be reasonable, because they're a special case?    
> > 
> > HELL NO!
> > 
> > Christ. That zone crap may have helped some very few NUMA machines,
> > but it *hurt* normal machines.  
> 
> Oh I missed why they hurt small systems -- where did you see that
> slowdown? I agree that's a serious concern. I'll go back and read the
> thread again.

Oh, okay, the zone lookup. Well, I'm under the impression that most of the
cache misses come from the waitqueue hash table itself. On a small system
(or a big system doing local operations), I would have thought the zone
lookup should be quite well cached. The per-zone waitqueue hashes were
something like 96KB each, so a random access into one is almost certainly
an L1 miss and probably an L2 miss as well.
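
For reference, the 4.8 per-zone lookup is (roughly, if I'm remembering
mm/filemap.c right):

static wait_queue_head_t *page_waitqueue(struct page *page)
{
        const struct zone *zone = page_zone(page);

        /* per-zone table: one hashed wait_queue_head per bucket */
        return &zone->wait_table[hash_ptr(page, zone->wait_table_bits)];
}

page_zone() and the zone->wait_table / wait_table_bits loads touch struct
zone lines that are hot for plenty of other reasons, so on a small system
I'd expect them to stay resident; it's the hashed wait_table[] slot
itself -- a random entry in that ~96KB array -- that takes the miss.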

Anyway, I'm still going to try to get numbers for this, but I wonder
whether you saw the zone lookup itself causing a lot of misses, or whether
it was the waitqueue hash?

Thanks,
Nick
