Re: [RFC][PATCH] make global bitlock waitqueues per-node

Nicholas Piggin <npiggin@xxxxxxxxx> · Tue, 20 Dec 2016 23:21:22 +1000

On Tue, 20 Dec 2016 12:58:25 +0000
Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:

> On Tue, Dec 20, 2016 at 12:31:13PM +1000, Nicholas Piggin wrote:
> > On Mon, 19 Dec 2016 16:20:05 -0800
> > Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> wrote:
> >   
> > > On 12/19/2016 03:07 PM, Linus Torvalds wrote:  
> > > >     +wait_queue_head_t *bit_waitqueue(void *word, int bit)
> > > >     +{
> > > >     +       const int __maybe_unused nid = page_to_nid(virt_to_page(word));
> > > >     +
> > > >     +       return __bit_waitqueue(word, bit, nid);
> > > > 
> > > > No can do. Part of the problem with the old code was that it did that
> > > > virt_to_page() crud. That doesn't work with the virtually mapped stack.
> > > 
> > > Ahhh, got it.
> > > 
> > > So, what did you have in mind?  Just redirect bit_waitqueue() to the
> > > "first_online_node" waitqueues?
> > > 
> > > wait_queue_head_t *bit_waitqueue(void *word, int bit)
> > > {
> > >         return __bit_waitqueue(word, bit, first_online_node);
> > > }
> > > 
> > > We could do some fancy stuff like only do virt_to_page() for things in
> > > the linear map, but I'm not sure we'll see much of a gain for it.  None
> > > of the other waitqueue users look as pathological as the 'struct page'
> > > ones.  Maybe:
> > > 
> > > wait_queue_head_t *bit_waitqueue(void *word, int bit)
> > > {
> > > 	int nid;
> > > 
> > > 	/* assumes everything at or above VMALLOC_START is outside the linear map */
> > > 	if ((unsigned long)word >= VMALLOC_START)
> > > 		nid = first_online_node;
> > > 	else
> > > 		nid = page_to_nid(virt_to_page(word));
> > > 	return __bit_waitqueue(word, bit, nid);
> > > }
> > 
> > I think he meant just make the page_waitqueue do the per-node thing
> > and leave bit_waitqueue using the global table.
> >   
> 
> I'm pressed for time, but at a glance that might require a separate
> array of waitqueues for the page waitqueue. Most users of bit_waitqueue
> are not operating on pages; the first user, for example, is based on a
> word inside a block_device. All non-page users could assume node 0.

Yes, it would require something like that. It would be trivial to keep
things balanced (if not local) across nodes by taking a simple hash of the
virtual address to spread them over the nodes. Or just keep using this
separate global table for bit_waitqueue...
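A minimal user-space sketch of the "hash the virtual address to spread over the nodes" idea, under stated assumptions: NR_NODES, WAIT_TABLE_BITS, node_tables and hashed_bit_waitqueue() are hypothetical names for illustration, not kernel APIs, and wait_queue_head is a stand-in struct. The multiplier is the kernel's GOLDEN_RATIO_64 constant used by the generic hash_64(), and the key is loosely modeled on how bit_waitqueue() combines the word address and bit number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only for illustration. */
#define NR_NODES        4
#define WAIT_TABLE_BITS 8
#define WAIT_TABLE_SIZE (1u << WAIT_TABLE_BITS)

/* Stand-in for wait_queue_head_t; the real one holds a lock and a list. */
struct wait_queue_head {
	unsigned long nr_waiters;
};

/* One hashed table per node (in the kernel this would be node-local memory). */
static struct wait_queue_head node_tables[NR_NODES][WAIT_TABLE_SIZE];

/* Multiplicative hash in the style of the generic hash_64(), using the
 * GOLDEN_RATIO_64 multiplier; keeps the top 'bits' bits of the product. */
static uint64_t hash64(uint64_t val, unsigned int bits)
{
	return (val * 0x61C8864680B583EBull) >> (64 - bits);
}

/*
 * Hypothetical: choose both the node and the bucket from the address and
 * bit number alone. No virt_to_page() is involved, so this is also safe for
 * words on a virtually mapped stack; the trade-off is that the chosen node
 * is balanced but generally not local to the memory holding 'word'.
 */
static struct wait_queue_head *hashed_bit_waitqueue(const void *word, int bit)
{
	assert(bit >= 0 && bit < 64);

	uint64_t key = ((uint64_t)(uintptr_t)word << 6) | (uint64_t)bit;
	uint64_t h = hash64(key, 32);
	unsigned int nid = (unsigned int)(h % NR_NODES);
	unsigned int idx = (unsigned int)((h / NR_NODES) % WAIT_TABLE_SIZE);

	return &node_tables[nid][idx];
}
```

The same (word, bit) pair always maps to the same queue, which is what waiter and waker need to agree on; the node is picked for balance only, matching the "balanced (if not local)" caveat.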

But before Linus grumps at me again, let's try to do the waitqueue
avoidance bit first before we worry about that :)
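The "waitqueue avoidance bit" is only named here, not spelled out. One common shape for that idea, sketched below as an assumption and not taken from Nick's patch, is a waiters flag kept in the same word as the lock bit, so the unlock path only looks up the hashed waitqueue when someone may actually be sleeping. BIT_WAITERS, wake_up_hashed(), mark_waiter() and the other names are hypothetical, and real sleeping is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BIT_LOCKED  (1u << 0)
#define BIT_WAITERS (1u << 1)	/* hypothetical "someone may be sleeping" flag */

/* Counts how often the unlock path had to touch the hashed waitqueue. */
static unsigned long hashed_wakeups;

/* Stand-in for looking up and waking the hashed waitqueue for 'word'. */
static void wake_up_hashed(atomic_uint *word)
{
	(void)word;
	hashed_wakeups++;
}

static bool try_lock(atomic_uint *word)
{
	return !(atomic_fetch_or(word, BIT_LOCKED) & BIT_LOCKED);
}

/*
 * A contending task sets BIT_WAITERS before it would sleep. Real code would
 * then retry try_lock() and only sleep on the hashed queue if that fails;
 * the sleeping itself is omitted here.
 */
static void mark_waiter(atomic_uint *word)
{
	atomic_fetch_or(word, BIT_WAITERS);
}

/*
 * Clearing LOCKED and reading WAITERS in one atomic RMW on the same word
 * means a waiter either set WAITERS before this point (and gets woken) or
 * sets it afterwards, in which case its retry of try_lock() either succeeds
 * or a later unlock() sees the bit. This simplified version also clears
 * WAITERS, assuming every waiter for this word is woken and re-sets the bit
 * if it has to sleep again.
 */
static void unlock(atomic_uint *word)
{
	unsigned int old = atomic_fetch_and(word, ~(BIT_LOCKED | BIT_WAITERS));

	if (old & BIT_WAITERS)
		wake_up_hashed(word);	/* slow path only when waiters exist */
}
```

In the uncontended case unlock() is a single atomic operation and never hashes into a waitqueue table, which would make the table's size and node placement less critical.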

> It shrinks the available hash table space but, as before, maybe collisions
> are not common enough to actually matter. That would be worth checking
> out. Alternatively, careful auditing to pick a node only where it is known
> to be safe to call virt_to_page() may work, but it would be fragile.
> 
> Unfortunately I won't be able to review or test any patches until January
> 3rd after I'm back online properly. Right now, I have intermittent internet
> access at best. During the next 4 days, I know I definitely will not have
> any internet access.
> 
> The last time around, there were three patch sets to avoid the overhead for
> pages in particular. One (mine, based on Nick's old work) was dropped as
> it was too complicated. Peter had some patches, but after enough hammering
> they failed due to a missed wakeup that I didn't pin down before having to
> travel to a conference.
> 
> I hadn't tested Nick's prototype, although it looked fine, because others
> had reviewed it before I looked and I was waiting for another version to
> appear. If one appears, I'll take a closer look and bash it across a few
> machines to see if it has any lost-wakeup problems.
> 

Sure, I'll respin it this week.

Thanks,
Nick

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .