On Tue, 20 Dec 2016 12:58:25 +0000
Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:

> On Tue, Dec 20, 2016 at 12:31:13PM +1000, Nicholas Piggin wrote:
> > On Mon, 19 Dec 2016 16:20:05 -0800
> > Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> wrote:
> >
> > > On 12/19/2016 03:07 PM, Linus Torvalds wrote:
> > > > +wait_queue_head_t *bit_waitqueue(void *word, int bit)
> > > > +{
> > > > +	const int __maybe_unused nid = page_to_nid(virt_to_page(word));
> > > > +
> > > > +	return __bit_waitqueue(word, bit, nid);
> > > >
> > > > No can do. Part of the problem with the old code was that it did that
> > > > virt_to_page() crud. That doesn't work with the virtually mapped stack.
> > >
> > > Ahhh, got it.
> > >
> > > So, what did you have in mind? Just redirect bit_waitqueue() to the
> > > "first_online_node" waitqueues?
> > >
> > > wait_queue_head_t *bit_waitqueue(void *word, int bit)
> > > {
> > > 	return __bit_waitqueue(word, bit, first_online_node);
> > > }
> > >
> > > We could do some fancy stuff like only do virt_to_page() for things in
> > > the linear map, but I'm not sure we'll see much of a gain for it. None
> > > of the other waitqueue users look as pathological as the 'struct page'
> > > ones. Maybe:
> > >
> > > wait_queue_head_t *bit_waitqueue(void *word, int bit)
> > > {
> > > 	int nid;
> > > 	if (word >= VMALLOC_START) /* all addrs not in linear map */
> > > 		nid = first_online_node;
> > > 	else
> > > 		nid = page_to_nid(virt_to_page(word));
> > > 	return __bit_waitqueue(word, bit, nid);
> > > }
> >
> > I think he meant just make the page_waitqueue do the per-node thing
> > and leave bit_waitqueue as the global bit.
> >
>
> I'm pressed for time but at a glance, that might require a separate
> structure of wait_queues for page waitqueue. Most users of bit_waitqueue
> are not operating with pages. The first user is based on a word inside
> a block_device for example. All non-page users could assume node-0.

Yes it would require something or other like that.
Trivial to keep things balanced (if not local) over nodes by taking a
simple hash of the virtual address to spread over the nodes. Or just keep
using this separate global table for the bit_waitqueue...

But before Linus grumps at me again, let's try to do the waitqueue
avoidance bit first before we worry about that :)

> It
> shrinks the available hash table space but as before, maybe collisions
> are not common enough to actually matter. That would be worth checking
> out. Alternatively, careful auditing to pick a node when it's known it's
> safe to call virt_to_page may work but it would be fragile.
>
> Unfortunately I won't be able to review or test any patches until January
> 3rd after I'm back online properly. Right now, I have intermittent internet
> access at best. During the next 4 days, I know I definitely will not have
> any internet access.
>
> The last time around, there were three patch sets to avoid the overhead for
> pages in particular. One was dropped (mine, based on Nick's old work) as
> it was too complicated. Peter had some patches but after enough hammering
> it failed due to a missed wakeup that I didn't pin down before having to
> travel to a conference.
>
> I hadn't tested Nick's prototype although it looked fine because others
> reviewed it before I looked and I was waiting for another version to
> appear. If one appears, I'll take a closer look and bash it across a few
> machines to see if it has any lost wakeup problems.
>

Sure, I'll respin it this week.

Thanks,
Nick
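
P.S. For illustration only, the "simple hash of the virtual address" idea
mentioned above could look roughly like the userspace sketch below. It is
not a kernel patch and has not been tested against any tree. The names
addr_to_node(), addr_to_bucket(), hash_u64() and WAIT_TABLE_BITS are all
hypothetical, and the multiplier is just a common 64-bit golden-ratio
constant. The point is only that the node is derived from the address
bits alone, so no virt_to_page() call is needed and it stays safe for
words on a virtually mapped stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WAIT_TABLE_BITS 8                       /* hypothetical per-node table size */
#define WAIT_TABLE_SIZE (1u << WAIT_TABLE_BITS)

/* Multiplicative hash: keep the top 'bits' bits of the 64-bit product. */
static inline uint32_t hash_u64(uint64_t val, unsigned int bits)
{
	return (uint32_t)((val * 0x61C8864680B583EBull) >> (64 - bits));
}

/*
 * Pick a node from the address alone. This gives balance across nodes,
 * not locality, and never looks at struct page.
 */
static inline unsigned int addr_to_node(const void *word, unsigned int nr_nodes)
{
	assert(nr_nodes > 0);
	/* Drop the low 6 bits so words in one 64-byte line map to the same node. */
	return hash_u64((uint64_t)(uintptr_t)word >> 6, 32) % nr_nodes;
}

/* Pick a bucket inside the chosen node's table from the (word, bit) pair. */
static inline unsigned int addr_to_bucket(const void *word, int bit)
{
	uint64_t key = ((uint64_t)(uintptr_t)word << 6) | (unsigned int)(bit & 63);

	return hash_u64(key, WAIT_TABLE_BITS);
}
```

A caller would then index something like table[addr_to_node(word, nr)]
[addr_to_bucket(word, bit)]. Whether the extra hash is worth it compared
with keeping one global table for bit_waitqueue is exactly the open
question above.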