On Tue, Dec 20, 2016 at 12:31:13PM +1000, Nicholas Piggin wrote:
> On Mon, 19 Dec 2016 16:20:05 -0800
> Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> wrote:
>
> > On 12/19/2016 03:07 PM, Linus Torvalds wrote:
> > > +wait_queue_head_t *bit_waitqueue(void *word, int bit)
> > > +{
> > > +	const int __maybe_unused nid = page_to_nid(virt_to_page(word));
> > > +
> > > +	return __bit_waitqueue(word, bit, nid);
> > >
> > > No can do. Part of the problem with the old code was that it did that
> > > virt_to_page() crud. That doesn't work with the virtually mapped stack.
> >
> > Ahhh, got it.
> >
> > So, what did you have in mind? Just redirect bit_waitqueue() to the
> > "first_online_node" waitqueues?
> >
> > wait_queue_head_t *bit_waitqueue(void *word, int bit)
> > {
> > 	return __bit_waitqueue(word, bit, first_online_node);
> > }
> >
> > We could do some fancy stuff like only do virt_to_page() for things in
> > the linear map, but I'm not sure we'll see much of a gain for it. None
> > of the other waitqueue users look as pathological as the 'struct page'
> > ones. Maybe:
> >
> > wait_queue_head_t *bit_waitqueue(void *word, int bit)
> > {
> > 	int nid;
> > 	if (word >= VMALLOC_START) /* all addrs not in linear map */
> > 		nid = first_online_node;
> > 	else
> > 		nid = page_to_nid(virt_to_page(word));
> > 	return __bit_waitqueue(word, bit, nid);
> > }
>
> I think he meant just make the page_waitqueue do the per-node thing
> and leave bit_waitqueue as the global bit.
>

I'm pressed for time, but at a glance, that might require a separate structure of wait_queues for the page waitqueue. Most users of bit_waitqueue are not operating on pages; the first user, for example, is based on a word inside a block_device. All non-page users could assume node 0. That shrinks the available hash table space, but as before, collisions may not be common enough to actually matter. That would be worth checking out.
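For illustration only, here is a minimal user-space C sketch of that split: non-page users hash into one shared table treated as node 0, page waiters use per-node tables, and a helper counts how many buckets end up shared. Everything here is hypothetical (NR_NODES, TABLE_BITS, struct wq_bucket standing in for wait_queue_head_t, and the sketch_* and count_shared_buckets() names). It is not kernel code, and the hash only imitates the multiplicative style of the kernel's hash_64().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_NODES   4                 /* hypothetical number of NUMA nodes */
#define TABLE_BITS 8                 /* hypothetical: 256 buckets per table */
#define TABLE_SIZE (1u << TABLE_BITS)

/* Hypothetical stand-in for wait_queue_head_t: only counts words hashed here. */
struct wq_bucket {
	unsigned int users;
};

/* Shared table: every non-page user is treated as node 0. */
static struct wq_bucket bit_wait_table[TABLE_SIZE];
/* Separate per-node tables, used only for page waitqueues. */
static struct wq_bucket page_wait_table[NR_NODES][TABLE_SIZE];

/* Multiplicative hash in the style of the kernel's hash_64(). */
static unsigned int hash_bucket(uint64_t val, unsigned int bits)
{
	return (unsigned int)((val * 0x61C8864680B583EBull) >> (64 - bits));
}

/* Fold the bit number (assumed 0..63) into the low bits of the address. */
static uint64_t wait_key(const void *word, int bit)
{
	return ((uint64_t)(uintptr_t)word << 6) | (unsigned int)bit;
}

/* No virt_to_page(): safe even for a word on a virtually mapped stack. */
static struct wq_bucket *sketch_bit_waitqueue(const void *word, int bit)
{
	return &bit_wait_table[hash_bucket(wait_key(word, bit), TABLE_BITS)];
}

/* Page callers already know the node, so no address translation is needed. */
static struct wq_bucket *sketch_page_waitqueue(const void *page, int nid)
{
	assert(nid >= 0 && nid < NR_NODES);
	return &page_wait_table[nid][hash_bucket(wait_key(page, 0), TABLE_BITS)];
}

/* Hash n distinct words into the shared table; count buckets used by more than one. */
static unsigned int count_shared_buckets(unsigned int n)
{
	static long words[4 * TABLE_SIZE];
	unsigned int shared = 0;

	assert(n <= 4 * TABLE_SIZE);
	memset(bit_wait_table, 0, sizeof(bit_wait_table));
	for (unsigned int i = 0; i < n; i++)
		sketch_bit_waitqueue(&words[i], 0)->users++;
	for (unsigned int b = 0; b < TABLE_SIZE; b++)
		if (bit_wait_table[b].users > 1)
			shared++;
	return shared;
}
```

Calling count_shared_buckets() with a plausible number of concurrently waited-on words gives a rough feel for whether the smaller shared table matters. A shared bucket only costs extra wakeups that waiters must filter; it says nothing about real kernel workloads.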
Alternatively, careful auditing to pick a node only where it's known to be safe to call virt_to_page() may work, but it would be fragile.

Unfortunately, I won't be able to review or test any patches until January 3rd, after I'm back online properly. Right now, I have intermittent internet access at best, and for the next 4 days I definitely will not have any internet access.

The last time around, there were three patch sets to avoid the overhead for pages in particular. One was dropped (mine, based on Nick's old work) as it was too complicated. Peter had some patches, but after enough hammering they failed due to a missed wakeup that I didn't pin down before having to travel to a conference. I hadn't tested Nick's prototype, although it looked fine; others had reviewed it before I looked, and I was waiting for another version to appear. If one appears, I'll take a closer look and bash it across a few machines to see if it has any lost wakeup problems.

-- 
Mel Gorman
SUSE Labs