Re: [RFC 0/2] Memoryless nodes and kworker

Hi Tejun,

On Fri, Jul 18, 2014 at 4:20 AM, Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> On Thu, Jul 17, 2014 at 04:09:23PM -0700, Nishanth Aravamudan wrote:
> > [Apologies for the large Cc list, but I believe we have the following
> > interested parties:
> >
> > x86 (recently posted memoryless node support)
> > ia64 (existing memoryless node support)
> > ppc (existing memoryless node support)
> > previous discussion of how to solve Anton's issue with slab usage
> > workqueue contributors/maintainers]
>
> Well, you forgot to cc me.

Ah I'm very sorry! That's what I get for editing e-mails... Thank you for your reply!

> ...
> > It turns out we see this large slab usage due to using the wrong NUMA
> > information when creating kthreads.
> >
> > Two changes are required, one of which is in the workqueue code and one
> > of which is in the powerpc initialization. Note that ia64 may want to
> > consider something similar.
>
> Wasn't there a thread on this exact subject a few weeks ago?  Was that
> someone else?  Memory-less node detail leaking out of allocator proper
> isn't a good idea.  Please allow allocator users to specify the nodes
> they're on and let the allocator layer deal with mapping that to
> whatever is appropriate.  Please don't push that to everybody.

I haven't sent anything for the workqueue logic recently. Jiang sent out a patchset for x86 memoryless node support, which may have touched kernel/workqueue.c.

So, to be clear, this is not *necessarily* about memoryless nodes. It's about the intended semantics. The workqueue code currently calls cpu_to_node() in a few places and passes that node into the core MM as a hint about where the memory should come from. However, for a CPU that sits on a memoryless node, that hint is guaranteed to be wrong: it names the NUMA node nearest to the CPU (which happens to be the one it's on), not the nearest NUMA node with memory. The hint is correctly specified as cpu_to_mem(), which does the right thing whether or not memoryless nodes are present. I think that also encapsulates the hint's semantics correctly -- please give me memory from where I expect it, which is the closest NUMA node.
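
To make the difference concrete, here is a minimal userspace sketch (not kernel code). The topology, the distance table and every sim_* name are hypothetical stand-ins I made up for illustration; only the idea that cpu_to_node() reports the CPU's own node while cpu_to_mem() reports the nearest node with memory comes from the discussion above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SIM_NR_NODES 2
#define SIM_NR_CPUS  4

/* Assumed topology: node 0 has memory, node 1 is memoryless. */
static const bool sim_node_has_memory[SIM_NR_NODES] = { true, false };

/* Assumed NUMA distances (smaller means closer). */
static const int sim_node_distance[SIM_NR_NODES][SIM_NR_NODES] = {
	{ 10, 20 },
	{ 20, 10 },
};

/* Assumed CPU placement: CPUs 0-1 on node 0, CPUs 2-3 on node 1. */
static const int sim_cpu_node[SIM_NR_CPUS] = { 0, 0, 1, 1 };

/* Stand-in for cpu_to_node(): the node the CPU itself belongs to. */
int sim_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	return sim_cpu_node[cpu];
}

/* Stand-in for local_memory_node(): nearest node that has memory. */
int sim_local_memory_node(int node)
{
	int best = -1;
	int best_dist = INT_MAX;

	assert(node >= 0 && node < SIM_NR_NODES);
	for (int n = 0; n < SIM_NR_NODES; n++) {
		if (!sim_node_has_memory[n])
			continue;
		if (sim_node_distance[node][n] < best_dist) {
			best = n;
			best_dist = sim_node_distance[node][n];
		}
	}
	return best; /* -1 only if no node has memory at all */
}

/* Stand-in for cpu_to_mem(): the node memory should actually come from. */
int sim_cpu_to_mem(int cpu)
{
	return sim_local_memory_node(sim_cpu_to_node(cpu));
}
```

In this model the two helpers agree for CPUs 0 and 1, which sit on a node with memory. For CPUs 2 and 3, sim_cpu_to_node() names the memoryless node 1, while sim_cpu_to_mem() names node 0, which is where an allocation can actually be satisfied.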

I guess we could also change tsk_fork_get_node() to return local_memory_node(tsk->pref_node_fork), but that can be a bit expensive, as it generates a new zonelist each time to determine the first fallback node. Using cpu_to_mem() directly gives the exact same semantics, because cpu_to_mem() caches the result of local_memory_node().
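
A rough userspace sketch of that cost argument, again not kernel code: the expensive fallback search runs once per CPU at init time and later lookups are a plain array read. The three-node topology, the walk counter and all demo_* names are hypothetical; the real local_memory_node() builds a zonelist rather than scanning a distance table.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_NR_NODES 3
#define DEMO_NR_CPUS  4

/* Assumed topology: node 1 is memoryless, nodes 0 and 2 have memory. */
static const bool demo_node_has_memory[DEMO_NR_NODES] = { true, false, true };

/* Assumed NUMA distances (smaller means closer). */
static const int demo_distance[DEMO_NR_NODES][DEMO_NR_NODES] = {
	{ 10, 20, 40 },
	{ 20, 10, 40 },
	{ 40, 40, 10 },
};

/* Assumed CPU placement: CPU 0 on node 0, CPUs 1-2 on node 1, CPU 3 on node 2. */
static const int demo_cpu_node[DEMO_NR_CPUS] = { 0, 1, 1, 2 };

static int demo_walks;                        /* how often the search ran */
static int demo_cpu_mem_cache[DEMO_NR_CPUS];  /* per-CPU cached result */

/* Uncached path: searches for the nearest node with memory on every call. */
int demo_local_memory_node(int node)
{
	int best = -1;
	int best_dist = INT_MAX;

	assert(node >= 0 && node < DEMO_NR_NODES);
	demo_walks++;
	for (int n = 0; n < DEMO_NR_NODES; n++) {
		if (demo_node_has_memory[n] && demo_distance[node][n] < best_dist) {
			best = n;
			best_dist = demo_distance[node][n];
		}
	}
	return best;
}

/* Fill the per-CPU cache once, e.g. at boot in this model. */
void demo_init_cpu_mem_cache(void)
{
	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		demo_cpu_mem_cache[cpu] = demo_local_memory_node(demo_cpu_node[cpu]);
}

/* Cached path: same answer as the search, without repeating it. */
int demo_cpu_to_mem(int cpu)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	return demo_cpu_mem_cache[cpu];
}

int demo_walk_count(void)
{
	return demo_walks;
}
```

After demo_init_cpu_mem_cache() has run, any number of demo_cpu_to_mem() calls leave the walk count unchanged, whereas each direct demo_local_memory_node() call adds one more search.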

Again, apologies for not Cc'ing you originally.

-Nish
