On Fri, Jul 18, 2014 at 11:19 AM, Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Fri, Jul 18, 2014 at 11:12:01AM -0700, Nish Aravamudan wrote:
> > why aren't these callers using kthread_create_on_cpu()? That API was
>
> It is using that. There just are other data structures too.
Sorry, I might not have been clear.
Why are any callers of the form kthread_create_on_node(..., cpu_to_node(cpu), ...) not using kthread_create_on_cpu(..., cpu, ...)?
> > already change to use cpu_to_mem() [so one change, rather than of all over
> > the kernel source]. We could change it back to cpu_to_node and push down
> > the knowledge about the fallback.
>
> And once it's properly solved, please convert back kthread to use
> cpu_to_node() too. We really shouldn't be sprinkling the new subtly
> different variant across the kernel. It's wrong and confusing.
I understand what you mean, but it's equally wrong for the kernel to be wasting GBs of slab. Different kinds of wrongness :)
> > Yes, this is a good point. But honestly, we're not really even to the point
> > of talking about fallback here, at least in my testing, going off-node at
> > all causes SLUB-configured slabs to deactivate, which then leads to an
> > explosion in the unreclaimable slab.
>
> I don't think moving the logic inside allocator proper is a huge
> amount of work and this isn't the first spillage of this subtlety out
> of allocator proper. Fortunately, it hasn't spread too much yet.
> Let's please stop it here. I'm not saying you shouldn't or can't fix
> the off-node allocation.
It seems like an additional reasonable approach would be to provide a suitable _cpu() API for the allocators. I'm not sure why saying that callers should know about NUMA (in order to call cpu_to_node() in every caller) is any better than saying that callers should know about memoryless nodes (in order to call cpu_to_mem() in every caller instead) -- when, at least in several cases that I've seen, the relevant data is which CPU we expect to run on or are already running on. Wouldn't a _cpu() API simply say: please allocate memory local to this CPU, wherever that memory is?
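To make the _cpu() idea concrete, here is a rough userspace sketch, not kernel code. Everything prefixed sketch_ is made up for illustration, as is the 4-CPU, 2-node topology in which node 0 has no memory. The two lookup helpers only model the difference between cpu_to_node() and cpu_to_mem(); the allocator stand-in just records which node was asked for.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS  4
#define NR_NODES 2

/* Made-up topology: CPUs 0-1 sit on node 0, which has no memory. */
static const int cpu_node[NR_CPUS] = { 0, 0, 1, 1 };
static const int node_has_memory[NR_NODES] = { 0, 1 };
/* Nearest node with memory, per node (assumed to be precomputed). */
static const int nearest_mem_node[NR_NODES] = { 1, 1 };

/* Userspace stand-in for cpu_to_node(): the node the CPU belongs to. */
static int sketch_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return cpu_node[cpu];
}

/* Userspace stand-in for cpu_to_mem(): nearest node that has memory. */
static int sketch_cpu_to_mem(int cpu)
{
	int node = sketch_cpu_to_node(cpu);

	return node_has_memory[node] ? node : nearest_mem_node[node];
}

struct sketch_obj {
	int node;	/* node the allocation was requested from */
	void *mem;	/* backing memory (plain malloc in this model) */
};

/* Stand-in for a node-aware allocator; it only records the target node. */
static struct sketch_obj *sketch_alloc_node(size_t size, int node)
{
	struct sketch_obj *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	obj->mem = malloc(size);
	if (!obj->mem) {
		free(obj);
		return NULL;
	}
	obj->node = node;
	return obj;
}

/*
 * Hypothetical _cpu() entry point: the caller names a CPU only, and the
 * memoryless-node fallback lives in this one place.
 */
static struct sketch_obj *sketch_alloc_cpu(size_t size, int cpu)
{
	return sketch_alloc_node(size, sketch_cpu_to_mem(cpu));
}

static void sketch_free(struct sketch_obj *obj)
{
	if (obj) {
		free(obj->mem);
		free(obj);
	}
}
```

With that shape, a caller that only knows "this will run on CPU 0" calls sketch_alloc_cpu(size, 0) and never has to choose between the node and mem variants itself.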
Thanks,
Nish