On Mon 20-08-18 14:24:40, Andrew Morton wrote:
> On Mon, 20 Aug 2018 10:55:16 +0200 Oscar Salvador <osalvador@xxxxxxxxxxxxxxxxxx> wrote:
>
> > From: Oscar Salvador <osalvador@xxxxxxx>
> >
> > Currently, NODEMASK_ALLOC allocates a nodemask_t with kmalloc when
> > NODES_SHIFT is higher than 8; otherwise it declares it on the stack.
> >
> > The comment says that the reasoning behind this is that nodemask_t will be
> > 256 bytes when NODES_SHIFT is higher than 8, but this is not true.
> > For example, NODES_SHIFT = 9 gives us a 64-byte nodemask_t.
> > Let us fix up the comment for that.
> >
> > Another thing is that it might make sense to let values lower than 128 bytes
> > be allocated on the stack.
> > Although this all depends on the depth of the stack
> > (and this changes from function to function), I think that 64 bytes
> > is something we can easily afford.
> > So we could even bump the limit by 1 (from > 8 to > 9).
> >
>
> I agree. Such a change will reduce the amount of testing which the
> kmalloc version receives, but I assume there are enough people out
> there testing with large NODES_SHIFT values.

We have had CONFIG_NODES_SHIFT=10 in our SLES kernels for quite some
time (since around SLE11-SP3, AFAICS).

Anyway, isn't NODEMASK_ALLOC a bit over-engineered? Does anybody
actually even use more than 1024 NUMA nodes? That would be 128B, and
from a quick glance it seems that none of those functions are called
in deep stacks. I haven't gone through all of them, but a patch which
checks them all and removes NODEMASK_ALLOC would be quite nice IMHO.

--
Michal Hocko
SUSE Labs