On Tue, Aug 21, 2018 at 01:51:59PM -0700, Andrew Morton wrote:
> On Tue, 21 Aug 2018 14:30:24 +0200 Oscar Salvador <osalvador@xxxxxxxxxxxxxxxxxx> wrote:
>
> > On Tue, Aug 21, 2018 at 02:17:34PM +0200, Michal Hocko wrote:
> > > We do have CONFIG_NODES_SHIFT=10 in our SLES kernels for quite some
> > > time (around SLE11-SP3 AFAICS).
> > >
> > > Anyway, isn't NODES_ALLOC over engineered a bit? Does actually even do
> > > larger than 1024 NUMA nodes? This would be 128B and from a quick glance
> > > it seems that none of those functions are called in deep stacks. I
> > > haven't gone through all of them but a patch which checks them all and
> > > removes NODES_ALLOC would be quite nice IMHO.
> >
> > No, maximum we can get is 1024 NUMA nodes.
> > I checked this when writing another patch [1], and since having gone
> > through all archs Kconfigs, CONFIG_NODES_SHIFT=10 is the limit.
> >
> > NODEMASK_ALLOC gets only called from:
> >
> > - unregister_mem_sect_under_nodes() (not anymore after [1])
> > - __nr_hugepages_store_common (This does not seem to have a deep stack, we could use a normal nodemask_t)
> >
> > But is also used for NODEMASK_SCRATCH (mainly used for mempolicy):
> >
> > struct nodemask_scratch {
> > 	nodemask_t mask1;
> > 	nodemask_t mask2;
> > };
> >
> > that would make 256 bytes in case CONFIG_NODES_SHIFT=10.
>
> And that sole site could use an open-coded kmalloc.

It is not really one single place, but four:

- do_set_mempolicy()
- do_mbind()
- kernel_migrate_pages()
- mpol_shared_policy_init()

They get called in:

- do_set_mempolicy()
  - From set_mempolicy syscall
  - From numa_policy_init()
  - From numa_default_policy()

  * All above do not look like they have a deep stack, so it should be
    possible to get rid of NODEMASK_SCRATCH there.

- do_mbind
  - From mbind syscall

  * Should be feasible here as well.

- kernel_migrate_pages()
  - From migrate_pages syscall

  * Again, this should be doable.

- mpol_shared_policy_init()
  - From hugetlbfs_alloc_inode()
  - shmem_get_inode()

  * Seems doable for hugetlbfs_alloc_inode as well.
    I only got to check hugetlbfs_alloc_inode, because shmem_get_inode

So it seems that this can be done in most of the places.
The only tricky function might be mpol_shared_policy_init because of
shmem_get_inode.
But in that case, we could use an open-coded kmalloc there.

Thanks
--
Oscar Salvador
SUSE L3