On Mon, Feb 29, 2016 at 4:33 AM, Vlastimil Babka <vbabka@xxxxxxx> wrote:
> On 02/02/2016 06:42 AM, Andrew Morton wrote:
>>
>> On Wed, 27 Jan 2016 22:19:14 -0800 Dan Williams <dan.j.williams@xxxxxxxxx>
>> wrote:
>>
>>> ZONE_DEVICE (merged in 4.3) and ZONE_CMA (proposed) are examples of new
>>> mm zones that are bumping up against the current maximum limit of 4
>>> zones, i.e. 2 bits in page->flags. When adding a zone this equation
>>> still needs to be satisfied:
>>>
>>>     SECTIONS_WIDTH + ZONES_WIDTH + NODES_SHIFT + LAST_CPUPID_SHIFT
>>>         <= BITS_PER_LONG - NR_PAGEFLAGS
>>>
>>> ZONE_DEVICE currently tries to satisfy this equation by requiring that
>>> ZONE_DMA be disabled, but this is untenable given generic kernels want
>>> to support ZONE_DEVICE and ZONE_DMA simultaneously. ZONE_CMA would like
>>> to increase the amount of memory covered per section, but that limits
>>> the minimum granularity at which consecutive memory ranges can be added
>>> via devm_memremap_pages().
>>>
>>> The trade-off of what is acceptable to sacrifice depends heavily on the
>>> platform. For example, ZONE_CMA is targeted for 32-bit platforms where
>>> page->flags is constrained, but those platforms likely do not care about
>>> the minimum granularity of memory hotplug. A big iron machine with 1024
>>> NUMA nodes can likely sacrifice ZONE_DMA where a general purpose
>>> distribution kernel cannot.
>>>
>>> CONFIG_NR_ZONES_EXTENDED is a configuration symbol that gets selected
>>> when the number of configured zones exceeds 4. It documents the
>>> configuration symbols and definitions that get modified when ZONES_WIDTH
>>> is greater than 2.
>>>
>>> For now, it steals a bit from NODES_SHIFT. Later on it can be used to
>>> document the definitions that get modified when a 32-bit configuration
>>> wants more zone bits.
>>
>>
>> So if you want ZONE_DMA, you're limited to 512 NUMA nodes?
>>
>> That seems reasonable.
>
> Sorry for the late reply, but it seems that with !SPARSEMEM, or with
> SPARSEMEM_VMEMMAP, reducing NUMA nodes isn't even necessary, because
> SECTIONS_WIDTH is zero (see the diagrams in linux/page-flags-layout.h). In
> my brief tests with a 4.4-based kernel with SPARSEMEM_VMEMMAP, it seems that
> with 1024 NUMA nodes and 8192 CPUs there are still 7 bits left (i.e. 6 with
> CONFIG_NR_ZONES_EXTENDED).
>
> At the risk of making this even more complex, could the limit also depend
> on CONFIG_SPARSEMEM/VMEMMAP to reflect that somehow?

In this case it's already part of the equation because:

    config ZONE_DEVICE
            depends on MEMORY_HOTPLUG
            depends on MEMORY_HOTREMOVE

...and those in turn depend on SPARSEMEM.