Any comment on this or are the issues just going to be waved away?

On Mon, Jan 20, 2014 at 03:14:09PM +0000, Mel Gorman wrote:
> On Mon, Jan 20, 2014 at 03:29:41PM +0800, Tang Chen wrote:
> > Hi Mel,
> >
> > On 01/17/2014 01:11 AM, Mel Gorman wrote:
> > >On Tue, Dec 03, 2013 at 10:22:00AM +0800, Zhang Yanfei wrote:
> > >>From: Yasuaki Ishimatsu<isimatu.yasuaki@xxxxxxxxxxxxxx>
> > >>
> > >>If the system can create a movable node in which all memory of the node is
> > >>allocated as ZONE_MOVABLE, setup_node_data() cannot allocate memory for the
> > >>node's pg_data_t. So, invoke memblock_alloc_nid(...MAX_NUMNODES) again to
> > >>retry when the first allocation fails. Otherwise, the system could fail to
> > >>boot. (We don't use memblock_alloc_try_nid() to retry because in this
> > >>function, if the allocation fails, it will panic the system.)
> > >>
> > >
> > >This implies that it is possible to have a configuration with a big ratio
> > >difference between Normal:Movable memory. In such configurations there
> > >would be a risk that the system will reclaim heavily or go OOM because
> > >the kernel cannot allocate memory due to a relatively small Normal
> > >zone. What protects against that? Is the user ever warned if the ratio
> > >between Normal:Movable is very high?
> >
> > For now, there is no way of protecting against this. But on a modern
> > server, it won't be that easy to run out of memory when booting, I think.
> >
>
> Booting is a basic functional requirement and I'm more concerned about the
> behaviour of the kernel when the machine is running. If the kernel thrashes
> heavily or goes OOM when a workload starts then the fact the machine booted
> is not much comfort.
>
> > The current implementation will set any node the kernel resides in
> > as unhotpluggable, which means normal zone here. And for today's
> > servers, especially memory hotplug servers, each node would have at
> > least 16GB memory, which is enough for the kernel to boot.
> >
>
> Again, booting is fine but let's say it's an 8-node machine then that
> implies the Normal:Movable ratio will be 1:8. All page table pages, inodes,
> dentries etc will have to fit in that 1/8th of memory with all the associated
> costs including remote access penalties. In extreme cases it may not be
> possible to use all of memory because the management structures cannot be
> allocated. Users may want the option of adjusting what this ratio is so
> they can unplug some memory while not completely sacrificing performance.
>
> Minimally, the kernel should print a big fat warning if the ratio is equal
> or more than 1:3 Normal:Movable. That ratio selection is arbitrary. I do not
> recall ever seeing any major Normal:Highmem bugs on 4G 32-bit machines so it
> is a conservative choice. The last Normal:Highmem bug I remember was related
> to a 16G 32-bit machine (https://bugzilla.kernel.org/show_bug.cgi?id=42578).
> A 1:15 ratio feels very optimistic for a very large machine.
>
> > We can add a patch to make it return to the original path if we run
> > out of memory, which means turning off the functionality and warning
> > users in the log.
> >
> > What do you think?
> >
>
> I think that will allow the machine to boot but that there still will be a
> large number of bugs filed with these machines due to high Normal:Movable
> ratios. The shape of the bug reports will be similar to the Normal:Highmem
> ratio bugs that existed years ago.
>
> > > The movable_node boot parameter still
> > >turns the feature on and off, there appears to be no way of controlling
> > >the ratio of memory other than booting with the minimum amount of memory
> > >and manually hot-adding the sections to set the appropriate ratio.
> >
> > For now, yes. We expect firmware and hardware to give the basic
> > ratio (how much memory is hotpluggable), and the user decides how to
> > arrange the memory (decides the size of the normal zone and the
> > movable zone).
> >
>
> There seem to be big gaps in the configuration options here. The user
> can either ask it to be automatically assigned and have no control of
> the ratio or manually hot-add the memory which is a relatively heavy
> administrative burden.
>
> I think they should be warned if the ratio is high and have an option of
> specifying a ratio manually even if that means that additional nodes
> will not be hot-removable.
>
> This is all still a kludge around the fact that node memory hot-remove
> did not try and cope with full migration by breaking some of the 1:1
> virt:phys mapping assumptions when hot-remove was enabled.
>
> --
> Mel Gorman
> SUSE Labs
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx. For more info on Linux MM,
> see: http://www.linux-mm.org/ .

--
Mel Gorman
SUSE Labs
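
For illustration only, the warning threshold suggested in the quoted message (warn when Normal:Movable is 1:3 or worse) could be sketched as a small user-space check. This is not kernel code and not part of any posted patch. The names `movable_ratio_too_high`, `warn_if_ratio_high` and `MOVABLE_WARN_MULTIPLE` are hypothetical, and the page counts are assumed to be supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical threshold: warn at 1:3 Normal:Movable or worse. */
#define MOVABLE_WARN_MULTIPLE 3ULL

/*
 * Return true when movable memory is at least `multiple` times the size
 * of normal memory, i.e. the Normal:Movable ratio is 1:multiple or worse.
 */
static bool movable_ratio_too_high(unsigned long long normal_pages,
                                   unsigned long long movable_pages,
                                   unsigned long long multiple)
{
    assert(multiple > 0);

    /* No normal memory at all: any movable memory is already too much. */
    if (normal_pages == 0)
        return movable_pages > 0;

    /* Divide instead of multiplying so large page counts cannot overflow. */
    return movable_pages / normal_pages >= multiple;
}

/* Print a warning in the spirit of the "big fat warning" suggested above. */
static void warn_if_ratio_high(unsigned long long normal_pages,
                               unsigned long long movable_pages)
{
    if (movable_ratio_too_high(normal_pages, movable_pages,
                               MOVABLE_WARN_MULTIPLE))
        fprintf(stderr,
                "WARNING: Normal:Movable ratio is 1:%llu or worse "
                "(normal=%llu pages, movable=%llu pages)\n",
                MOVABLE_WARN_MULTIPLE, normal_pages, movable_pages);
}
```

Calling `warn_if_ratio_high(1, 7)`, which corresponds to one Normal node and seven equally sized Movable nodes, would print the warning, while `warn_if_ratio_high(4, 4)` would stay silent. The 1:3 cut-off is the arbitrary value mentioned in the thread; a real implementation would presumably make it tunable.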