On Tue, Dec 18, 2018 at 01:14:51PM +0100, Michal Hocko wrote:
>On Mon 17-12-18 14:18:02, Wei Yang wrote:
>> On Mon, Dec 17, 2018 at 11:25:34AM +0100, Michal Hocko wrote:
>> >On Sun 16-12-18 20:56:24, Wei Yang wrote:
>> >> A non-zero zone_movable_pfn indicates this node has ZONE_MOVABLE, while
>> >> current implementation doesn't comply with this rule when kernel
>> >> parameter "kernelcore=" is used.
>> >>
>> >> Current implementation doesn't harm the system, since the value in
>> >> zone_movable_pfn is out of the range of current zone. While user would
>> >> see this message during bootup, even that node doesn't has ZONE_MOVABLE.
>> >>
>> >>   Movable zone start for each node
>> >>     Node 0: 0x0000000080000000
>> >
>> >I am sorry but the above description confuses me more than it helps.
>> >Could you start over again and describe the user visible problem, then
>> >follow up with the udnerlying bug and finally continue with a proposed
>> >fix?
>>
>> Yep, how about this one:
>>
>> For example, a machine with 8G RAM, 2 nodes with 4G on each, if we pass
>
>Did you mean 2G on each? Because your nodes do have 2GB each.
>
>> "kernelcore=2G" as kernel parameter, the dmesg looks like:
>>
>>   Movable zone start for each node
>>     Node 0: 0x0000000080000000
>>     Node 1: 0x0000000100000000
>>
>> This looks like both Node 0 and 1 has ZONE_MOVABLE, while the following
>> dmesg shows only Node 1 has ZONE_MOVABLE.
>
>Well, the documentation says
>	kernelcore=	[KNL,X86,IA-64,PPC]
>			Format: nn[KMGTPE] | nn% | "mirror"
>			This parameter specifies the amount of memory usable by
>			the kernel for non-movable allocations. The requested
>			amount is spread evenly throughout all nodes in the
>			system as ZONE_NORMAL. The remaining memory is used for
>			movable memory in its own zone, ZONE_MOVABLE. In the
>			event, a node is too small to have both ZONE_NORMAL and
>			ZONE_MOVABLE, kernelcore memory will take priority and
>			other nodes will have a larger ZONE_MOVABLE.
Yes, the current behavior is a little bit different.

When you look at find_usable_zone_for_movable(), ZONE_MOVABLE is placed in
the highest zone. This means that if a node doesn't have the highest zone,
all of its memory belongs to kernelcore.

Looks like a design decision?

>
>>   On node 0 totalpages: 524190
>>     DMA zone: 64 pages used for memmap
>>     DMA zone: 21 pages reserved
>>     DMA zone: 3998 pages, LIFO batch:0
>>     DMA32 zone: 8128 pages used for memmap
>>     DMA32 zone: 520192 pages, LIFO batch:63
>>
>>   On node 1 totalpages: 524255
>>     DMA32 zone: 4096 pages used for memmap
>>     DMA32 zone: 262111 pages, LIFO batch:63
>>     Movable zone: 4096 pages used for memmap
>>     Movable zone: 262144 pages, LIFO batch:63
>
>so assuming your really have 4GB in total and 2GB should be in kernel
>zones then each node should get half of it to kernel zones and the
>remaining 2G evenly distributed to movable zones. So something seems
>broken here.

If we really implemented this, we would have the following memory layout:

    +---------+------+---------+--------+------------+
    |DMA      |DMA32 |Movable  |DMA32   |Movable     |
    +---------+------+---------+--------+------------+
    |<        Node 0          >|<      Node 1       >|

This means the zones would no longer be monotonically increasing. Is this
what we expect now? If so, something really is broken.

>--
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me
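
For reference, the addresses and page counts in the dmesg output above can
be converted to sizes with a short Python sketch. It assumes 4 KiB pages,
which the thread does not state, and the helper names are made up for
illustration only.

```python
# Sketch only: converts the dmesg figures quoted in this thread to sizes.
# PAGE_SIZE is an assumption (4 KiB pages); it is not stated in the thread.
PAGE_SIZE = 4096


def addr_to_gib(addr: int) -> float:
    """Convert a physical address to GiB."""
    return addr / 2**30


def pages_to_mib(pages: int) -> float:
    """Convert a page count to MiB under the PAGE_SIZE assumption."""
    return pages * PAGE_SIZE / 2**20


# Values copied from the quoted dmesg output.
movable_zone_start = {0: 0x0000000080000000, 1: 0x0000000100000000}
totalpages = {0: 524190, 1: 524255}
movable_pages = {0: 0, 1: 262144}  # node 0 reports no Movable zone


def main() -> None:
    for node in sorted(movable_zone_start):
        print(
            f"Node {node}: "
            f"movable start at {addr_to_gib(movable_zone_start[node]):.1f} GiB, "
            f"total {pages_to_mib(totalpages[node]):.1f} MiB, "
            f"movable {pages_to_mib(movable_pages[node]):.1f} MiB"
        )


if __name__ == "__main__":
    main()
```

Under that assumption the two start addresses come out at 2 GiB and 4 GiB,
each node's totalpages is just under 2 GiB, and node 1's Movable zone is
exactly 1 GiB, which matches the "2GB each" reading above.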