Re: Memory management broken by "mm: reclaim small amounts of memory when an external fragmentation event occurs"

On Sat, Apr 06, 2019 at 11:20:35AM -0400, Mikulas Patocka wrote:
> Hi
> 
> The patch 1c30844d2dfe272d58c8fc000960b835d13aa2ac ("mm: reclaim small 
> amounts of memory when an external fragmentation event occurs") breaks 
> memory management on parisc.
> 
> I have a parisc machine with 7GiB RAM, the chipset maps the physical 
> memory to three zones:
> 	0) Start 0x0000000000000000 End 0x000000003fffffff Size   1024 MB
> 	1) Start 0x0000000100000000 End 0x00000001bfdfffff Size   3070 MB
> 	2) Start 0x0000004040000000 End 0x00000040ffffffff Size   3072 MB
> (but it is not NUMA)
> 
> With the patch 1c30844d2, the kernel will incorrectly reclaim the first 
> zone when it fills up, ignoring the fact that there are two completely 
> free zones. Basically, it limits cache size to 1GiB.
> 
> For example, if I run:
> # dd if=/dev/sda of=/dev/null bs=1M count=2048
> 
> - with the proper kernel, there should be "Buffers - 2GiB" when this 
> command finishes. With the patch 1c30844d2, buffers will consume just 1GiB 
> or slightly more, because the kernel was incorrectly reclaiming them.
> 

I could argue that the feature is behaving as expected for separate
pgdats but that's neither here nor there. The bug is real but I have a
few questions.

First, if pa-risc is !NUMA then why are separate local ranges
represented as separate nodes? Is it because of DISCONTIGMEM or something
else? DISCONTIGMEM is before my time so I'm not familiar with it and
I consider it "essentially dead" but the arch init code seems to set up
a pgdat for each physically contiguous range so it's a possibility. The
most likely explanation is that pa-risc does not have hardware with
addressing limitations smaller than the CPU's physical address limits and
it's possible to have more ranges than available zones but clarification
would be nice.  By rights, SPARSEMEM would be supported on pa-risc but that
would be a time-consuming and somewhat futile exercise.  Regardless of the
explanation, as pa-risc does not appear to support transparent hugepages,
an option is to special-case watermark_boost_factor to be 0 on DISCONTIGMEM
as that commit was primarily about THP with secondary concerns around
SLUB. This is probably the most straightforward solution but it would
obviously need a comment. I do not know what the distro configurations for
pa-risc set as I'm not a user of Gentoo or Debian.

Second, if you set the sysctl vm.watermark_boost_factor=0, does the
problem go away? If so, an option would be to set this sysctl to 0 by
default on distros that support pa-risc. Would that be suitable?
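
For anyone who wants to run that check, a minimal sketch follows. It
reuses the dd reproducer from the report; the helper names buffers_mib
and run_check are made up for illustration, /dev/sda is just the device
from the report, and run_check assumes root because it writes the
sysctl, drops caches and reads a raw disk.

```shell
#!/bin/sh
# Sketch only: compare Buffers after the reproducer, with and without
# watermark boosting. vm.watermark_boost_factor is the sysctl named in
# this thread; the function names below are illustrative assumptions.

BOOST_SYSCTL=vm.watermark_boost_factor

# Print the "Buffers" value of a meminfo-style file in MiB.
# Defaults to /proc/meminfo when no file is given.
buffers_mib() {
    awk '$1 == "Buffers:" { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Set the boost factor, drop caches, run the 2GiB read from the report
# and print how large Buffers ended up. Needs root.
run_check() {
    factor=$1
    dev=${2:-/dev/sda}
    sysctl -w "$BOOST_SYSCTL=$factor" &&
    sync && echo 3 > /proc/sys/vm/drop_caches &&
    dd if="$dev" of=/dev/null bs=1M count=2048 &&
    echo "boost_factor=$factor Buffers=$(buffers_mib) MiB"
}

# Usage (as root):
#   run_check "$(sysctl -n vm.watermark_boost_factor)"   # current setting
#   run_check 0                                           # boosting disabled
```

If Buffers ends up near 2048 MiB with the factor at 0 and near 1024 MiB
otherwise, that would point at the boosting. A distro wanting the setting
to persist could presumably ship the same value through its usual sysctl
configuration files.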

Finally, I'm sure this has been asked before but why is pa-risc alive?
It appears a new CPU has not been manufactured since 2005. Even Alpha
I can understand being semi-alive since it's an interesting case for
weakly-ordered memory models. pa-risc appears to be supported and active
for Debian at least so someone cares. It's not the only feature like this
that is bizarrely alive but it is curious -- 32-bit NUMA support on x86,
I'm looking at you, your machines have all been dead since the early 2000s
AFAIK and anyone else using NUMA on 32-bit x86 needs their head examined.

-- 
Mel Gorman
SUSE Labs


