On Mon, Mar 9, 2020 at 5:28 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
On Mon 09-03-20 11:31:41, Shaju Abraham wrote:
> The VM pressure notification flags have excluded GFP_KERNEL with the
> reasoning that user land will not be able to take any action when
> kernel memory is low. This is not always true. Consider the case of
> a user land program managing all the huge memory pages. By including
> the GFP_KERNEL flag, a pressure notification can be sent whenever kernel
> memory is low, and the manager process can split huge pages to satisfy
> the kernel's memory requirement.
Are you sure about this reasoning? GFP_KERNEL = __GFP_FS | __GFP_IO | __GFP_RECLAIM.
Two of the flags mentioned there are already listed, so we are talking
about __GFP_RECLAIM here. Including it directly would be a more appropriate
change than GFP_KERNEL, btw.
But I still do not really understand what the actual problem is and how
this patch is meant to fix it. vmpressure is triggered only from the
reclaim path, which inherently requires __GFP_RECLAIM to be present,
so I fail to see how this can make any change at all. How have you
tested it?
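To make the flag algebra above concrete, here is a small user-space C sketch. The SIM_* names and bit values are hypothetical stand-ins (the real definitions are in include/linux/gfp.h); only the relationships GFP_KERNEL = __GFP_RECLAIM | __GFP_IO | __GFP_FS and __GFP_RECLAIM = __GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM are taken from the kernel. It checks that OR-ing GFP_KERNEL into the filter mask adds nothing but the reclaim bits, that a GFP_KERNEL request already passes the unpatched filter, and that once the reclaim bits are in the mask, every request carrying a reclaim bit passes.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the kernel's __GFP_* flags. The bit values
 * are made up; only these relationships mirror include/linux/gfp.h:
 *   __GFP_RECLAIM = __GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM
 *   GFP_KERNEL    = __GFP_RECLAIM | __GFP_IO | __GFP_FS
 */
typedef unsigned int sim_gfp_t;

enum {
    SIM_HIGHMEM        = 1u << 0,
    SIM_MOVABLE        = 1u << 1,
    SIM_IO             = 1u << 2,
    SIM_FS             = 1u << 3,
    SIM_DIRECT_RECLAIM = 1u << 4,
    SIM_KSWAPD_RECLAIM = 1u << 5,
    SIM_ALL            = (1u << 6) - 1,   /* every simulated flag */
};

#define SIM_RECLAIM (SIM_DIRECT_RECLAIM | SIM_KSWAPD_RECLAIM)
#define SIM_KERNEL  (SIM_RECLAIM | SIM_IO | SIM_FS)

/* Filter mask in vmpressure() before the patch ... */
#define MASK_BEFORE (SIM_HIGHMEM | SIM_MOVABLE | SIM_IO | SIM_FS)
/* ... and after it, with GFP_KERNEL OR-ed in. */
#define MASK_AFTER  (MASK_BEFORE | SIM_KERNEL)

/* Mirrors "if (!(gfp & mask)) return;": true if pressure is accounted. */
static bool accounted(sim_gfp_t gfp, sim_gfp_t mask)
{
    return (gfp & mask) != 0;
}

/* True if every flag combination that has a reclaim bit passes the filter. */
static bool all_reclaim_gfps_pass(sim_gfp_t mask)
{
    for (sim_gfp_t gfp = 0; gfp <= (sim_gfp_t)SIM_ALL; gfp++)
        if ((gfp & SIM_RECLAIM) && !accounted(gfp, mask))
            return false;
    return true;
}
```

In this model the unpatched check already lets a GFP_KERNEL request through via __GFP_IO and __GFP_FS, so at this particular check the patch only changes the outcome for reclaim requests that lack those bits.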
We have a user space application that waits for memory pressure events. On receiving an event,
the user space program frees huge pages to make more memory available to the system.
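The thread does not say which notification interface the application uses. A minimal sketch, assuming the cgroup v1 memory controller's memory.pressure_level notification: an eventfd is registered by writing "<event_fd> <pressure_level_fd> <level>" to cgroup.event_control, and reading the eventfd blocks until an event fires. Freeing huge pages is sketched, as a further assumption, by lowering the persistent pool through a nr_hugepages file such as /proc/sys/vm/nr_hugepages. The helper names format_event_control(), register_pressure_listener(), wait_pressure_event() and release_huge_pages() are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Builds "<event_fd> <pressure_level_fd> <level>" for cgroup.event_control.
 * Returns the string length, or -1 if it does not fit in buf. */
static int format_event_control(char *buf, size_t len, int efd, int pfd,
                                const char *level)
{
    int n = snprintf(buf, len, "%d %d %s", efd, pfd, level);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Registers for events at 'level' ("low", "medium" or "critical") on the
 * cgroup v1 memory cgroup directory cg_dir. On success returns an eventfd
 * and stores the memory.pressure_level fd in *pfd_out (the caller keeps it
 * open while listening); returns -1 on failure. */
static int register_pressure_listener(const char *cg_dir, const char *level,
                                      int *pfd_out)
{
    char path[4096], line[64];
    int pfd, cfd, efd, n;

    snprintf(path, sizeof(path), "%s/memory.pressure_level", cg_dir);
    if ((pfd = open(path, O_RDONLY)) < 0)
        return -1;
    snprintf(path, sizeof(path), "%s/cgroup.event_control", cg_dir);
    if ((cfd = open(path, O_WRONLY)) < 0) {
        close(pfd);
        return -1;
    }
    efd = eventfd(0, 0);
    n = efd < 0 ? -1 : format_event_control(line, sizeof(line), efd, pfd, level);
    if (n < 0 || write(cfd, line, (size_t)n) != n) {
        if (efd >= 0)
            close(efd);
        close(cfd);
        close(pfd);
        return -1;
    }
    close(cfd);
    *pfd_out = pfd;
    return efd;
}

/* Blocks until pressure is signalled; returns the number of events
 * accumulated in the eventfd counter, or 0 on error. */
static uint64_t wait_pressure_event(int efd)
{
    uint64_t count;
    return read(efd, &count, sizeof(count)) == (ssize_t)sizeof(count) ? count : 0;
}

/* Shrinks the persistent huge page pool by 'count' pages by rewriting a
 * nr_hugepages file. Only huge pages that are currently free are returned
 * right away. Returns the new target written, or -1 on error. */
static long release_huge_pages(const char *nr_path, long count)
{
    long cur, target;
    FILE *f = fopen(nr_path, "r");

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &cur) != 1) {
        fclose(f);
        return -1;
    }
    fclose(f);
    target = cur > count ? cur - count : 0;
    if (!(f = fopen(nr_path, "w")))
        return -1;
    if (fprintf(f, "%ld\n", target) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? target : -1;
}
```

A listener loop would call wait_pressure_event() and then release_huge_pages("/proc/sys/vm/nr_hugepages", n) for some policy-chosen n. Taking the path as a parameter also allows the same code to target a per-size pool under /sys/kernel/mm/hugepages/.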
This mechanism works fine when the memory is being consumed by other user space applications. To
test this, we wrote a program that allocates all the memory available in the system with
malloc() and touches the allocated pages. When free memory runs low, the pressure event
fires and the process is notified.
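The malloc() side of that test can be sketched as below. hog_memory() and release_memory() are hypothetical names, and the max_chunks cap is an assumption added so the sketch can be run safely; the program described above kept allocating until memory ran out. Writing one byte per page matters because malloc() typically only reserves address space, and pages are populated when first touched.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocates up to max_chunks blocks of chunk_bytes each and writes one
 * byte per page so the kernel must back them with real memory. Pointers
 * are stored in 'chunks' so the memory stays allocated while the pressure
 * listener is observed. Returns the number of chunks obtained. */
static size_t hog_memory(void **chunks, size_t max_chunks, size_t chunk_bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t step = page > 0 ? (size_t)page : 4096;
    size_t n = 0;

    while (n < max_chunks) {
        char *p = malloc(chunk_bytes);
        if (!p)
            break;                  /* allocation refused: stop here */
        for (size_t off = 0; off < chunk_bytes; off += step)
            p[off] = 1;             /* fault the page in */
        chunks[n++] = p;
    }
    return n;
}

/* Frees the first n chunks returned by hog_memory(). */
static void release_memory(void **chunks, size_t n)
{
    while (n > 0)
        free(chunks[--n]);
}
```

With the default overcommit heuristics, malloc() may keep returning non-NULL pointers long after physical memory is exhausted, so an uncapped run is more likely to end with the OOM killer than with a NULL return.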
We repeated the same test with kmalloc() instead of malloc(), using a test kernel module that
allocates all the available memory with kmalloc(GFP_KERNEL). In this case the OOM killer is invoked
and the memory pressure event is not fired.
After modifying vmpressure.c with the attached patch, the pressure event is triggered.
Swap was disabled on the system we tested.
Regards
Shaju
> This is a common scenario in the cloud. Most of the host memory is reserved
> as hugepages, which can be broken down into small pages on demand. This is
> done to minimise fragmentation so that virtual machine power-on always
> succeeds.
>
> Signed-off-by: Shaju Abraham <shaju.abraham@xxxxxxxxxxx>
> ---
> mm/vmpressure.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> index 4bac22fe1aa2..7ccfb3dd8173 100644
> --- a/mm/vmpressure.c
> +++ b/mm/vmpressure.c
> @@ -253,7 +253,8 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, bool tree,
> * Indirect reclaim (kswapd) sets sc->gfp_mask to GFP_KERNEL, so
> * we account it too.
> */
> - if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
> + if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO |
> + __GFP_FS | GFP_KERNEL)))
> return;
>
> /*
> --
> 2.20.1
>
--
Michal Hocko
SUSE Labs