liuq <liuq131@xxxxxxxxxxxxxxx> writes:

> On 2023/6/19 11:25, Huang, Ying wrote:
>> Added Mel and Vlastimil.
>>
>> liuq <liuq131@xxxxxxxxxxxxxxx> writes:
>>
>>> The current calculation of min_free_kbytes only uses ZONE_DMA and
>>> ZONE_NORMAL pages, but the ZONE_MOVABLE zone->_watermark[WMARK_MIN]
>>> will also take a share of min_free_kbytes. This causes the min
>>> watermark of ZONE_NORMAL to be too small in the presence of ZONE_MOVABLE.
>>
>> This seems like a real problem per my understanding. Can you show the
>> contents of /proc/zoneinfo on a problem system?
>>
>> But, per my understanding, min_free_kbytes is used for __GFP_HIGH and
>> PF_MEMALLOC allocations, while ZONE_MOVABLE will not usually be used for
>> them. So I think we should treat ZONE_MOVABLE as ZONE_HIGHMEM in
>> __setup_per_zone_wmarks().
>>
>> Best Regards,
>> Huang, Ying
>
> On my testing machine with 16GB of memory (transparent hugepage is
> turned off by default), when no movable zone is configured,
> min_free_kbytes is 15806 (15806*15806/16 = 15614352 kbytes,
> approximately 16G).
> The detailed info is as follows:
>
> [root@lq-workstation ~]# cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-6.2.0-rc7-00018-g0983f6bf2bfc-dirty
> root=/dev/mapper/ctyunos00-root ro resume=/dev/mapper/ctyunos00-swap
> rd.lvm.lv=ctyunos00/root rd.lvm.lv=ctyunos00/swap crashkernel=512M
> [root@lq-workstation ~]# cat /proc/zoneinfo |grep -A 5 min
>         min      3
>         low      6
>         high     9
>         spanned  4095
>         present  3998
>         managed  3840
> --
>         min      328
>         low      652
>         high     976
>         spanned  1044480
>         present  478802
>         managed  330969
> --
>         min      3618
>         low      7193
>         high     10768
>         spanned  3655680
>         present  3655680
>         managed  3575787
> --
>         min      0
>         low      0
>         high     0
>         spanned  0
>         present  0
>         managed  0
> [root@lq-workstation ~]# cat /proc/sys/vm/min_free_kbytes
> 15806
>
> If movablecore=12G is configured, min_free_kbytes is 7326
> (7326*7326/16 = 3354392 kbytes, approximately 16G-12G).
> The detailed info is as follows:
>
> [root@lq-workstation ~]# cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-6.2.0-rc7-00018-g0983f6bf2bfc-dirty
> root=/dev/mapper/ctyunos00-root ro resume=/dev/mapper/ctyunos00-swap
> rd.lvm.lv=ctyunos00/root rd.lvm.lv=ctyunos00/swap crashkernel=512M
> movablecore=12G
> [root@lq-workstation ~]# cat /proc/zoneinfo |grep -A 5 min
>         min      1
>         low      4
>         high     7
>         spanned  4095
>         present  3998
>         managed  3840
> --
>         min      152
>         low      476
>         high     800
>         spanned  1044480
>         present  478802
>         managed  330969
> --
>         min      239
>         low      748
>         high     1257
>         spanned  509952
>         present  509952
>         managed  509952
> --
>         min      1437
>         low      4502
>         high     7567
>         spanned  3145728
>         present  3145728
>         managed  3065833
> [root@lq-workstation ~]# cat /proc/sys/vm/min_free_kbytes
> 7326

Thank you very much for the data! Per my understanding, this verifies a
real problem.

Your patch can fix the too-small "min" for ZONE_NORMAL/ZONE_DMA, but,
IMHO, it increases "min" for ZONE_MOVABLE unnecessarily, because we don't
allocate from ZONE_MOVABLE for __GFP_HIGH or PF_MEMALLOC allocations.

So, IMHO, we should treat ZONE_MOVABLE as ZONE_HIGHMEM in
__setup_per_zone_wmarks() (a rough sketch of that idea follows the quoted
patch at the end of this mail).

Best Regards,
Huang, Ying

> After this patch is added, the configuration of the movable zone no
> longer affects the size of min_free_kbytes, which is only affected
> by the size of the available memory.
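To make those numbers concrete, this is roughly the calculation that
calculate_min_free_kbytes() performs with the managed-page counts reported
above (assuming the usual x86_64 zone order DMA, DMA32, Normal, Movable;
the sums are approximations, since nr_free_zone_pages() actually subtracts
each zone's high watermark and int_sqrt() rounds down, hence the small
differences from the reported values):

    without movablecore (DMA + DMA32 + Normal counted):
        lowmem_kbytes ~= (3840 + 330969 + 3575787) * 4 KB ~= 15642384 KB
        min_free_kbytes ~= int_sqrt(15642384 * 16) ~= 15820   (reported: 15806)

    with movablecore=12G, current code (Movable not counted):
        lowmem_kbytes ~= (3840 + 330969 + 509952) * 4 KB ~= 3379044 KB
        min_free_kbytes ~= int_sqrt(3379044 * 16) ~= 7352     (reported: 7326)

    with movablecore=12G, after the patch (Movable counted as well):
        lowmem_kbytes ~= (3840 + 330969 + 509952 + 3065833) * 4 KB ~= 15642376 KB
        min_free_kbytes ~= int_sqrt(15642376 * 16) ~= 15820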
>>> Signed-off-by: liuq <liuq131@xxxxxxxxxxxxxxx>
>>> ---
>>>  include/linux/mm.h |  1 +
>>>  mm/khugepaged.c    |  2 +-
>>>  mm/page_alloc.c    | 15 ++++++++++++++-
>>>  3 files changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index cf3d0d673f6b..1f91d035bcaf 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -863,6 +863,7 @@ void split_page(struct page *page, unsigned int order);
>>>  void folio_copy(struct folio *dst, struct folio *src);
>>>
>>>  unsigned long nr_free_buffer_pages(void);
>>> +unsigned long nr_free_pagecache_pages(void);
>>>
>>>  /*
>>>   * Compound pages have a destructor function. Provide a
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index 16be62d493cd..6632264b951c 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -2342,7 +2342,7 @@ static void set_recommended_min_free_kbytes(void)
>>>
>>>  	/* don't ever allow to reserve more than 5% of the lowmem */
>>>  	recommended_min = min(recommended_min,
>>> -			      (unsigned long) nr_free_buffer_pages() / 20);
>>> +			      (unsigned long) nr_free_pagecache_pages() / 20);
>>>  	recommended_min <<= (PAGE_SHIFT-10);
>>>
>>>  	if (recommended_min > min_free_kbytes) {
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index e008a3df0485..489b564526dd 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -5775,6 +5775,19 @@ unsigned long nr_free_buffer_pages(void)
>>>  }
>>>  EXPORT_SYMBOL_GPL(nr_free_buffer_pages);
>>>
>>> +/**
>>> + * nr_free_pagecache_pages - count number of pages beyond high watermark
>>> + *
>>> + * nr_free_pagecache_pages() counts the number of pages which are beyond the
>>> + * high watermark within all zones.
>>> + *
>>> + * Return: number of pages beyond high watermark within all zones.
>>> + */
>>> +unsigned long nr_free_pagecache_pages(void)
>>> +{
>>> +	return nr_free_zone_pages(gfp_zone(GFP_HIGHUSER_MOVABLE));
>>> +}
>>> +
>>>  static inline void show_node(struct zone *zone)
>>>  {
>>>  	if (IS_ENABLED(CONFIG_NUMA))
>>> @@ -8651,7 +8664,7 @@ void calculate_min_free_kbytes(void)
>>>  	unsigned long lowmem_kbytes;
>>>  	int new_min_free_kbytes;
>>>
>>> -	lowmem_kbytes = nr_free_buffer_pages() * (PAGE_SIZE >> 10);
>>> +	lowmem_kbytes = nr_free_pagecache_pages() * (PAGE_SIZE >> 10);
>>>  	new_min_free_kbytes = int_sqrt(lowmem_kbytes * 16);
>>>
>>>  	if (new_min_free_kbytes > user_min_free_kbytes)
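For reference, a minimal sketch of the alternative suggested above, i.e.
treating ZONE_MOVABLE like ZONE_HIGHMEM when __setup_per_zone_wmarks()
distributes min_free_kbytes. The surrounding code is paraphrased from the
current shape of that function and is an illustration of the idea, not a
tested patch:

    static void __setup_per_zone_wmarks(void)
    {
            unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
            unsigned long lowmem_pages = 0;
            struct zone *zone;
            unsigned long flags;

            /* Count only pages that __GFP_HIGH/PF_MEMALLOC can actually use. */
            for_each_zone(zone) {
                    if (!is_highmem(zone) && zone_idx(zone) != ZONE_MOVABLE)
                            lowmem_pages += zone_managed_pages(zone);
            }

            for_each_zone(zone) {
                    u64 tmp;

                    spin_lock_irqsave(&zone->lock, flags);
                    tmp = (u64)pages_min * zone_managed_pages(zone);
                    do_div(tmp, lowmem_pages);
                    if (is_highmem(zone) || zone_idx(zone) == ZONE_MOVABLE) {
                            /*
                             * __GFP_HIGH and PF_MEMALLOC allocations don't use
                             * highmem or movable pages, so give these zones
                             * only a token WMARK_MIN instead of a proportional
                             * share of min_free_kbytes.
                             */
                            unsigned long min_pages;

                            min_pages = zone_managed_pages(zone) / 1024;
                            min_pages = clamp(min_pages, SWAP_CLUSTER_MAX, 128UL);
                            zone->_watermark[WMARK_MIN] = min_pages;
                    } else {
                            /* Lowmem zones split the full min_free_kbytes. */
                            zone->_watermark[WMARK_MIN] = tmp;
                    }

                    /* ... WMARK_LOW/WMARK_HIGH setup and unlock as today ... */
            }
    }

With something like this, ZONE_NORMAL/ZONE_DMA would no longer have their
"min" diluted by the movable zone, ZONE_MOVABLE would keep only a token
reserve, and calculate_min_free_kbytes() could keep scaling min_free_kbytes
with lowmem only via nr_free_buffer_pages().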