Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression


 





https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384
https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576

It explains the problem. The madvise() does have some extra overhead
for handling THP (splitting pmd, deferred split queue, etc).


Another question is whether it is worthwhile to make the stack use THP
at all. It may be useful for some apps which need a large stack size?

Kernel actually doesn't apply THP to stack (see
vma_is_temporary_stack()). But kernel can't know whether the VMA is
stack or not by checking VM_GROWSDOWN | VM_GROWSUP flags. So if glibc
doesn't set the proper flags to tell kernel the area is stack, kernel
just treats it as normal anonymous area. So glibc should set up stack
properly IMHO.

If I read the code correctly, nptl allocates stack by the below code:

mem = __mmap (NULL, size, (guardsize == 0) ? prot : PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);

See https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L563

The MAP_STACK is used, but it is a no-op on Linux. So the alternative
is to make MAP_STACK useful on Linux instead of changing glibc. But
the blast radius seems much wider.

Yes. MAP_STACK is also mentioned in the mmap man page. I ran a test that
filters out MAP_STACK mappings, based on this patch. The regression
in stress-ng.pthread was gone. I suppose this is fairly safe because
the madvise call is only applied to glibc-allocated stacks.
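
For anyone who wants to poke at this from user space, here is a minimal
sketch of the pattern discussed above: map a stack-sized anonymous region
with MAP_STACK, check whether the returned address happens to be 2 MiB
aligned, then drop the pages with madvise(MADV_DONTNEED). It is not the
glibc code. The helper name probe_stack_mapping(), the 8 MiB size and the
2 MiB PMD size are assumptions for illustration (x86-64, Linux only), and
treating the madvise call as the thread-exit path is my reading of the
pthread_create.c link above.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define STACK_SIZE       (8UL << 20)  /* assumed: 8 MiB thread stack */
#define PMD_SIZE_ASSUMED (2UL << 20)  /* assumed: 2 MiB PMD, as on x86-64 */
#define TOUCH_BYTES      4096UL       /* assumed: touch one 4 KiB page */

/*
 * Hypothetical helper, not part of glibc or the kernel.
 * Returns 0 on success and -1 on failure. On success, *aligned is 1 if the
 * mapping started on a 2 MiB boundary and 0 otherwise.
 */
static int probe_stack_mapping(int *aligned)
{
    void *mem = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    *aligned = ((uintptr_t)mem % PMD_SIZE_ASSUMED) == 0;

    /* Touch the top of the region, roughly as a running thread would. */
    memset((char *)mem + STACK_SIZE - TOUCH_BYTES, 0xaa, TOUCH_BYTES);

    /* Drop the pages but keep the mapping, as the madvise() call does. */
    int rc = madvise(mem, STACK_SIZE, MADV_DONTNEED);

    munmap(mem, STACK_SIZE);
    return rc;
}
```

Calling probe_stack_mapping(&aligned) on kernels with and without the
MAP_STACK filter should show whether the mapping start is still 2 MiB
aligned. Alignment alone does not prove that a huge page was actually
faulted in, so treat it as a hint only.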

The patch I tested against stress-ng.pthread:

diff --git a/mm/mmap.c b/mm/mmap.c
index b78e83d351d2..1fd510aef82e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1829,7 +1829,8 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
                 */
                pgoff = 0;
                get_area = shmem_get_unmapped_area;
-       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
+                               !(flags & MAP_STACK)) {
                /* Ensures that larger anonymous mappings are THP aligned. */
                get_area = thp_get_unmapped_area;
        }




But what I am not sure about is whether such a change is worthwhile, as
the regression is only clearly visible in a micro-benchmark. There is no
evidence, at least from the perf statistics, that the other regressions in
this report are related to madvise. I need to check stream/ramspeed further.
Thanks.


Regards
Yin, Fengwei








