On Thu, Dec 21, 2023 at 10:07:09AM -0800, Yang Shi wrote:
> On Wed, Dec 20, 2023 at 8:49 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote:
> > > Yes. MAP_STACK is also mentioned in the mmap manpage. I did a test
> > > that filtered out MAP_STACK mappings based on this patch, and the
> > > regression in stress-ng.pthread was gone. I suppose this is
> > > reasonably safe because the madvise call is only applied to
> > > glibc-allocated stacks.
> > >
> > > But I am not sure whether such a change is worthwhile, as the
> > > regression is only clearly visible in a micro-benchmark. No
> > > evidence shows the other regressions in this report are related to
> > > madvise, at least judging from the perf statistics. We need to
> > > check more on stream/ramspeed.
> >
> > FWIW, we had a customer report a significant performance problem
> > when inadvertently using 2MB pages for stacks. They were able to
> > avoid it by using 2044KiB sized stacks ...
>
> Thanks for the report. This provides more justification for honoring
> MAP_STACK on Linux. Some applications, for example ones using
> pthreads, just allocate a fixed-size area for the stack. That
> confuses the kernel, because the kernel identifies a stack by
> VM_GROWSDOWN | VM_GROWSUP.
>
> But I'm still a little confused about why THP for stacks could cause
> significant performance problems, unless the application resizes its
> stacks quite often.

We didn't delve into what was causing the problem, only that it was
happening.  The application had many threads, so it could have been as
simple as consuming all the available THPs and leaving fewer available
for other uses.  Or it could have been a memory consumption problem:
maybe the app would only have been using 16-32kB per thread but was
now using 2MB per thread, and if there were, say, 100 threads, that's
roughly an extra 199MB of memory in use.
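
For reference, the customer's workaround amounts to something like the
sketch below (untested; STACK_SIZE and worker() are illustrative names,
not from their report).  Presumably the point of 2044KiB is that, once
glibc adds its guard page, the usable stack can never span a whole
2MB-aligned region, so it never becomes THP-eligible:

/* Untested sketch: create threads on a 2044KiB stack instead of the
 * default.  2044KiB is page-aligned and well above PTHREAD_STACK_MIN,
 * so pthread_attr_setstacksize() accepts it.
 * Build with: gcc -pthread sketch.c */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define STACK_SIZE (2044 * 1024)	/* just under 2MiB */

static void *worker(void *arg)
{
	(void)arg;
	/* ... per-thread work ... */
	return NULL;
}

int main(void)
{
	pthread_attr_t attr;
	pthread_t tid;
	int err;

	pthread_attr_init(&attr);
	err = pthread_attr_setstacksize(&attr, STACK_SIZE);
	if (err) {
		fprintf(stderr, "setstacksize: %s\n", strerror(err));
		return 1;
	}
	err = pthread_create(&tid, &attr, worker, NULL);
	if (err) {
		fprintf(stderr, "pthread_create: %s\n", strerror(err));
		return 1;
	}
	pthread_join(tid, NULL);
	pthread_attr_destroy(&attr);
	return 0;
}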
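
And to make "honoring MAP_STACK" concrete: one way to do it would be
to translate the mmap flag into VM_NOHUGEPAGE when the VMA flags are
computed, so THP simply never considers those mappings.  A rough,
untested sketch of what that could look like in include/linux/mman.h
(illustrative only, not the patch under discussion here):

static inline unsigned long calc_vm_flag_bits(unsigned long flags)
{
	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
	       _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
	       /* new: stacks opt out of THP at mmap() time */
	       _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
	       arch_calc_vm_flag_bits(flags);
}

That should cover both the fault path and khugepaged, since both
respect VM_NOHUGEPAGE when deciding whether a VMA is THP-eligible.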