>On Wed, May 31, 2023 at 10:51:01AM +0800, zhaoyang.huang wrote:
>> From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
>>
>> This patch fixes unproductive reclaiming of CMA pages by skipping them
>> when they are not available for the current context. It arises from
>> the OOM issue below, which is caused by a large proportion of
>> MIGRATE_CMA pages among free pages.
>
>Hello,
>
>I've been looking into a problem with high memory pressure causing OOMs in
>some of our workloads, and it seems that this change may have introduced lock
>contention when there is high memory pressure.
>
>I've collected some metrics for my specific workload that suggest this change
>has increased the lruvec->lru_lock waittime-max by 500x and the
>waittime-avg by 20x.
>
>Experiment
>==========
>
>The experiment involved 100 hosts, each with 64GB of memory and a single
>Xeon 8321HC CPU. The experiment ran for over 80 hours.
>
>Half of the hosts (50) were configured with the patch reverted and lock stat
>enabled, while the other half ran the upstream version.
>All machines had hugetlb_cma=6G set as a command-line argument.
>
>In this context, "upstream" refers to kernel release 6.9 with some minor
>changes that should not impact the results.
>
>Workload
>========
>
>The workload is a Java-based application that fully utilizes the memory; in
>fact, the JVM runs with `-Xms50735m -Xmx50735m` arguments.
>
>Results
>=======
>
>A few values from lockstat:
>
>                  waittime-max  waittime-total  waittime-avg  holdtime-max
>6.9:                    242889     15618873933           715         17485
>6.9-with-revert:           487       688563299            34           464
>
>The full data can be seen at:
>https://docs.google.com/spreadsheets/d/1Dl-8ImlE4OZrfKjbyWAIWWuQtgD3fwEEl9INaZQZ4e8/edit?usp=sharing
>
>Possible causes
>===============
>
>I've been discussing this with colleagues and we're speculating that the high
>contention might be linked to the fact that CMA regions are now being skipped.
>This could potentially extend the duration of the isolate_lru_folios()
>'while' loop, resulting in increased pressure on the lock.
>
>However, I want to emphasize that I'm not an expert in this area and I am
>simply sharing the data I collected.

Could you please try the patch below? It might help:
https://lore.kernel.org/linux-mm/CAOUHufa7OBtNHKMhfu8wOOE4f0w3b0_2KzzV7-hrc9rVL8e=iw@xxxxxxxxxxxxxx/
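
To illustrate the speculation quoted above, here is a toy Python model of a
scan loop that skips CMA folios. It is only a sketch and not kernel code: the
function name, the list layout and the numbers are made up, and it assumes
that a skipped folio is not charged against the scan budget, so the loop has
to walk further down the list while the lock is held.

```python
def isolate(lru, nr_to_scan, skip_cma):
    """Toy model of a scan loop; returns (isolated, visited).

    lru is a list of booleans, True meaning the folio sits in a CMA region.
    'visited' stands in for the work done while the LRU lock is held.
    """
    scanned = 0    # folios charged against the scan budget
    visited = 0    # folios touched while the (imaginary) lock is held
    isolated = 0
    for is_cma in lru:
        if scanned >= nr_to_scan:
            break
        visited += 1
        if skip_cma and is_cma:
            # Assumption: a skipped folio does not consume budget,
            # so the loop keeps walking the list.
            continue
        scanned += 1
        isolated += 1
    return isolated, visited


# Hypothetical LRU list in which 3 of every 4 folios are in CMA.
lru = [i % 4 != 0 for i in range(4000)]
print("no skipping:  ", isolate(lru, 32, skip_cma=False))
print("with skipping:", isolate(lru, 32, skip_cma=True))
```

With this made-up list, both runs isolate 32 folios, but the skipping run
visits 125 list entries instead of 32. The more CMA folios sit on the list,
the longer each pass holds the lock in this model.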